Chapter 1. Introducing Hadoop
Figure 1.1. A Hadoop cluster consists of many parallel machines that store and process large data sets. Client computers submit jobs to this cluster and retrieve the results.
Chapter 2. Starting Hadoop
Figure 2.1. NameNode/DataNode interaction in HDFS. The NameNode keeps track of the file metadata—which files are in the system and how each file is broken down into blocks. The DataNodes store the blocks themselves and constantly report to the NameNode to keep the metadata current.
Figure 2.2. JobTracker and TaskTracker interaction. After a client calls the JobTracker to begin a data processing job, the JobTracker partitions the work and assigns different map and reduce tasks to each TaskTracker in the cluster.
Figure 2.3. Topology of a typical Hadoop cluster. It’s a master/slave architecture in which the NameNode and JobTracker are the masters and the DataNodes and TaskTrackers are the slaves.
Figure 2.4. A snapshot of the HDFS web interface. From this interface you can browse through the HDFS filesystem, determine the storage available on each individual node, and monitor the overall health of your cluster.
Figure 2.5. A snapshot of the MapReduce web interface. This tool allows you to monitor active MapReduce jobs and access the logs of each map and reduce task. The logs of previously completed jobs are also available and are useful for debugging your programs.