Chapter 3. Components of Hadoop

This chapter covers

Managing files in HDFS
Analyzing components of the MapReduce framework
Reading and writing input and output data

In the last chapter we looked at installing and setting up Hadoop. We covered what the different nodes do and how to configure them to work with each other. Now that you have Hadoop running, let’s look at the Hadoop framework from a programmer’s perspective. If the previous chapter was like teaching you how to connect your turntable, your mixer, your amplifier, and your speakers together, then this chapter is about the techniques of mixing music.

We first cover HDFS, where you’ll store the data that your Hadoop applications process. Next we explain the MapReduce framework in more detail. Chapter 1 showed a MapReduce program, but we discussed its logic only at the conceptual level. In this chapter we get to know the Java classes and methods, as well as the underlying processing steps. We also learn how to read and write data in different formats.

3.1. Working with files in HDFS

3.2. Anatomy of a MapReduce program

3.3. Reading and writing

3.4. Summary