Chapter 3. Components of Hadoop
This chapter covers
- Managing files in HDFS
- Analyzing components of the MapReduce framework
- Reading and writing input and output data
In the last chapter we looked at setting up and installing Hadoop. We covered what the different nodes do and how to configure them to work with each other. Now that you have Hadoop running, let’s look at the Hadoop framework from a programmer’s perspective. If the previous chapter is like teaching you how to connect your turntable, your mixer, your amplifier, and your speakers together, then this chapter is about the techniques of mixing music.
We first cover HDFS, where you’ll store data that your Hadoop applications will process. Next we explain the MapReduce framework in more detail. In chapter 1 we’ve already seen a MapReduce program, but we discussed the logic only at the conceptual level. In this chapter we get to know the Java classes and methods, as well as the underlying processing steps. We also learn how to read and write using different data formats.