Chapter 4. Writing basic MapReduce programs

This chapter covers

Patent data as an example data set to process with Hadoop
Skeleton of a MapReduce program
Basic MapReduce programs to count statistics
Hadoop’s Streaming API for writing MapReduce programs using scripting languages
Combiner to improve performance

The MapReduce programming model is unlike most programming models you may have learned, and it takes some time and practice to become familiar with it. To help develop your proficiency, we go through many example programs in the next couple of chapters. These examples illustrate various MapReduce programming techniques. By applying MapReduce in multiple ways, you’ll start to develop an intuition and a habit of “MapReduce thinking.” The examples range from simple tasks to advanced uses. In one of the advanced applications we introduce the Bloom filter, a data structure not normally taught in the standard computer science curriculum. You’ll see that processing large data sets, whether you’re using Hadoop or not, often requires rethinking the underlying algorithms.

We assume you already have a basic grasp of Hadoop: you can set it up, and you have compiled and run an example program, such as the word-counting program from chapter 1. Let’s now work through examples based on a real-world data set.
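
As a refresher on the word-counting pattern mentioned above, here is a minimal in-memory sketch in plain Python. It does not use Hadoop's API. The function names (`map_words`, `reduce_counts`, `run_word_count`) are hypothetical, chosen only for illustration. The sketch shows only the map, group-by-key, and reduce shape that a MapReduce program follows; a real job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum every count that was grouped under one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Run map, group-by-key, and reduce in memory on a list of lines."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            # Grouping by key stands in for the shuffle between map and reduce.
            grouped[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_word_count(["the quick brown fox", "the lazy dog"]))
```

Keeping the map and reduce steps as separate functions that only see one line or one key at a time is the point of the exercise: neither function depends on global state, which is what would let a framework run many copies of them in parallel.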

4.1. Getting the patent data set

4.2. Constructing the basic template of a MapReduce program

4.3. Counting things

4.4. Adapting for Hadoop’s API changes

4.5. Streaming in Hadoop

4.6. Improving performance with combiners

4.7. Exercising what you’ve learned

4.8. Summary

4.9. Further resources
