Chapter 7. Cookbook
This chapter covers
- Passing custom parameters to tasks
- Retrieving task-specific information
- Creating multiple outputs
- Interfacing with relational databases
- Making output globally sorted
So far this book has covered the core techniques for writing a MapReduce program. Hadoop is a large framework that supports far more functionality than those core techniques. In this age of Bing and Google, you can look up specialized MapReduce techniques easily enough, so we don’t try to be an encyclopedic reference. In our own work and in discussions with other Hadoop users, we’ve found a number of techniques to be generally useful, such as using a standard relational database as the input to or output of a MapReduce job. We’ve collected some of our favorite “recipes” in this cookbook chapter.
In writing your Mapper and Reducer, you often want to make certain aspects configurable. For example, our joining program in chapter 5 is hardcoded to take the first data column as the join key. The program would be more generally applicable if the user could specify the join-key column at run time. Hadoop itself uses a configuration object to store all the configuration properties of a job, and you can use that same object to pass parameters to your Mapper and Reducer, as the sketch below shows.
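As a minimal sketch, here is how a driver might pass a join-key column through the configuration and how a mapper reads it back in `setup()`. This assumes the newer `org.apache.hadoop.mapreduce` API; the property name `join.key.column` and the class names are hypothetical, not taken from the chapter 5 program.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that reads the join-key column index from the job
// configuration rather than hardcoding column 0.
public class ConfigurableJoinMapper
        extends Mapper<Object, Text, Text, Text> {

    private int joinKeyColumn;

    @Override
    protected void setup(Context context) {
        // Read the user-supplied property; fall back to the first column.
        Configuration conf = context.getConfiguration();
        joinKeyColumn = conf.getInt("join.key.column", 0);
    }

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the record on commas and emit the chosen column as the key.
        String[] columns = value.toString().split(",");
        if (joinKeyColumn < columns.length) {
            context.write(new Text(columns[joinKeyColumn]), value);
        }
    }
}
```

The driver sets the property on the configuration before submitting the job:

```java
Configuration conf = new Configuration();
conf.setInt("join.key.column", 2);  // join on the third column (zero-based)
Job job = Job.getInstance(conf, "configurable join");
job.setMapperClass(ConfigurableJoinMapper.class);
```

If the driver runs through ToolRunner, GenericOptionsParser also lets users set the same property from the command line with `-D join.key.column=2`, so changing the join key requires no recompilation.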