Chapter 8. Tuning, debugging, and testing

This chapter covers

  • Measuring and tuning MapReduce execution times
  • Debugging your applications
  • Testing tips to improve the quality of your code

Imagine you’ve written a new piece of MapReduce code, and you’re executing it on your shiny new cluster. You’re surprised to learn that despite having a good-sized cluster, your job is running significantly longer than you expected. You’ve obviously hit a performance issue with your job, but how do you figure out where the problem lies?

This chapter starts by reviewing common performance problems in MapReduce, such as a lack of data locality and running too many mappers. The tuning section also examines enhancements that can make your jobs more efficient, such as using binary comparators in the shuffle phase and using a compact data format to minimize parsing and data-transfer times.
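To make the idea of a binary comparator concrete, here is a minimal Java sketch. It is not Hadoop’s API, and the class and method names (RawIntComparison, serialize, readInt, compare) are hypothetical. It assumes int keys serialized as 4 big-endian bytes, the layout DataOutput.writeInt produces, and compares two keys directly from their byte arrays and offsets without deserializing them into objects. In Hadoop, this kind of logic would live in a RawComparator implementation, whose compare method receives byte arrays, start offsets, and lengths.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of a binary (raw) comparator for int keys.
 * Keys are compared in serialized form, so no key objects are created
 * while sorting. This mimics the idea only; it is not Hadoop's API.
 */
final class RawIntComparison {

    // Serialize an int as 4 big-endian bytes (ByteBuffer's default order,
    // the same layout DataOutput.writeInt produces).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Rebuild a big-endian int from 4 bytes at the given offset,
    // without allocating any objects.
    static int readInt(byte[] buf, int offset) {
        return ((buf[offset] & 0xFF) << 24)
             | ((buf[offset + 1] & 0xFF) << 16)
             | ((buf[offset + 2] & 0xFF) << 8)
             |  (buf[offset + 3] & 0xFF);
    }

    // Compare two serialized keys in place: negative if the first key
    // sorts before the second, zero if equal, positive otherwise.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] negative = serialize(-5);
        byte[] positive = serialize(42);
        // Prints a negative value: -5 sorts before 42.
        System.out.println(compare(negative, 0, positive, 0));
    }
}
```

The sketch decodes the four bytes into an int instead of comparing them byte by byte. Negative numbers in two’s-complement form have their top bit set, so a plain unsigned byte-wise comparison would sort -5 after 42.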

The second part of this chapter offers tips for debugging your applications, including instructions for accessing YARN container startup scripts and suggestions for designing your MapReduce jobs to make future debugging easier. The chapter closes with tips for testing your MapReduce jobs to improve the quality of your code.

8.1. Measure, measure, measure

8.2. Tuning MapReduce

8.3. Debugging

8.4. Testing MapReduce jobs

8.5. Chapter summary