This chapter covers
- Advanced parallelization with map and starmap
- Writing parallel reduce and map-reduce patterns
- Accumulation and combination functions
We ended chapter 5 with a paradoxical situation: using a parallel method and more compute resources was slower than a linear approach with fewer resources. Intuitively, we know this shouldn’t happen. If we’re using more resources, we should be at least as fast as our low-resource approach, and hopefully faster. We never want to be slower.
In this chapter, we’ll take a look at how to get the most out of parallelization in two ways:
- By optimizing our use of parallel map
- By using a parallel reduce
Parallel map, which I introduced in section 2.2, is a great technique for transforming a large amount of data quickly. However, we glossed over some nuances when we covered the basics, and we’ll dig into those nuances in this chapter. Parallel reduce is parallelization applied at the reduce step of our map and reduce pattern. That is, we’ve already called map, and now we’re ready to accumulate the results of all those transformations. With parallel reduce, we parallelize the accumulation process instead of the transformation process.
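To make the distinction concrete, here’s a minimal sketch using Python’s built-in multiprocessing.Pool. The helper functions (square, add, chunked, reduce_chunk) are illustrative stand-ins, not code from the book:

```python
from functools import reduce
from multiprocessing import Pool

def square(x):
    return x * x

def add(left, right):
    return left + right

def chunked(xs, n):
    """Split xs into n roughly equal slices."""
    size = max(1, len(xs) // n)
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def reduce_chunk(chunk):
    # Each worker accumulates its own slice independently.
    return reduce(add, chunk)

if __name__ == "__main__":
    data = range(1_000_000)
    with Pool(4) as pool:
        # Parallel map: the transformation runs across processes.
        squares = pool.map(square, data)
        # Parallel reduce: the accumulation runs across processes,
        # producing one partial result per chunk.
        partials = pool.map(reduce_chunk, chunked(squares, 4))
    # Combine the partial sums into a final answer.
    total = reduce(add, partials)
    print(total)
```

The key requirement here is that the combination function be associative, as addition is: because each worker reduces its chunk separately, the partial results must produce the same total no matter how they’re grouped.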
Back in chapter 2, when we introduced parallel map, we covered a few of its shortcomings: