This chapter covers
- Advanced parallelization with map and starmap
- Writing parallel reduce and map-reduce patterns
- Accumulation and combination functions
We ended chapter 5 with a paradoxical situation: using a parallel method and more compute resources was slower than a linear approach with fewer resources. Intuitively, we know this shouldn’t happen. If we’re using more resources, we should be at least as fast as our low-resource approach, and hopefully faster. We never want to be slower.
In this chapter, we’ll take a look at how to get the most out of parallelization in two ways:
- By optimizing our use of parallel map
- By using a parallel reduce
Parallel map, which I introduced in section 2.2, is a great technique for transforming a large amount of data quickly. However, we glossed over some nuances when we covered the basics, and we’ll dig into those nuances in this chapter. Parallel reduce is parallelization applied at the reduce step of our map and reduce pattern. That is, we’ve already called map, and now we’re ready to accumulate the results of all those transformations. With parallel reduce, we parallelize the accumulation process instead of the transformation process.
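To make the distinction concrete, here’s a minimal sketch using Python’s built-in multiprocessing.Pool. The helper functions (square, add, chunked, reduce_chunk) are illustrative stand-ins, not code from the book:

```python
from functools import reduce
from multiprocessing import Pool

def square(x):
    return x * x

def add(left, right):
    return left + right

def chunked(xs, n):
    """Split xs into n roughly equal slices."""
    size = max(1, len(xs) // n)
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def reduce_chunk(chunk):
    # Each worker accumulates its own slice independently.
    return reduce(add, chunk)

if __name__ == "__main__":
    data = range(1_000_000)
    with Pool(4) as pool:
        # Parallel map: the transformation runs across processes.
        squares = pool.map(square, data)
        # Parallel reduce: the accumulation runs across processes,
        # producing one partial result per chunk.
        partials = pool.map(reduce_chunk, chunked(squares, 4))
    # Combine the partial sums into a final answer.
    total = reduce(add, partials)
    print(total)
```

The key requirement here is that the combination function be associative, as addition is: because each worker reduces its chunk separately, the partial results must produce the same total no matter how they’re grouped.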
Back in chapter 2, when we introduced parallel map, we covered a few of its shortcomings: