This chapter covers
- Using map to transform lots of data
- Using parallel programming to transform lots of data
- Scraping data from the web in parallel with map
In this chapter, we’ll look at map and how to use it for parallel programming, and we’ll apply those concepts to complete two web scraping exercises. With map, we’ll focus on three primary capabilities:
- We can use it to replace for loops.
- We can use it to transform data.
- Map is lazy: it evaluates only when its results are needed, not when it is called.
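The three capabilities above can be sketched in a few lines of Python. This is a minimal illustration, not an example from the chapter: the `shout` function, the `calls` list, and the sample words are hypothetical names chosen for the demonstration. The `calls` list records each time the function actually runs, which makes the lazy behavior visible.

```python
calls = []  # hypothetical log of every input the function has processed


def shout(word):
    """Hypothetical transformation: upper-case a word and add emphasis."""
    calls.append(word)
    return word.upper() + "!"


words = ["map", "filter", "reduce"]

# The for-loop version: build the result list by hand.
looped = []
for word in words:
    looped.append(shout(word))

# The map version: name the transformation and the data, with no explicit loop.
calls.clear()
lazy = map(shout, words)

# map returns a lazy iterator, so shout has not run yet.
calls_after_map = list(calls)

# Asking for one item runs shout exactly once.
first = next(lazy)
calls_after_next = list(calls)

# Consuming the rest of the iterator runs shout on the remaining items.
mapped = [first] + list(lazy)
```

Here `looped` and `mapped` hold the same values, but the map version did no work until an item was requested.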
These core ideas about map are also why it’s so useful for us in parallel programming. In parallel programming, we use multiple processing units that each do part of a task, and we combine their partial results later. Transforming lots of data from one type to another is an easy task to break into pieces, and the instructions for doing so are generally easy to transfer to each processing unit. Making code parallel with map can be as easy as adding four lines of code to a program.
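As a rough sketch of how small that change can be, the example below runs the same transformation once with the built-in map and once with `Pool.map` from Python's standard `multiprocessing` module. The `square` function, the input range, and the choice of two worker processes are assumptions made for illustration; the chapter's own example may differ.

```python
from multiprocessing import Pool


def square(n):
    """Hypothetical stand-in for a more expensive transformation."""
    return n * n


if __name__ == "__main__":
    numbers = range(10)

    # Sequential version: the built-in map, consumed into a list.
    sequential = list(map(square, numbers))

    # Parallel version: Pool.map splits the input across worker processes
    # and returns the results as a list in the original input order.
    with Pool(processes=2) as pool:
        parallel = pool.map(square, numbers)

    # Both approaches should produce the same results.
    print(parallel == sequential)
```

Unlike the built-in map, `Pool.map` is not lazy: it blocks until every result is ready. The `if __name__ == "__main__":` guard keeps worker processes from re-running the script's top-level code on platforms that start workers by importing the main module.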
In chapter 1, we talked a little bit about map, which is a function for transforming sequences of data. Specifically, we looked at the example of applying the mathematical function n+7 to a list of integers: [-1, 0, 1, 2]. We also looked at the graphic in figure 2.1, which shows a series of numbers being mapped to their outputs.
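For reference, the n+7 example can be written directly in Python. The function name `add_seven` is our own label for the mathematical function, not a name taken from the text.

```python
def add_seven(n):
    """Apply the function n + 7 to a single integer."""
    return n + 7


numbers = [-1, 0, 1, 2]

# map pairs each input with its output; list() forces the lazy iterator.
outputs = list(map(add_seven, numbers))

print(outputs)  # [6, 7, 8, 9]
```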