Chapter 2. Accelerating large dataset work: Map and parallel computing


This chapter covers

  • Using map to transform lots of data
  • Using parallel programming to transform lots of data
  • Scraping data from the web in parallel with map

In this chapter, we’ll look at map and how to use it for parallel programming, and we’ll apply those concepts to complete two web scraping exercises. With map, we’ll focus on three primary capabilities:

  1. We can use it to replace for loops.
  2. We can use it to transform data.
  3. We can rely on its laziness: map evaluates only when its results are needed, not when it's called.

These core ideas about map are also why it's so useful for parallel programming. In parallel programming, we use multiple processing units to each do part of a task, then combine that partial work later. Transforming lots of data from one type to another is an easy task to break into pieces, and the instructions for doing so are generally easy to transfer between processes. Making code parallel with map can be as easy as adding four lines of code to a program.
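To make this concrete, here's a minimal sketch of swapping Python's built-in map for a parallel map from the standard-library multiprocessing module. The squaring function is a hypothetical stand-in for any transformation; the parallel version differs from the sequential one by only a few lines.

```python
from multiprocessing import Pool

def square(n):
    # A stand-in transformation; any picklable function works here.
    return n * n

numbers = range(10)

# Sequential: map returns a lazy iterator -- nothing is computed
# until we consume it, e.g. by calling list() on it.
lazy_result = map(square, numbers)
print(list(lazy_result))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Parallel: create a Pool and call its .map, which splits the work
# across worker processes and returns the combined results in order.
if __name__ == "__main__":
    with Pool() as pool:
        parallel_result = pool.map(square, numbers)
    print(parallel_result)  # same values, computed across processes
```

Note that `Pool.map` returns an ordinary list rather than a lazy iterator, and the function it applies must be defined at module level so it can be pickled and sent to the workers.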

2.1. An introduction to map

In chapter 1, we talked a little bit about map, which is a function for transforming sequences of data. Specifically, we looked at the example of applying the mathematical function n+7 to a list of integers: [-1, 0, 1, 2]. And we looked at the graphic in figure 2.1, which shows a series of numbers being mapped to their outputs.
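That chapter 1 example translates directly into code. Here's a short sketch, where `add_seven` is an assumed name for the n+7 function:

```python
def add_seven(n):
    # The n + 7 function from the chapter 1 example.
    return n + 7

# map applies add_seven to each element of the list in turn.
print(list(map(add_seven, [-1, 0, 1, 2])))  # [6, 7, 8, 9]
```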

2.1.1. Retrieving URLs with map

2.1.2. The power of lazy functions (like map) for large datasets

2.2. Parallel processing

2.2.1. Processors and processing

2.2.2. Parallelization and pickling

2.2.3. Order and parallelization

2.2.4. State and parallelization

2.3. Putting it all together: Scraping a Wikipedia network

2.3.1. Visualizing our graph

2.3.2. Returning to map

2.4. Exercises

2.4.1. Problems of parallelization

2.4.2. Map function

2.4.3. Parallelization and speed

2.4.4. Pickling storage

2.4.5. Web scraping data

2.4.6. Heterogeneous map transformations

Summary
