Chapter 3. Function pipelines for mapping complex transformations

 

This chapter covers

  • Using map to do complex data transformations
  • Chaining together small functions into pipelines
  • Applying these pipelines in parallel on large datasets

In the last chapter, we saw how you can use map to replace for loops and how using map makes parallel computing straightforward: a small modification to map, and Python will take care of the rest. But so far, we’ve only used map with simple functions. Even in the Wikipedia scraping example from chapter 2, our hardest-working function only pulled text off the internet. If we want to make parallel programming really useful, we’ll need to use map in more complex ways. This chapter shows how to perform complex transformations with map. Specifically, we’re going to introduce two new concepts:

  1. Helper functions
  2. Function chains (also known as pipelines)

We’ll tackle those topics by looking at two very different examples. In the first, we’ll decode the secret messages of a malicious group of hackers. In the second, we’ll help our company do demographic profiling on its social media followers. Ultimately, though, we’ll solve both of these problems the same way: by creating function chains out of small helper functions.
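To make the idea concrete before the worked examples, here is a minimal sketch of a function chain built from small helper functions and applied with map. The helper names (strip_padding, reverse_text, to_lower), the compose function, and the sample messages are all hypothetical, chosen only for illustration; they are not the functions the chapter develops later.

```python
from functools import reduce


def strip_padding(text):
    """Helper: remove leading and trailing whitespace."""
    return text.strip()


def reverse_text(text):
    """Helper: reverse the order of the characters."""
    return text[::-1]


def to_lower(text):
    """Helper: convert the text to lowercase."""
    return text.lower()


def compose(*functions):
    """Chain functions left to right into one callable (illustrative helper)."""
    def pipeline(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), functions, value)
    return pipeline


# Build one pipeline out of three small helpers, then map it over the data.
decode = compose(strip_padding, reverse_text, to_lower)
messages = ["  OLLEH ", "DLROW  "]  # made-up sample input
decoded = list(map(decode, messages))
print(decoded)  # ['hello', 'world']
```

Each helper does one small job and can be checked on its own, and map only ever sees a single function: the chain.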

3.1. Helper functions and function chains

3.2. Unmasking hacker communications

3.2.1. Creating helper functions

3.2.2. Creating a pipeline

3.3. Twitter demographic projections

3.3.1. Tweet-level pipeline

3.3.2. User-level pipeline

3.3.3. Applying the pipeline

3.4. Exercises

3.4.1. Helper functions and function pipelines

3.4.2. Math teacher trick

3.4.3. Caesar’s cipher

Summary
