Chapter 4. Processing large datasets with lazy workflows

This chapter covers

  • Writing lazy workflows for processing large datasets locally
  • Understanding the lazy behavior of map
  • Writing classes with generators for lazy simulations

In chapter 2 (section 2.1.2, to be exact), I introduced the idea that our beloved map function is lazy by default; that is, it evaluates only when a value is needed downstream. In this chapter, we’ll look at some of the benefits of laziness, including how laziness lets us process big data on a laptop. We’ll focus on the benefits of laziness in two contexts:

  1. File processing
  2. Simulations

With file processing, we’ll see that laziness allows us to process far more data than would fit in memory all at once. With simulations, we’ll see how laziness lets us run “infinite” simulations. Indeed, lazy functions allow us to work with an unbounded stream of data just as easily as with a finite one.
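As a small taste of what’s ahead, here is a sketch of my own (the squaring function and the use of itertools are illustrative, not code from this chapter) showing how a lazy map can consume an infinite sequence, because only the values we actually pull out ever get computed:

```python
from itertools import count, islice

# count() yields 0, 1, 2, ... forever; map is lazy, so nothing
# is computed until we actually pull values out of the iterator.
squares = map(lambda x: x * x, count())

# islice lazily takes just the first five results.
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]
```

An eager map over `count()` would never terminate; the lazy version works because each value is produced on demand.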

4.1. What is laziness?

Laziness, or lazy evaluation, is a strategy that programming languages use when deciding when to perform computations. Under lazy evaluation, the Python interpreter executes lazy Python code only when the program needs the results of that code.

4.2. Some lazy functions to know

4.3. Understanding iterators: The magic behind lazy Python

4.4. The poetry puzzle: Lazily processing a large dataset

4.5. Lazy simulations: Simulating fishing villages

4.6. Exercises

Summary

sitemap