Chapter 4. Fundamentally lazy

This chapter covers

  • Understanding efficient laziness
  • Using Spark’s laziness to your benefit
  • Comparing the operations to build a data application the traditional way and the Spark way
  • Building great data-centric applications using Spark
  • Learning more about transformations and actions
  • Using Catalyst, Spark’s built-in optimizer
  • Introducing directed acyclic graphs

This chapter is not only about celebrating laziness. It also teaches, through examples and experiments, the fundamental differences between building a data application the traditional way and building one with Spark.

There are at least two kinds of laziness: sleeping under a tree when you have committed to doing something else, and thinking ahead in order to do your job in the smartest possible way. Although at this precise moment my mind drifts toward lying in the shade of a tree (largely inspired by Asterix in Corsica), in this chapter I will show how Spark makes your life easier by optimizing its workload. You will learn about the essential roles of transformations (each step of the data process) and actions (the trigger to get the work done).
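You can get a feel for the split between transformations and actions without a cluster. As a minimal analogy that relies only on the JDK (no Spark dependency), Java's own streams behave in a similar way: intermediate operations such as filter() and map() only describe the work, and nothing runs until a terminal operation such as collect() is called. This is not Spark's API, and Spark's optimizer does far more than a stream does. The class name LazyDemo and the method buildPipeline are hypothetical, chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: Java streams used as an analogy for Spark's
// transformations (lazy) and actions (eager triggers).
public class LazyDemo {

  // Builds the pipeline but does not run it. Each element that is actually
  // read gets recorded in the log, so we can see when work really happens.
  static Stream<Integer> buildPipeline(List<String> log) {
    return Stream.of(1, 2, 3, 4, 5, 6)
        .peek(n -> log.add("read " + n)) // records each element as it is read
        .filter(n -> n % 2 == 0)         // "transformation": keep even numbers
        .map(n -> n * 10);               // "transformation": scale them
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Stream<Integer> pipeline = buildPipeline(log);

    // Only a plan exists so far: no element has been read.
    System.out.println("Steps executed before the action: " + log.size());

    // The terminal operation plays the role of an action and triggers the work.
    List<Integer> result = pipeline.collect(Collectors.toList());
    System.out.println("Result: " + result);
    System.out.println("Steps executed after the action: " + log.size());
  }
}
```

Running main prints 0 steps before collect() is called, then the result [20, 40, 60] and 6 steps once the terminal operation has pulled every element through the pipeline.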