This chapter covers
This chapter is not only about celebrating laziness. It also teaches, through examples and experiments, the fundamental differences between building a data application the traditional way and building one with Spark.
There are at least two kinds of laziness: sleeping under the trees when you’ve committed to doing something else, and thinking ahead in order to do your job in the smartest possible way. Although, at this precise moment, my mind is drifting toward lying in the shade of a tree, largely inspired by Asterix in Corsica, in this chapter I will show how Spark makes your life easier by optimizing its workload. You will learn about the essential roles of transformations (each step of the data process) and actions (the trigger to get the work done).
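To get a first feel for the split between describing a step and triggering the work, here is a minimal sketch that uses plain Java streams as an analogy. It is not Spark code and does not use any Spark API. The class name `LazyAnalogy`, the `log` list, and the `describeWork` method are hypothetical names made up for this illustration. In a Java stream, `map` plays a role loosely similar to a transformation (it only declares a step), and `collect` plays a role loosely similar to an action (it makes the work happen).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: plain Java streams, no Spark involved.
public class LazyAnalogy {
    // Records each time the mapping step actually runs (hypothetical helper).
    public static final List<String> log = new ArrayList<>();

    // Declares a step on the data, in the spirit of a transformation.
    // Nothing is computed here; the stream only remembers what to do.
    public static Stream<Integer> describeWork(List<Integer> input) {
        return input.stream().map(x -> {
            log.add("map " + x);
            return x * 2;
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = describeWork(Arrays.asList(1, 2, 3));
        // The step has been declared, but it has not run yet.
        System.out.println("Steps run before the trigger: " + log.size());

        // The terminal operation is the trigger, in the spirit of an action.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println("Steps run after the trigger: " + log.size());
        System.out.println(result);
    }
}
```

Running `main` should report zero steps before the trigger and three after it. The analogy is loose: it shows only that declared steps wait for a trigger, and says nothing about how Spark optimizes its workload, which is what the rest of this chapter explores.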
You will work on a real dataset from the US National Center for Health Statistics. The application is designed to illustrate the reasoning that Spark goes through when it processes data. The chapter focuses on only one application, but it contains three execution modes, which correspond to three experiments that you will run to get a better sense of Spark’s “way of thinking.”