This chapter covers
In the previous chapters, you discovered what Apache Spark is and how to build simple applications, and, hopefully, understood key concepts including the dataframe and laziness. Chapters 5 and 6 are linked: you will build an application in this chapter and deploy it in chapter 6.
In this chapter, you will build an application from scratch. The applications you built earlier in this book always had to ingest data at the very beginning of the process. This lab instead generates its data within Spark, so there is nothing to ingest; ingesting data in a cluster is a bit more complex than creating a self-generated dataset. The goal of the application is to approximate the value of π (pi).
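To make the idea of approximating π from self-generated data concrete, here is a minimal sketch in plain Java, without any Spark dependency. It assumes a Monte Carlo "dart-throwing" approach: random points are generated in the unit square, and the fraction that falls inside the quarter circle of radius 1 approaches π/4. This passage does not say which method the lab uses, so treat the technique as an assumption; the class name `PiSketch`, the method `estimatePi`, and the seed parameter are hypothetical and chosen only for illustration.

```java
import java.util.SplittableRandom;
import java.util.stream.LongStream;

// Hypothetical class name; plain Java only, no Spark involved.
class PiSketch {

  /**
   * Generates `darts` random points in the unit square and counts those
   * that land inside the quarter circle of radius 1 (x^2 + y^2 <= 1).
   * The hit ratio approximates pi / 4.
   */
  static double estimatePi(long darts, long seed) {
    // Seeded generator so that a run can be reproduced (an assumption
    // made for this sketch, not something the chapter requires).
    SplittableRandom random = new SplittableRandom(seed);

    long hits = LongStream.range(0, darts)
        // "Map" step: each dart becomes 1 (inside) or 0 (outside).
        .map(i -> {
          double x = random.nextDouble();
          double y = random.nextDouble();
          return (x * x + y * y <= 1.0) ? 1L : 0L;
        })
        // "Reduce" step: add up the hits.
        .sum();

    return 4.0 * hits / darts;
  }

  public static void main(String[] args) {
    System.out.println("Pi is roughly " + estimatePi(1_000_000L, 42L));
  }
}
```

The data is generated by the program itself, and the work splits into a per-dart step followed by a summing step, which is the map-and-reduce shape that lends itself to distribution across a cluster. More darts give a closer approximation, at the cost of more computation.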
You will also look at three ways to interact with Spark:
- Local mode, which you are already familiar with through the examples in the previous chapters
- Cluster mode
- Interactive mode
Lab
Examples from this chapter are available on GitHub at https://github.com/jgperrin/net.jgp.books.spark.ch05.