Part 3. Spark ops
Using Spark isn’t just about writing and running Spark applications. It’s also about configuring Spark clusters and system resources so that applications use them efficiently. This part of the book explains the concepts and configuration options you need to run Spark applications on Spark standalone, Hadoop YARN, and Mesos clusters.
Chapter 10 explores Spark runtime components, Spark cluster types, job and resource scheduling, configuring Spark, and the Spark web UI. These are concepts common to all cluster managers that Spark can run on: the Spark standalone cluster, YARN, and Mesos. The two local modes are also explained in chapter 10.
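As a quick preview of how these cluster types differ in practice, the cluster manager is selected with the `--master` option of `spark-submit`. The application JAR name, class name, and host names below are hypothetical placeholders; the master URL formats themselves are Spark's standard ones.

```shell
# Run locally, using all available CPU cores (one of the local modes):
spark-submit --master "local[*]" --class com.example.MyApp myapp.jar

# Submit to a Spark standalone cluster (chapter 11):
spark-submit --master spark://master-host:7077 --class com.example.MyApp myapp.jar

# Submit to a YARN cluster (chapter 12); cluster location comes from the
# Hadoop configuration, not the master URL:
spark-submit --master yarn --class com.example.MyApp myapp.jar

# Submit to a Mesos cluster (chapter 12):
spark-submit --master mesos://mesos-host:5050 --class com.example.MyApp myapp.jar
```

These commands are a sketch only; the exact options available for each cluster type are covered in the chapters that follow.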
You’ll learn about the Spark standalone cluster in chapter 11: its components, how to start it and run applications on it, and how to use its web UI. The Spark History Server, which keeps details about previously run jobs, is also discussed, as are Spark’s scripts for starting up a standalone cluster on Amazon EC2.
Chapter 12 goes through the specifics of setting up, configuring, and using YARN and Mesos clusters for running Spark applications.