Chapter 11. Running on a Spark standalone cluster
This chapter covers
- Components of a Spark standalone cluster
- Spinning up the cluster
- Spark cluster Web UI
- Running applications
- Spark History Server
- Running on Amazon EC2
After describing the common aspects of running Spark and examining Spark's local modes in chapter 10, we now get to the first “real” Spark cluster type. The Spark standalone cluster is Spark-specific: it was built for Spark, and it can’t execute any other type of application. It’s relatively simple and efficient and comes with Spark out of the box, so you can use it even if you don’t have a YARN or Mesos installation.
In this chapter, we’ll explain the runtime components of a standalone cluster and how to configure and control them. A Spark standalone cluster comes with its own web UI, and we’ll show you how to use it to monitor cluster processes and running applications. The web UI is complemented by Spark’s History Server, which lets you inspect applications after they have finished; we’ll show you how to use it and explain why you should.
Spark provides scripts for quickly spinning up a standalone cluster on Amazon EC2. (If you aren’t acquainted with it, Amazon EC2 is Amazon’s cloud service, offering virtual servers for rent.) We’ll walk you through how to do that. Let’s get started.