Chapter 11. Running on a Spark standalone cluster

This chapter covers

Components of a Spark standalone cluster
Spinning up the cluster
Spark cluster Web UI
Running applications
Spark History Server
Running on Amazon EC2

Having described common aspects of running Spark and examined Spark’s local modes in chapter 10, we now get to the first “real” Spark cluster type. The Spark standalone cluster is a Spark-specific cluster: it was built specifically for Spark, and it can’t execute any other type of application. It’s relatively simple and efficient, and it comes with Spark out of the box, so you can use it even if you don’t have a YARN or Mesos installation.

In this chapter, we’ll explain the runtime components of a standalone cluster and how to configure and control them. A Spark standalone cluster comes with its own web UI, and we’ll show you how to use it to monitor cluster processes and running applications. Spark’s History Server is also useful for monitoring; we’ll show you how to use it and explain why you should.

Spark provides scripts for quickly spinning up a standalone cluster on Amazon EC2. (If you aren’t acquainted with it, Amazon EC2 is Amazon’s cloud service, offering virtual servers for rent.) We’ll walk you through how to do that. Let’s get started.

11.1. Spark standalone cluster components
