Chapter 11. Running on a Spark standalone cluster


This chapter covers

  • Components of a Spark standalone cluster
  • Spinning up the cluster
  • Spark cluster Web UI
  • Running applications
  • Spark History Server
  • Running on Amazon EC2

After describing common aspects of running Spark and examining Spark local modes in chapter 10, we now come to the first “real” Spark cluster type. The Spark standalone cluster is a Spark-specific cluster: it was built specifically for Spark, and it can’t execute any other type of application. It’s relatively simple and efficient, and it comes with Spark out of the box, so you can use it even if you don’t have a YARN or Mesos installation.
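As a quick preview of what the coming sections cover, a standalone cluster is brought up with the scripts in Spark’s sbin directory, and applications are submitted to it with a spark:// master URL. The hostname, port, and application file below are placeholders, not values from this chapter:

```shell
# Start a master process on the current machine (a sketch; $SPARK_HOME
# and the hostname are example values).
$SPARK_HOME/sbin/start-master.sh

# On each worker machine, start a worker and point it at the master's URL.
# (In older Spark versions this script is named start-slave.sh.)
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

# Submit an application to the standalone cluster.
$SPARK_HOME/bin/spark-submit --master spark://master-host:7077 myapp.py
```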

In this chapter, we’ll explain the runtime components of a standalone cluster and how to configure and control them. A Spark standalone cluster comes with its own web UI, and we’ll show you how to use it to monitor cluster processes and running applications. A related component is Spark’s History Server; we’ll also show you how to use it and explain why you should.
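For the History Server to reconstruct a finished application’s UI, event logging must be enabled before the application runs. A minimal configuration sketch (the log directory is an example value, not one from this chapter):

```shell
# In conf/spark-defaults.conf, enable event logging and choose a
# directory that both applications and the History Server can reach:
#
#   spark.eventLog.enabled  true
#   spark.eventLog.dir      file:/tmp/spark-events
#
# Then start the History Server, which reads the logs from that directory:
$SPARK_HOME/sbin/start-history-server.sh
```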

Spark provides scripts for quickly spinning up a standalone cluster on Amazon EC2. (If you aren’t acquainted with it, Amazon EC2 is Amazon’s cloud service, offering virtual servers for rent.) We’ll walk you through how to do that. Let’s get started.
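In Spark versions that ship them, these scripts live in Spark’s ec2 directory. A launch invocation looks roughly like the following sketch; the key-pair name, identity file, worker count, and cluster name are all placeholders, and valid AWS credentials must be present in the environment:

```shell
# Launch a standalone cluster with two workers on EC2 (a sketch; all
# names below are placeholders, not values from this chapter).
export AWS_ACCESS_KEY_ID=...        # your AWS access key
export AWS_SECRET_ACCESS_KEY=...    # your AWS secret key

./spark-ec2 --key-pair=my-keypair --identity-file=my-key.pem \
  --slaves=2 launch my-spark-cluster
```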

11.1. Spark standalone cluster components

11.2. Starting the standalone cluster

11.3. Standalone cluster web UI

11.4. Running applications in a standalone cluster

11.5. Spark History Server and event logging

11.6. Running on Amazon EC2

11.7. Summary
