Chapter 12. Running on YARN and Mesos

 

This chapter covers

  • YARN architecture
  • YARN resource scheduling
  • Configuring and running Spark on YARN
  • Mesos architecture
  • Mesos resource scheduling
  • Configuring and running Spark on Mesos
  • Running Spark from Docker

We examined a Spark standalone cluster in the previous chapter. Now it’s time to tackle YARN and Mesos, two other cluster managers supported by Spark. Both are widely used (YARN more so) and offer similar functionality, but each has its own strengths and weaknesses. Mesos is the only cluster manager that supports Spark’s fine-grained resource scheduling mode; you can also use Mesos to run Spark tasks in Docker containers. In fact, the Spark project was originally started to demonstrate the usefulness of Mesos,[1] which illustrates Mesos’s importance. YARN lets you access Kerberos-secured HDFS (a Hadoop Distributed File System restricted to users authenticated with the Kerberos protocol) from your Spark applications.

[1] See “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” by Benjamin Hindman et al., http://mesos.berkeley.edu/mesos_tech_report.pdf.

In this chapter, we’ll describe the architectures, installation and configuration options, and resource scheduling mechanisms of YARN and Mesos. We’ll also highlight the differences between them and show how to avoid common pitfalls. In short, this chapter will help you decide which platform better suits your needs. We’ll start with YARN.

12.1. Running Spark on YARN

 
 
 

12.2. Running Spark on Mesos

 
 
 

12.3. Summary

 
 
 