18. Exploring deployment constraints: Understanding the ecosystem


This chapter covers

  • Learning key concepts behind deploying big data applications
  • Learning the roles of resource and cluster managers
  • Sharing data and files with Spark’s workers
  • Securing both network communication and disk I/O

In this final chapter of the book, you will explore the key concepts needed to understand the infrastructure constraints of deploying a big data application. The focus is on the constraints of deployment, not on the deployment process itself or on installing Apache Spark in a production environment; that essential information is covered in chapters 5 and 6, as well as in appendix K.

Apache Spark lives in an ecosystem, where it shares resources, data, security, and more with other applications and components. Spark lives in an open world, and this chapter also explores the constraints of being a good citizen in that world.

18.1 Managing resources with YARN, Mesos, and Kubernetes

18.1.1 The built-in standalone mode manages resources

18.1.2 YARN manages resources in a Hadoop environment

18.1.3 Mesos is a standalone resource manager

18.1.4 Kubernetes orchestrates containers

18.1.5 Choosing the right resource manager
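Whichever manager you choose, the choice surfaces in one place at deployment time: the master URL passed to `spark-submit`. A minimal sketch of how the same application targets each manager (host names, ports, the image name, and `my_app.py` are placeholders):

```shell
# Standalone mode: Spark's built-in cluster manager
spark-submit --master spark://spark-master:7077 my_app.py

# YARN in a Hadoop environment: cluster details come from the
# Hadoop configuration (HADOOP_CONF_DIR), not from the URL
spark-submit --master yarn --deploy-mode cluster my_app.py

# Mesos: point Spark at the Mesos master
spark-submit --master mesos://mesos-master:5050 my_app.py

# Kubernetes: point Spark at the API server; driver and executors
# run as containers built from the given image
spark-submit --master k8s://https://k8s-api:6443 \
  --conf spark.kubernetes.container.image=my-spark-image \
  my_app.py
```

The application code itself does not change between managers; only the submission command and the surrounding configuration do. These commands are configuration sketches and assume a reachable cluster.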

18.2 Sharing files with Spark

18.2.1 Accessing the data contained in files

18.2.2 Sharing files through distributed filesystems

18.2.3 Accessing files on shared drives or file server

18.2.4 Using file-sharing services to distribute files

18.2.5 Other options for accessing files in Spark

18.2.6 Hybrid solution for sharing files with Spark
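As a concrete illustration of file sharing, small auxiliary files can be shipped to every worker at submission time with the `--files` option. A sketch assuming a hypothetical `lookup.csv` sitting next to the application:

```shell
# Distribute lookup.csv to every executor along with the application;
# inside the job, SparkFiles.get("lookup.csv") resolves the local copy
spark-submit --master yarn --files lookup.csv my_app.py
```

This mechanism suits small reference files; the larger datasets discussed in this section belong on a distributed filesystem or a shared service instead.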

18.3 Making sure your Spark application is secure

18.3.1 Securing the network components of your infrastructure

18.3.2 Securing Spark’s disk usage
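Spark exposes both concerns, network traffic and disk I/O, as configuration. A minimal `spark-defaults.conf` sketch (how the shared secret is distributed depends on your cluster manager):

```
# Authenticate Spark's internal connections with a shared secret
spark.authenticate              true

# Encrypt RPC traffic between the driver and the executors
spark.network.crypto.enabled    true

# Encrypt temporary data Spark spills and shuffles to local disk
spark.io.encryption.enabled     true
```

Turning these on is necessary but not sufficient; the surrounding network and storage still need to be secured at the infrastructure level, as this section describes.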

Summary