List of Figures

Chapter 1. Introduction to Apache Spark

Figure 1.1. A word-count program demonstrates Spark’s conciseness and simplicity. The program is shown implemented in Hadoop’s MapReduce framework on the left and as a Spark Scala program on the right.

Figure 1.2. Main Spark components and various runtime interactions and storage options

Figure 1.3. Storing a 300 MB log file in a three-node Hadoop cluster

Figure 1.4. Loading a text file from HDFS

Figure 1.5. Filtering the collection to keep only lines that contain the OutOfMemoryError string

Figure 1.6. Basic infrastructure, interface, analytic, and management tools in the Hadoop ecosystem, with some of the functionalities that Spark incorporates or makes obsolete

Chapter 3. Writing Spark applications

Figure 3.1. Adding the Spark in Action Maven Remote Archetype Catalog to your Eclipse Preferences

Figure 3.2. Choosing the Maven Archetype that you want to use as the new project’s template. Select scala-archetype-sparkinaction.

Figure 3.3. Creating a Maven project: specifying project parameters

Figure 3.4. The newly generated project in Eclipse’s Package Explorer window

Figure 3.5. The project’s library dependency hierarchy (in pom.xml)

Figure 3.6. Specifying the run configuration for uberjar packaging

Chapter 4. The Spark API in depth