Chapter 1. Introduction to Apache Spark
Figure 1.1. A word-count program demonstrates Spark’s conciseness and simplicity. The program is shown implemented in Hadoop’s MapReduce framework on the left and as a Spark Scala program on the right.
Figure 1.2. Main Spark components and various runtime interactions and storage options
Figure 1.3. Storing a 300 MB log file in a three-node Hadoop cluster
Figure 1.4. Loading a text file from HDFS
Figure 1.5. Filtering the collection to contain only lines containing the OutOfMemoryError string
Figure 1.6. Basic infrastructure, interface, analytic, and management tools in the Hadoop ecosystem, with some of the functionalities that Spark incorporates or makes obsolete
Chapter 3. Writing Spark applications
Figure 3.1. Adding the Spark in Action Maven Remote Archetype Catalog to your Eclipse Preferences
Figure 3.2. Choosing the Maven Archetype that you want to use as the new project’s template. Select scala-archetype-sparkinaction.
Figure 3.3. Creating a Maven project: specifying project parameters
Figure 3.4. The newly generated project in Eclipse’s Package Explorer window
Figure 3.5. The project’s library dependency hierarchy (in pom.xml)
Figure 3.6. Specifying the run configuration for uberjar packaging
Chapter 4. The Spark API in depth