Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
About the Authors
About the Cover
Part 1. First steps
Chapter 1. Introduction to Apache Spark
1.1. What is Spark?
1.1.1. The Spark revolution
1.1.2. MapReduce’s shortcomings
1.1.3. What Spark brings to the table
1.2. Spark components
1.2.1. Spark Core
1.2.2. Spark SQL
1.2.3. Spark Streaming
1.2.4. Spark MLlib
1.2.5. Spark GraphX
1.3. Spark program flow
1.4. Spark ecosystem
1.5. Setting up the spark-in-action VM
1.5.1. Downloading and starting the virtual machine
1.5.2. Stopping the virtual machine
1.6. Summary
Chapter 2. Spark fundamentals
2.1. Using the spark-in-action VM
2.1.1. Cloning the Spark in Action GitHub repository
2.1.2. Finding Java
2.1.3. Using the VM’s Hadoop installation
2.1.4. Examining the VM’s Spark installation
2.2. Using Spark shell and writing your first Spark program
2.2.1. Starting the Spark shell
2.2.2. The first Spark code example
2.2.3. The notion of a resilient distributed dataset