
Table of Contents

Brief Table of Contents

Table of Contents

Acknowledgments

About this Book

About the Authors

About the Cover

Chapter 1. Introduction to Apache Spark

1.1. What is Spark?

1.1.1. The Spark revolution

1.1.2. MapReduce’s shortcomings

1.1.3. What Spark brings to the table

1.2. Spark components

1.2.1. Spark Core

1.2.2. Spark SQL

1.2.3. Spark Streaming

1.2.4. Spark MLlib

1.2.5. Spark GraphX

1.3. Spark program flow

1.4. Spark ecosystem

1.5. Setting up the spark-in-action VM

1.5.1. Downloading and starting the virtual machine

1.5.2. Stopping the virtual machine

Chapter 2. Spark fundamentals

2.1. Using the spark-in-action VM

2.1.1. Cloning the Spark in Action GitHub repository

2.1.2. Finding Java

2.1.3. Using the VM’s Hadoop installation

2.1.4. Examining the VM’s Spark installation

2.2. Using Spark shell and writing your first Spark program

2.2.1. Starting the Spark shell

2.2.2. The first Spark code example

2.2.3. The notion of a resilient distributed dataset
