Front matter
preface
acknowledgments
about this book
about the author
about the cover illustration
1 Introduction
1.1 What is PySpark?
Taking it from the start: What is Spark?
PySpark = Spark + Python
Why PySpark?
1.2 Your very own factory: How PySpark works
Some physical planning with the cluster manager
A factory made efficient through a lazy leader
1.3 What will you learn in this book?
1.4 What do I need to get started?
Part 1. Get acquainted: First steps in PySpark
2 Your first data program in PySpark
2.1 Setting up the PySpark shell
The SparkSession entry point
Configuring how chatty spark is: The log level
2.2 Mapping our program
2.3 Ingest and explore: Setting the stage for data transformation
From structure to content: Exploring our data frame with show()