contents

Front matter

preface

acknowledgments

about this book

about the author

about the cover illustration

  1 Introduction

  1.1  What is PySpark?

Taking it from the start: What is Spark?

PySpark = Spark + Python

Why PySpark?

  1.2  Your very own factory: How PySpark works

Some physical planning with the cluster manager

A factory made efficient through a lazy leader

  1.3  What will you learn in this book?

  1.4  What do I need to get started?

Part 1. Get acquainted: First steps in PySpark

  2 Your first data program in PySpark

  2.1  Setting up the PySpark shell

The SparkSession entry point

Configuring how chatty Spark is: The log level

  2.2  Mapping our program

  2.3  Ingest and explore: Setting the stage for data transformation

From structure to content: Exploring our data frame with show()