about this book

When I started this project, which became the book you are reading, Spark in Action, second edition, my goals were to

Help the Java community use Apache Spark, demonstrating that you do not need to learn Scala or Python.
Explain the key concepts behind Apache Spark, (big) data engineering, and data science, without requiring you to know anything more than relational databases and some SQL.
Evangelize the idea that Spark is an operating system designed for distributed computing and analytics.

I believe in teaching anything related to computer science with a high dose of examples. The examples in this book are an essential part of the learning process. I designed them to be as close as possible to real-life professional situations. The datasets come from real-life situations, quality flaws included; they are not the ideal textbook datasets that “always work.” That’s why, by combining those examples and datasets, you will work and learn in a more pragmatic way than a sterilized one. I call those examples “labs,” with the hope that you will find them inspirational and that you will want to experiment with them.

Illustrations are everywhere. Based on the well-known saying, “A picture is worth a thousand words,” I saved you from reading an extra 183,000 words.

1.1 Who should read this book

1.2 How this book is organized

1.3 About the code