about this book
When I started this project, which became the book you are reading, Spark in Action, second edition, my goals were to
- Help the Java community use Apache Spark, demonstrating that you do not need to learn Scala or Python.
- Explain the key concepts behind Apache Spark, (big) data engineering, and data science, without requiring you to know anything beyond relational databases and some SQL.
- Evangelize the idea that Spark is an operating system designed for distributed computing and analytics.
I believe in teaching anything related to computer science with a healthy dose of examples. The examples in this book are an essential part of the learning process. I designed them to be as close as possible to real-life professional situations. The datasets come from real-life situations, quality flaws included; they are not the ideal textbook datasets that “always work.” That’s why, by combining those examples and datasets, you will work and learn in a pragmatic way rather than a sterilized one. I call those examples “labs,” in the hope that you will find them inspirational and will want to experiment with them.
Illustrations are everywhere. Based on the well-known saying, “A picture is worth a thousand words,” I saved you from reading an extra 183,000 words.