Spark in Action, Second Edition: With examples in Java, Python, and Scala

about this book

When I started this project, which became the book you are reading, Spark in Action, second edition, my goals were to

  • Help the Java community use Apache Spark, demonstrating that you do not need to learn Scala or Python.
  • Explain the key concepts behind Apache Spark, (big) data engineering, and data science, without requiring you to know anything more than a relational database and some SQL.
  • Evangelize the idea that Spark is an operating system designed for distributed computing and analytics.

I believe in teaching anything related to computer science with a heavy dose of examples. The examples in this book are an essential part of the learning process. I designed them to be as close as possible to real-life professional situations. The datasets come from real-life situations, quality flaws included; they are not the ideal textbook datasets that “always work.” That’s why, by combining those examples and datasets, you will work and learn in a pragmatic way rather than a sterilized one. I call those examples “labs,” in the hope that you will find them inspirational and will want to experiment with them.

Illustrations are everywhere. Following the well-known saying that a picture is worth a thousand words, I saved you from reading an extra 183,000 words.

1.1 Who should read this book

1.2 How this book is organized

1.3 About the code