Preface

Looking back at the last year and a half, I can’t help but wonder: how on Earth did I manage to survive this? These were the busiest 18 months of my life! Ever since Manning asked Marko and me to write a book about Spark, I have spent most of my free time on Apache Spark. And that made this period all the more interesting. I learned a lot, and I can honestly say it was worth it.

Spark is a super-hot topic these days. It was conceived in Berkeley, California, in 2009 by Matei Zaharia (initially as an attempt to prove the Mesos execution platform feasible) and was open sourced in 2010. In 2013, it was donated to the Apache Software Foundation, and it has been the target of lightning-fast development ever since. In 2015, Spark was one of the most active Apache projects and had more than 1,000 contributors. Today, it’s a part of all major Hadoop distributions and is used by many organizations, large and small, throughout the world in all kinds of applications.

The trouble with writing a book about a project such as Spark is that the project develops very quickly. Since we began writing Spark in Action, we’ve seen six minor releases of Spark, with many new, important features that needed to be covered. The first major release in that time (version 2.0) came out after we’d finished writing most of the book, and we had to delay publication to cover the new features that came with it.