front matter

preface

This is the book I wish I had been able to refer to over the past few years while scaling out the big data platform of the Customer Growth and Analytics team in Azure. As our data science team grew and its insights became more and more critical to the business, we had to ensure that our platform was robust.

The world of big data is relatively new, and the playbook is still being written. I believe our story is common: data teams start small, with a handful of people who first prove they can generate valuable insights. At this stage, a lot of work happens ad hoc, and there is no immediate need for big engineering investments. A data scientist can run a machine learning (ML) model on their own computer, generate some predictions, and email the results.

Over time, the team grows and more workloads become mission critical. The same ML model now plugs into a system serving live traffic and needs to run daily with more than a hundred times the data it was originally prototyped with. At this point, solid engineering practices are critical: we need scale, reliability, automation, monitoring, and more.

This book contains several years of hard-learned lessons in data engineering. To name a few examples:

acknowledgments

about this book

about the author

about the cover illustration