front matter


preface

We’ve both been fortunate to be data engineers in interesting and challenging times. For better or worse, many companies and organizations are realizing that data plays a key role in managing and improving their operations. Recent developments in machine learning and AI have opened a slew of new opportunities to capitalize on. However, adopting data-centric processes is often difficult. It generally requires coordinating jobs across many heterogeneous systems and tying everything together neatly and on time for the next analysis or product deployment.

In 2014, engineers at Airbnb recognized the challenges of managing complex data workflows within the company. To address those challenges, they started developing Airflow: an open source solution that allowed them to write and schedule workflows and monitor workflow runs using the built-in web interface.

The success of the Airflow project quickly led to its adoption under the Apache Software Foundation, first as an incubator project in 2016 and later as a top-level project in 2019. Today, many large companies rely on Airflow to orchestrate numerous critical data processes.

acknowledgments

Bas Harenslak

Julian de Ruiter

about this book

Who should read this book

How this book is organized: A road map

About the code

LiveBook discussion forum

about the authors

about the cover illustration