Part 2. Beyond the basics

Now that you’re familiar with Airflow’s basics and can build your own data pipelines, you’re ready to learn some more advanced techniques for handling more complex use cases involving external systems, custom components, and more.

In chapter 6, we’ll examine how you can trigger pipelines in ways that don’t involve fixed schedules, allowing you to run pipelines in response to certain events, such as new files arriving or a call from an HTTP service.

Chapter 7 will demonstrate how to use Airflow’s built-in functionality to run tasks on external systems. This is an extremely powerful feature of Airflow that allows you to build pipelines that coordinate data flows across many different systems, such as databases, computation frameworks like Apache Spark, and storage systems.

Next, chapter 8 will show you how to build custom components for Airflow, allowing you to execute tasks on systems that Airflow doesn’t support out of the box. Custom components can also be reused easily across your pipelines to support common workflows.

To help increase the robustness of your pipelines, chapter 9 elaborates on different strategies you can use to test your data pipelines and custom components. This has been a recurring topic at Airflow meetups, so we’ll spend some time exploring it.