2 Anatomy of an Airflow DAG

 

This chapter covers

  • Running Airflow on your own machine
  • Writing and running your first workflow
  • Taking a first look at the Airflow interface
  • Handling failed tasks in Airflow

In the previous chapter, we learned what data pipelines are and how Airflow can help us manage them. In this chapter, we get started with Airflow and walk through an example workflow built from basic building blocks found in many workflows.

It helps to have some Python experience when starting with Airflow, since workflows are defined in Python code. Generally, getting the basic structure of an Airflow workflow up and running is easy. Let’s dig into the use case of a rocket enthusiast to see how Airflow might help him.
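To give a feel for what such a workflow definition looks like, here is a minimal sketch of a DAG file, assuming an Airflow 2.x installation; the DAG id, start date, and echoed message are illustrative placeholders, not names from this book’s example:

```python
# A minimal sketch of an Airflow DAG file, assuming Airflow 2.x is installed.
# The DAG id, start date, and echoed message are illustrative placeholders.
import pendulum

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_airflow",                  # unique name shown in the Airflow UI
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,                           # no schedule; trigger manually for now
) as dag:
    # A single task that runs a shell command when the DAG is triggered
    hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello, Airflow!'",
    )
```

Placing a file like this in Airflow’s DAGs folder is enough for the scheduler to pick it up; later sections of this chapter flesh out tasks, operators, and scheduling in detail.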

2.1 Collecting data from numerous sources

Rockets are one of humanity’s engineering marvels, and every rocket launch attracts attention around the world. In this chapter, we follow the life of a rocket enthusiast named John, who tracks every single rocket launch. News about rocket launches appears in many sources that John keeps track of, and ideally he would like all of his rocket news aggregated in a single location. John recently picked up programming and wants an automated way to collect information about all rocket launches, and eventually to gain some personal insight into the latest rocket news. To start small, John decided to first collect images of rockets.
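The core of John’s first task, pulling image URLs out of launch data, can be sketched with nothing but the standard library. The payload below is hand-crafted for illustration; its field names (`results`, `name`, `image`) are assumptions loosely modeled on what a launch-tracking API might return, not a real API response:

```python
import json

# A hand-crafted sample payload, loosely shaped like JSON a launch-tracking
# API might return. The field names here are illustrative assumptions.
sample_response = json.dumps({
    "results": [
        {"name": "Falcon 9 | Starlink", "image": "https://example.com/falcon9.jpg"},
        {"name": "Atlas V | TDRS-M", "image": "https://example.com/atlasv.jpg"},
        {"name": "Unknown | No photo yet", "image": None},
    ]
})

def extract_image_urls(raw_json: str) -> list[str]:
    """Return the image URL of every launch that has one."""
    launches = json.loads(raw_json)["results"]
    return [launch["image"] for launch in launches if launch["image"]]

urls = extract_image_urls(sample_response)
print(urls)
```

In a real pipeline, the JSON would come from an HTTP request and each URL would then be downloaded to disk; wrapping those steps as separate tasks is exactly what the DAG we build in this chapter does.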

2.1.1 Exploring the data

2.2 Writing your first Airflow DAG

2.2.1 Tasks vs. operators

2.2.2 Running arbitrary Python code

2.3 Running a DAG in Airflow

2.3.1 Running Airflow in a Python environment

2.3.2 Running Airflow with Docker

2.3.3 Inspecting the DAG in Airflow

2.4 Running at regular intervals

2.5 Handling failing tasks

2.6 DAG versioning

2.7 Summary