2 Anatomy of an Airflow DAG
This chapter covers
- Running Airflow on your own machine
- Writing and running your first workflow
- Taking a first look at the Airflow user interface
- Handling failed tasks in Airflow
By now, you have a high-level understanding of what data pipelines are and how Airflow can help you manage them. To get a feeling for how this works in practice, let’s get our hands dirty with a small example pipeline that demonstrates the basic building blocks of many workflows.
2.1 Collecting data from numerous sources
Rockets are among humanity’s engineering marvels, and every rocket launch attracts attention around the world. Our friend John is a rocket enthusiast who follows every launch. News about rocket launches is spread across the many news sources John keeps track of, and ideally, he’d like all his rocket news aggregated in a single location. John recently picked up programming and wants an automated way to collect information about all rocket launches and, eventually, to gain some personal insight into the latest rocket news. To start small, he has decided to collect images of rockets first.
For the data, we’ll use Launch Library 2 (https://thespacedevs.com/llapi), an online repository of data about both historical and future rocket launches from various sources. It’s a free API, open to anybody on the planet (subject to rate limits).
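To make John's first goal concrete, here is a minimal sketch of how image URLs might be pulled from the API using only the Python standard library. The endpoint URL and the response layout (a top-level "results" list whose entries carry an "image" field) are assumptions about Launch Library 2 and should be checked against its documentation; the function names are ours, chosen for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: this is the endpoint listing upcoming launches. Verify it
# against the Launch Library 2 documentation before relying on it.
UPCOMING_LAUNCHES_URL = "https://ll.thespacedevs.com/2.0.0/launch/upcoming"


def fetch_upcoming_launches(url=UPCOMING_LAUNCHES_URL, timeout=10):
    """Download and decode the JSON listing of upcoming launches."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_image_urls(payload):
    """Return the non-empty image URLs found in a launch listing.

    Assumption: the payload has a "results" list and each launch may
    have an "image" key holding a URL (or nothing at all).
    """
    return [
        launch["image"]
        for launch in payload.get("results", [])
        if launch.get("image")
    ]
```

Calling extract_image_urls(fetch_upcoming_launches()) would then return the list of image URLs to download. Keeping the download and the parsing in separate functions means the parsing can be tried on a saved response without using up the API's rate limit.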