15 Operating Airflow in production


This chapter covers

  • Dissecting the Airflow components
  • Configuring Airflow to scale horizontally using different executors
  • Monitoring the status and performance of Airflow visually
  • Sending out alerts in case of task failures

In most of the previous chapters, we focused on various parts of Airflow from a programmer’s perspective. In this chapter, we explore Airflow from an operations perspective: what it takes to run it reliably in production. We assume a general understanding of concepts such as (distributed) software architecture, logging, monitoring, and alerting, but no experience with any specific technology is required.

15.1 Revisiting the Airflow architecture

Back in chapter 1, we introduced the Airflow architecture; it is shown again in Figure 15.1.

At a minimum, Airflow consists of the following components:

  • API server
  • Scheduler
  • DAG Processor
  • Database (also known as the metastore)
  • Workers
  • Triggerer (optional component, required when working with deferrable operators)
  • Executor (not shown in the figure)

Figure 15.1 High-level Airflow architecture

The API server and scheduler are both Airflow processes. The database is a separate service you must provide, in which Airflow stores metadata from the API server and scheduler. A folder containing the DAG definitions must be accessible to the DAG Processor.
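To make the separation of components concrete, the pieces listed above can be sketched as a Docker Compose file, with each Airflow process running as its own service against a shared metadata database. This is an illustrative sketch, not an official deployment recipe: the image tag, database credentials, and volume paths are assumptions you would adapt to your own environment.

```yaml
# Sketch of a minimal Airflow deployment; image tag, credentials,
# and paths are assumptions -- adapt them to your environment.
x-airflow-common: &airflow-common
  image: apache/airflow:3.0.0
  environment:
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
  volumes:
    - ./dags:/opt/airflow/dags     # DAG definitions, read by the DAG Processor

services:
  postgres:                        # the metadata database (metastore)
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  api-server:
    <<: *airflow-common
    command: api-server
    ports:
      - "8080:8080"

  scheduler:
    <<: *airflow-common
    command: scheduler

  dag-processor:
    <<: *airflow-common
    command: dag-processor

  triggerer:                       # optional; needed for deferrable operators
    <<: *airflow-common
    command: triggerer
```

Note that no worker service appears here: with the default executor, tasks run in subprocesses of the scheduler, and dedicated workers only enter the picture with the distributed executors discussed in section 15.2.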

15.2 Choosing the executor

15.2.1 Overview of different executor types

15.2.2 Which executor is right for you?

15.2.3 Installing each executor

15.3 Configuring the metastore

15.4 Configuring the scheduler

15.4.1 Configuring scheduler components

15.4.2 Running multiple schedulers

15.4.3 System performance configurations

15.4.4 Controlling the maximum number of running tasks

15.5 Configuring the DAG Processor Manager

15.6 Capturing logs

15.6.1 Capturing API server output

15.6.2 Capturing scheduler output

15.6.3 Capturing task logs

15.6.4 Sending logs to remote storage

15.7 Visualizing and monitoring Airflow metrics

15.7.1 Collecting metrics from Airflow

15.7.2 Configuring Airflow to send metrics

15.7.3 Configuring Prometheus to collect metrics

15.7.4 Creating dashboards with Grafana

15.7.5 What should you monitor?

15.8 Setting up alerts

15.9 Scaling Airflow beyond a single instance

15.10 Summary