12 Operating Airflow in Production
This chapter covers:
- Dissecting the Airflow scheduler
- Configuring Airflow to scale horizontally using different executors
- Monitoring the status and performance of Airflow visually
- Sending out alerts in case of task failures
In most previous chapters, we focused on various parts of Airflow from a programmer’s perspective. In this chapter, we explore Airflow from an operations perspective. We assume a general understanding of concepts such as (distributed) software architecture, logging, monitoring, and alerting. However, no knowledge of any specific technology is required.
NOTE
Throughout this chapter, we often refer to the Airflow configuration. Airflow resolves each configuration option in the following order of preference:
1. Environment variable (AIRFLOW__[SECTION]__[KEY])
2. Command environment variable (AIRFLOW__[SECTION]__[KEY]_CMD)
3. Value in airflow.cfg
4. Command in airflow.cfg
5. Default value
Whenever we refer to a configuration option, we show it in the form of option #1, the environment variable. For example, take the configuration item web_server_port in section webserver. We write this as “AIRFLOW__WEBSERVER__WEB_SERVER_PORT”.
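To make the order of preference concrete, the following is a minimal sketch of the lookup in plain Python. It is not Airflow's own implementation. The names env_var_name, run_command, and resolve_option are hypothetical helpers introduced only for illustration, the cfg argument is assumed to be a nested dictionary standing in for a parsed airflow.cfg, and the command variant in airflow.cfg is assumed to be stored under the key with a _cmd suffix.

```python
import os
import subprocess


def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for a configuration option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def run_command(command: str) -> str:
    """Run a shell command and return its stripped standard output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def resolve_option(section, key, cfg, default=None, environ=None):
    """Illustrative lookup following the order of preference listed above.

    cfg is a dict of dicts (section -> key -> value), an assumed stand-in
    for the contents of airflow.cfg.
    """
    environ = os.environ if environ is None else environ
    name = env_var_name(section, key)

    # 1. Environment variable
    if name in environ:
        return environ[name]
    # 2. Command environment variable
    if name + "_CMD" in environ:
        return run_command(environ[name + "_CMD"])

    section_values = cfg.get(section, {})
    # 3. Value in airflow.cfg
    if key in section_values:
        return section_values[key]
    # 4. Command in airflow.cfg (assumed key suffix: _cmd)
    if key + "_cmd" in section_values:
        return run_command(section_values[key + "_cmd"])

    # 5. Default value
    return default
```

For example, env_var_name("webserver", "web_server_port") returns "AIRFLOW__WEBSERVER__WEB_SERVER_PORT", and a value set in that environment variable takes precedence over the same item in airflow.cfg.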
12.1 Airflow Architectures
At a minimum, Airflow consists of three components:
- Webserver
- Scheduler
- Database
Figure 12.1 The most basic Airflow architecture