12 Operating Airflow in production
This chapter covers:
- Dissecting the Airflow scheduler
- Configuring Airflow to scale horizontally using different executors
- Monitoring the status and performance of Airflow visually
- Sending out alerts in case of task failures
In most previous chapters, we focused on various parts of Airflow from a programmer’s perspective. In this chapter, we explore Airflow from an operations perspective. We assume a general understanding of concepts such as (distributed) software architecture, logging, monitoring, and alerting, but no knowledge of any specific technology is required.
NOTE Throughout this chapter, we often refer to the Airflow configuration. Airflow resolves each configuration option in the following order of precedence, from highest to lowest:
1. Environment variable (AIRFLOW__[SECTION]__[KEY])
2. Command environment variable (AIRFLOW__[SECTION]__[KEY]_CMD)
3. Value in airflow.cfg
4. Command in airflow.cfg ([key]_cmd)
5. Default value
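The precedence rules above can be sketched as a single lookup function. This is a minimal illustration under stated assumptions, not Airflow's own implementation: the helper names resolve_option and _run_command, the source labels it returns, and the use of the standard-library configparser to read airflow.cfg are all hypothetical. Note also that Airflow itself honors the _cmd variants only for a limited set of sensitive options (such as the metastore connection string); the sketch applies them to every option for simplicity.

```python
import configparser
import os
import subprocess
from typing import Mapping, Optional, Tuple

# Hypothetical shape for built-in defaults: {section: {key: value}}
Defaults = Mapping[str, Mapping[str, str]]


def _run_command(command: str) -> str:
    """Run a shell command and return its stdout without the trailing newline."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def resolve_option(
    section: str,
    key: str,
    cfg: configparser.ConfigParser,
    defaults: Defaults,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Return (value, source) for one option, checking sources from highest to lowest precedence.

    Hypothetical helper for illustration; not Airflow's actual implementation.
    """
    env = os.environ if environ is None else environ
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"

    # 1. Environment variable
    if env_name in env:
        return env[env_name], "env var"
    # 2. Command environment variable: its value is a command whose output is used
    if f"{env_name}_CMD" in env:
        return _run_command(env[f"{env_name}_CMD"]), "env var (cmd)"
    # 3. Plain value in airflow.cfg (raw=True skips configparser's % interpolation)
    if cfg.has_option(section, key):
        return cfg.get(section, key, raw=True), "airflow.cfg"
    # 4. Command in airflow.cfg, written as <key>_cmd
    if cfg.has_option(section, f"{key}_cmd"):
        return _run_command(cfg.get(section, f"{key}_cmd", raw=True)), "airflow.cfg (cmd)"
    # 5. Built-in default (None if the option is unknown)
    return defaults.get(section, {}).get(key), "default"
```

For example, with web_server_port = 8081 under [webserver] in airflow.cfg and AIRFLOW__WEBSERVER__WEB_SERVER_PORT=9090 set in the environment, the function returns ('9090', 'env var'): the environment variable wins over the file.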
Whenever we refer to a configuration option, we write it in the form of option #1. For example, the configuration item web_server_port in section webserver is written as “AIRFLOW__WEBSERVER__WEB_SERVER_PORT”.
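Because a double underscore separates the section from the key, the mapping between an option and its environment variable name works mechanically in both directions. The following sketch uses hypothetical helper names (env_var_name, parse_env_var_name, env_overrides) that are not part of Airflow; it builds the variable name for an option and, conversely, lists which options a given environment overrides, which can help when tracking down why a running value differs from airflow.cfg.

```python
from typing import Dict, Mapping, Optional, Tuple

PREFIX = "AIRFLOW__"


def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for option `key` in `section` (hypothetical helper)."""
    return f"{PREFIX}{section.upper()}__{key.upper()}"


def parse_env_var_name(name: str) -> Optional[Tuple[str, str]]:
    """Split 'AIRFLOW__SECTION__KEY' into ('section', 'key'); None if it is not a config variable."""
    if not name.startswith(PREFIX):
        return None
    # Sections and keys use single underscores, so the first double underscore separates them
    parts = name[len(PREFIX):].split("__", 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        return None
    section, key = parts
    return section.lower(), key.lower()


def env_overrides(environ: Mapping[str, str]) -> Dict[Tuple[str, str], str]:
    """Collect every Airflow option set through the given environment.

    Command variants show up with a '_cmd' suffix on the key.
    """
    overrides: Dict[Tuple[str, str], str] = {}
    for name, value in environ.items():
        parsed = parse_env_var_name(name)
        if parsed is not None:
            overrides[parsed] = value
    return overrides
```

Calling env_overrides(os.environ) on a machine running Airflow gives a quick inventory of every option that the environment, rather than airflow.cfg, is controlling.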
To find the current value of any configuration item, open the Configurations page in the Airflow UI (menu Admin) and scroll down to the table “Running Configuration”. This table shows all configuration options, their current values, and which of the five options above each value was set from. Note that the webserver only displays this information when AIRFLOW__WEBSERVER__EXPOSE_CONFIG is set to True.