chapter twelve

12 Operating Airflow in production

 

This chapter covers:

  • Dissecting the Airflow scheduler
  • Configuring Airflow to scale horizontally using different executors
  • Monitoring the status and performance of Airflow visually
  • Sending out alerts in case of task failures

In most previous chapters, we focused on various parts of Airflow from a programmer’s perspective. In this chapter, we explore Airflow from an operations perspective. We assume a general understanding of concepts such as (distributed) software architecture, logging, monitoring, and alerting, but no knowledge of any specific technology is required.

NOTE Throughout this chapter, we often refer to the Airflow configuration. Airflow resolves each configuration option in this order of preference:

  1. Environment variable (AIRFLOW__[SECTION]__[KEY])
  2. Command environment variable (AIRFLOW__[SECTION]__[KEY]_CMD)
  3. Value in airflow.cfg
  4. Command in airflow.cfg
  5. Default value

Whenever we refer to a configuration option, we show it in the form of option 1. For example, the configuration item web_server_port in section webserver is written as “AIRFLOW__WEBSERVER__WEB_SERVER_PORT”.

To find the current value of any configuration item, open the Configuration page in the Airflow UI (under the Admin menu) and scroll down to the table “Running Configuration”. This table shows all configuration options, their current values, and which of the five options above set each value.
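To make the order of preference concrete, the following is a minimal sketch of the lookup in plain Python. It does not use Airflow's own code: the function names env_var_name and resolve, the source labels, the dictionaries standing in for the environment and airflow.cfg, and the fake_run helper are all hypothetical and exist only for illustration. The port values are likewise just example data.

```python
from typing import Callable, Mapping, Optional, Tuple

ConfigKey = Tuple[str, str]  # (section, key), e.g. ("webserver", "web_server_port")


def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for a configuration item."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve(
    section: str,
    key: str,
    env: Mapping[str, str],
    cfg: Mapping[ConfigKey, str],
    defaults: Mapping[ConfigKey, str],
    run_command: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Return (source, value), checking the five sources in order of preference."""
    name = env_var_name(section, key)
    if name in env:  # 1. environment variable
        return "env var", env[name]
    if name + "_CMD" in env:  # 2. command environment variable
        return "env var (cmd)", run_command(env[name + "_CMD"])
    if (section, key) in cfg:  # 3. value in airflow.cfg
        return "airflow.cfg", cfg[(section, key)]
    if (section, key + "_cmd") in cfg:  # 4. command in airflow.cfg
        return "airflow.cfg (cmd)", run_command(cfg[(section, key + "_cmd")])
    return "default", defaults.get((section, key))  # 5. default value


def fake_run(command: str) -> str:
    """Stand-in for executing a command and capturing its output (hypothetical)."""
    return f"<output of {command}>"


# Illustrative values only: the same option set in three places.
defaults = {("webserver", "web_server_port"): "8080"}
cfg = {("webserver", "web_server_port"): "8081"}
env = {"AIRFLOW__WEBSERVER__WEB_SERVER_PORT": "9090"}

# The environment variable wins over airflow.cfg and the default.
print(resolve("webserver", "web_server_port", env, cfg, defaults, fake_run))
```

Passing the environment and file contents in as dictionaries keeps the sketch self-contained. Removing the entry from env makes the airflow.cfg value take over, and removing it from cfg as well falls through to the default.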

12.1  Airflow architectures

12.1.1    Which executor is right for me?

12.1.2    Configuring a metastore for Airflow

12.1.3    A closer look at the scheduler

12.2  Installing each executor

12.2.1    Setting up the SequentialExecutor

12.2.2    Setting up the LocalExecutor

12.2.3    Setting up the CeleryExecutor

12.2.4    Setting up the KubernetesExecutor

12.3  Capturing logs of all Airflow processes

12.3.1    Capturing the webserver output

12.3.2    Capturing the scheduler output

12.3.3    Capturing task logs

12.3.4    Sending logs to remote storage

12.4  Visualizing and monitoring Airflow metrics

12.7  Summary