14 Operating Airflow in production

 

This chapter covers

  • Dissecting the Airflow scheduler
  • Configuring Airflow to scale horizontally using different executors
  • Monitoring the status and performance of Airflow visually
  • Sending out alerts in case of task failures

In most of the previous chapters, we looked at various parts of Airflow from a programmer’s perspective. In this chapter, we explore Airflow from an operations perspective. We assume a general understanding of concepts such as (distributed) software architecture, logging, monitoring, and alerting. However, no knowledge of any specific technology is required.

14.1 Revisiting the Airflow architecture

In chapter 1, we introduced the Airflow architecture, which is shown again in Figure 14.1.

Airflow consists of the following components:

  • Webserver
  • Scheduler
  • Database (also known as Metastore)
  • Workers
  • Triggerer (optional component, required when working with deferrable operators)
  • Executor (not in the image)

Figure 14.1 High-level Airflow architecture

The webserver and scheduler are both Airflow processes. The database is a separate service that you must provide to Airflow; the webserver and scheduler use it to store their metadata. A folder with DAG definitions must be accessible to the scheduler.
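As a minimal sketch of how these pieces are wired together, the snippet below builds the environment variables that point the Airflow processes at a database and a DAG folder. It relies on Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention for configuration overrides. The helper `airflow_env_var`, the connection string, and the folder path are hypothetical placeholders, and the `database` section for `sql_alchemy_conn` assumes a recent Airflow 2 release or later (older releases kept this option under `core`).

```python
# Sketch: express the metastore and DAG folder settings as environment
# variables, following Airflow's AIRFLOW__{SECTION}__{KEY} convention.


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a given config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Hypothetical values; replace them with your own database and paths.
settings = {
    ("database", "sql_alchemy_conn"): (
        "postgresql+psycopg2://airflow:airflow@db-host:5432/airflow"
    ),
    ("core", "dags_folder"): "/opt/airflow/dags",
}

# Map each (section, key) pair to its environment variable name.
env = {
    airflow_env_var(section, key): value
    for (section, key), value in settings.items()
}

# Print shell export statements that could be sourced before starting
# the scheduler and webserver.
for name, value in env.items():
    print(f'export {name}="{value}"')
```

Every Airflow process that needs the metastore or the DAG folder should see the same values, so in practice these variables are usually set once in a shared environment file or deployment manifest.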

The webserver’s responsibility is to visually display information about the status of the pipelines and allow the user to perform certain actions, such as triggering a DAG.

The scheduler’s responsibility is twofold:

14.2 Choosing the executor

14.2.1 Overview of different executor types

14.2.2 Which executor is right for you?

14.2.3 Installing each executor

14.3 Configuring the metastore

14.4 Configuring the scheduler

14.4.1 Configuring scheduler components

14.4.2 Running multiple schedulers

14.4.3 System performance configurations

14.4.4 Controlling the maximum number of running tasks

14.5 Capturing logs

14.5.1 Capturing webserver output

14.5.2 Capturing scheduler output

14.5.3 Capturing task logs

14.5.4 Sending logs to remote storage

14.6 Visualizing and monitoring Airflow metrics

14.6.1 Collecting metrics from Airflow

14.6.2 Configuring Airflow to send metrics

14.6.3 Configuring Prometheus to collect metrics

14.6.4 Creating dashboards with Grafana

14.6.5 What should you monitor?

14.7 Setting up alerts

14.7.1 Alerting within DAGs and operators

14.7.2 Defining service-level agreements

14.9 Summary