
11 Running tasks in containers


This chapter covers

  • Identifying some challenges involved in managing Airflow deployments
  • Examining how containerized approaches can help simplify Airflow deployments
  • Running containerized tasks in Airflow on Docker
  • Establishing a high-level overview of workflows in developing containerized DAGs

Previously, we implemented several DAGs using different Airflow operators, each specialized to perform a specific type of task. Although operators are powerful tools, relying on a wide variety of them across your pipelines can make your DAGs difficult to deploy and maintain. Here, we explore some of these challenges and look at how containerized approaches using Docker and/or Kubernetes can simplify your workflow.

11.1 Challenges of many different operators

Operators are arguably one of Airflow's strongest features, as they provide great flexibility for coordinating jobs across many different types of systems. However, building and managing DAGs that use many different operators can be challenging: each operator brings its own interface to learn and its own set of dependencies to install and keep compatible in a single Airflow environment.
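To make the maintenance burden concrete, consider a hypothetical pipeline that touches several external systems. Each operator class typically lives in its own Airflow provider package, so the Airflow environment has to install, and keep mutually compatible, the dependencies of every provider the pipeline uses. The sketch below is illustrative only: the task names and the pipeline are invented, though the operator classes and provider package names follow Airflow's provider naming scheme.

```python
# Hypothetical four-task pipeline: each task uses a different
# operator, and each operator class comes from a different
# provider package with its own transitive dependencies.
pipeline_tasks = {
    "fetch_ratings": "HttpOperator",          # calls an external REST API
    "load_to_warehouse": "PostgresOperator",  # writes to a Postgres database
    "train_model": "SparkSubmitOperator",     # submits a Spark job
    "notify_team": "SlackAPIOperator",        # posts a Slack message
}

# Operator class -> provider package it is distributed in
# (names follow the apache-airflow-providers-* convention).
operator_to_provider = {
    "HttpOperator": "apache-airflow-providers-http",
    "PostgresOperator": "apache-airflow-providers-postgres",
    "SparkSubmitOperator": "apache-airflow-providers-apache-spark",
    "SlackAPIOperator": "apache-airflow-providers-slack",
}

# Four tasks force four provider packages (plus their transitive
# dependencies) to coexist in one Python environment.
required_providers = {operator_to_provider[op] for op in pipeline_tasks.values()}
print(sorted(required_providers))
```

Even in this small example, one worker environment must satisfy the version pins of four unrelated client libraries; as pipelines grow, those pins start to conflict, which is exactly the problem that running each task in its own container sidesteps.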

11.1.1 Operator interfaces and implementations

11.1.2 Complex and conflicting dependencies

11.1.3 Moving toward a generic operator

11.2 Introducing containers

11.2.1 What are containers?

11.2.2 Running our first Docker container

11.2.3 Creating a Docker image

11.2.4 Persisting data using volumes

11.3 Containers and Airflow

11.3.1 Tasks in containers

11.3.2 Why use containers?

11.4 Running tasks in Docker

11.4.1 Introducing the DockerOperator

11.4.2 Creating container images for tasks

11.4.3 Building a DAG with Docker tasks

11.4.4 Docker-based workflow

11.5 Running tasks in Kubernetes

11.5.1 Introducing Kubernetes

11.5.2 Setting up Kubernetes

11.5.3 Using the KubernetesPodOperator

11.5.4 Diagnosing Kubernetes-related issues

11.5.5 Differences with Docker-based workflows

11.6 Summary