10 Running tasks in containers

This chapter covers

  • Identifying some challenges involved in managing Airflow deployments
  • Examining how containerized approaches can help simplify Airflow deployments
  • Running containerized tasks in Airflow on Docker
  • Establishing a high-level workflow for developing containerized DAGs

In previous chapters, we implemented several DAGs using different Airflow operators, each specialized to perform a specific type of task. In this chapter, we discuss some drawbacks of using many different operators, especially with an eye toward creating Airflow DAGs that are easy to build, deploy, and maintain. In light of these issues, we look at how we can use Airflow to run tasks in containers with Docker and Kubernetes, and at some of the benefits this containerized approach can bring.

10.1 Challenges of many different operators

Operators are arguably one of Airflow's strongest features, as they provide great flexibility for coordinating jobs across many different types of systems. However, creating and managing DAGs that use many different operators can be challenging, as each operator brings its own interface, implementation, and set of dependencies.
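The contrast at the heart of this chapter can be sketched without any real Airflow code. Below is a hypothetical, plain-Python illustration (the class names and signatures are invented for this sketch, not actual Airflow operators): each specialized "operator" exposes its own constructor arguments and execution logic, while a container-based approach reduces every task to one uniform call of the form "run this image with this command."

```python
# Hypothetical sketch, not real Airflow classes: two "specialized
# operators", each with its own interface and implementation.
class PostgresQueryOperator:
    def __init__(self, sql, conn_id):
        self.sql = sql
        self.conn_id = conn_id

    def execute(self):
        return f"postgres[{self.conn_id}]: {self.sql}"


class SparkSubmitOperator:
    def __init__(self, application, master):
        self.application = application
        self.master = master

    def execute(self):
        return f"spark[{self.master}]: {self.application}"


# A containerized approach collapses these into one generic interface:
# every task is "run this image with this command", regardless of what
# runs inside the container.
class ContainerTask:
    def __init__(self, image, command):
        self.image = image
        self.command = command

    def execute(self):
        return f"docker run {self.image} {' '.join(self.command)}"


query = ContainerTask("myorg/psql:1.0", ["psql", "-c", "SELECT 1"])
train = ContainerTask("myorg/spark-job:1.0", ["spark-submit", "job.py"])
print(query.execute())  # → docker run myorg/psql:1.0 psql -c SELECT 1
print(train.execute())  # → docker run myorg/spark-job:1.0 spark-submit job.py
```

With the generic interface, the orchestrator no longer needs operator-specific knowledge or dependencies; the per-task logic lives inside each container image instead. Sections 10.1–10.4 develop this idea with Airflow's actual `DockerOperator`.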

10.1.1 Operator interfaces and implementations

10.1.2 Complex and conflicting dependencies

10.1.3 Moving toward a generic operator

10.2 Introducing containers

10.2.1 What are containers?

10.2.2 Running our first Docker container

10.2.3 Creating a Docker image

10.2.4 Persisting data using volumes

10.3 Containers and Airflow

10.3.1 Tasks in containers

10.3.2 Why use containers?

10.4 Running tasks in Docker

10.4.1 Introducing the DockerOperator

10.4.2 Creating container images for tasks

10.4.3 Building a DAG with Docker tasks

10.4.4 Docker-based workflow

10.5 Running tasks in Kubernetes

Summary
