
11 Running tasks in containers

 

This chapter covers

  • Identifying challenges in managing Airflow deployments
  • Examining how containerized approaches can help simplify Airflow deployments
  • Running containerized tasks in Airflow with Docker
  • Getting a high-level overview of the workflow for developing containerized DAGs

Previously, we implemented several DAGs using different Airflow operators, each specialized to perform a specific type of task. Although operators are powerful tools, they can also pose challenges in deploying and maintaining your DAGs if you use a wide variety of them across your pipelines. This chapter explores some of these challenges and examines how a containerized approach using Docker and/or Kubernetes can help simplify your deployments.

11.1 Challenges of different operators

Operators are arguably among the strongest features of Airflow, because they give you great flexibility to coordinate jobs across many different types of systems. However, DAGs that use a wide variety of operators can be challenging to create and manage, as each operator brings its own interface, implementation, and set of dependencies.
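To make this concrete, the following minimal sketch (not taken from the chapter) mixes operators from several provider packages. The DAG id, connection ids, endpoint, and callables are illustrative placeholders, and the import paths assume Airflow 2.x with the http and postgres provider packages installed; newer provider releases may expose some of these operators under different names. The point is that every additional operator type pulls extra Python dependencies into the Airflow environment, which is the maintenance burden this section examines.

```python
# Illustrative sketch: a pipeline whose tasks each use a different operator,
# so the Airflow environment must carry the dependencies of every provider.
import datetime as dt

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="many_operator_types",          # placeholder name
    start_date=dt.datetime(2023, 1, 1),
    schedule_interval="@daily",
) as dag:
    # Fetch data over HTTP (requires the http provider and a configured connection).
    fetch = SimpleHttpOperator(
        task_id="fetch_data",
        http_conn_id="example_api",         # hypothetical connection id
        endpoint="/data",                   # hypothetical endpoint
    )

    # Transform the data in Python (callable is a stand-in here).
    clean = PythonOperator(task_id="clean_data", python_callable=lambda: None)

    # Load results into Postgres (requires the postgres provider and its client libraries).
    load = PostgresOperator(
        task_id="load_data",
        postgres_conn_id="warehouse",       # hypothetical connection id
        sql="SELECT 1",                     # placeholder statement
    )

    # Send a simple notification via Bash.
    notify = BashOperator(task_id="notify", bash_command="echo 'done'")

    fetch >> clean >> load >> notify
```

Even in this small example, four different interfaces and two provider packages are in play; across many pipelines, that variety quickly adds up, as the next subsections discuss.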

11.1.1 Operator interfaces and implementations

11.1.2 Complex and conflicting dependencies

11.1.3 Moving toward a generic operator

11.2 Introducing containers

11.2.1 What are containers?

11.2.2 Running a first Docker container

11.2.3 Creating a Docker image

11.2.4 Persisting data using volumes

11.3 Containers and Airflow

11.3.1 Tasks in containers

11.3.2 Why use containers?

11.4 Running tasks in Docker

11.4.1 Introducing the DockerOperator

11.4.2 Creating container images for tasks

11.4.3 Building a DAG with Docker tasks

11.4.4 Docker-based workflow