17 Airflow deployment options

 

This chapter covers

  • Vendor-managed services for Airflow
  • Rolling your own deployment in a Kubernetes cluster as an alternative to using a managed service
  • Deployment options for Airflow in a Kubernetes cluster

Until now, we have explored Airflow in a single-instance setup with Docker Compose. In chapter 15, we learned about some of the configuration options available for operating Airflow in production. Now we will focus on deploying Airflow. We’ll start by exploring the available vendor-managed solutions and discuss several criteria you should consider when weighing a self-managed deployment against a vendor-managed one. Finally, we focus on deploying Airflow in Kubernetes, looking at the various Airflow components and how they fit together in such a deployment. We’ll use this breakdown to guide you through some of the choices that you, as a data engineer, can make when rolling your own deployment.

17.1 Managed Airflow

17.1.1 Astronomer

17.1.2 Google Cloud Composer

17.1.3 Amazon Managed Workflows for Apache Airflow

17.2 Airflow on Kubernetes

17.2.1 Preparing our Kubernetes cluster

17.2.2 Connecting to your Kubernetes cluster

17.2.3 Deploying with the Apache Airflow Helm chart

17.2.4 Changing the default deployment configuration

17.2.5 Changing the apiserver secret key

17.2.6 Using an external database for Airflow metadata

17.2.7 DAG deployment

17.2.8 Python library deployment

17.2.9 Configuring the Executor(s)

17.3 Choosing a deployment strategy

17.4 Summary