15 Airflow in the clouds


This chapter covers

  • Examining the components required to build Airflow cloud deployments
  • Introducing cloud-specific hooks/operators for integrating with cloud services
  • Exploring vendor-managed services as alternatives to rolling your own deployment

In this chapter, we’ll start exploring how to deploy and integrate Airflow in cloud environments. First, we’ll revisit the various components of Airflow and how they fit together in cloud deployments. We’ll use this breakdown to map each component to its cloud-specific counterparts in Amazon Web Services (AWS; chapter 16), Microsoft Azure (chapter 17), and Google Cloud Platform (GCP; chapter 18). Then we’ll briefly introduce cloud-specific hooks and operators, which can be used to integrate with specific cloud services. Finally, we’ll look at some managed alternatives for deploying Airflow and discuss several criteria to consider when weighing rolling your own deployment against using a vendor-managed solution.

15.1 Designing (cloud) deployment strategies

Before we design deployment strategies for Airflow in the different clouds (AWS, Azure, and GCP), let’s review the different components of Airflow (e.g., the webserver, scheduler, and workers) and the (shared) resources these components need access to (e.g., DAGs and log storage). This breakdown will help us later when mapping these components to the appropriate cloud services.
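To make this breakdown concrete, the following Python sketch models each component together with the resources it needs, then inverts that mapping to show which resources are shared between components. The `Component` class, the resource names (`metastore`, `dags`, `logs`), and the mapping itself are illustrative assumptions. They describe an Airflow 2 setup with DAG serialization (so the webserver reads DAGs from the metastore rather than from the DAG files) and a shared log location. Exact requirements vary with your Airflow version and executor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One Airflow process type and the resources it must be able to reach."""
    name: str
    needs: frozenset[str]


# Simplified, hypothetical model (not an official Airflow specification):
# Airflow 2 with DAG serialization, so the webserver reads DAGs from the
# metastore instead of the DAG files; logs are assumed to be written to a
# shared location that the webserver can read.
COMPONENTS = [
    Component("webserver", frozenset({"metastore", "logs"})),
    Component("scheduler", frozenset({"metastore", "dags"})),
    Component("worker", frozenset({"metastore", "dags", "logs"})),
]


def consumers_by_resource(components: list[Component]) -> dict[str, list[str]]:
    """Invert the model: map each resource to the components that need it."""
    result: dict[str, list[str]] = {}
    for component in components:
        for resource in sorted(component.needs):
            result.setdefault(resource, []).append(component.name)
    return result


def shared_resources(components: list[Component]) -> list[str]:
    """Return resources used by more than one component, sorted by name.

    Once components run on separate machines, these resources have to live
    in a service that every component using them can reach over the network.
    """
    return sorted(
        resource
        for resource, users in consumers_by_resource(components).items()
        if len(users) > 1
    )


if __name__ == "__main__":
    for resource, users in consumers_by_resource(COMPONENTS).items():
        print(f"{resource}: {', '.join(users)}")
    print("shared:", ", ".join(shared_resources(COMPONENTS)))
```

Under these assumptions, every resource is needed by at least two components. This is why, once the webserver, scheduler, and workers are spread over multiple machines, the DAGs, logs, and metastore typically end up in network-accessible services (such as shared or object storage and a managed database) rather than on a single machine's local disk.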

15.2 Cloud-specific operators and hooks

15.3 Managed services

15.3.1 Astronomer.io

15.3.2 Google Cloud Composer

15.3.3 Amazon Managed Workflows for Apache Airflow

15.4 Choosing a deployment strategy

Summary
