In this chapter, we’ll start exploring how to deploy and integrate Airflow in cloud environments. First, we’ll revisit the various components of Airflow and how they fit together in a cloud deployment. We’ll then use this breakdown to map each component to its cloud-specific counterpart in Amazon Web Services (AWS; chapter 16), Microsoft Azure (chapter 17), and Google Cloud Platform (GCP; chapter 18). Next, we’ll briefly introduce cloud-specific hooks and operators, which you can use to integrate with specific cloud services. Finally, we’ll look at some managed alternatives for deploying Airflow and discuss several criteria to consider when choosing between rolling your own deployment and using a vendor-managed solution.
Before we design deployment strategies for Airflow in the different clouds (AWS, Azure, and GCP), let’s review the different components of Airflow (e.g., the webserver, scheduler, and workers) and the (shared) resources these components need access to (e.g., DAG files and log storage). This overview will help us later when mapping these components to the appropriate cloud services.