chapter thirteen

13 Airflow in the Clouds

This chapter covers:

  • Designing deployment strategies for several cloud platforms (Amazon AWS, Microsoft Azure and Google Cloud Platform).
  • Using cloud-specific operators and hooks to integrate with the services available on each cloud platform.
  • Getting a brief introduction to managed (cloud) services, which can make Airflow deployments easier to manage than rolling your own solution.

In this chapter, we’ll dive into how Airflow can be used on several major cloud platforms (Amazon AWS, Microsoft Azure, and Google Cloud Platform). First, we’ll briefly recap the components that make up an Airflow deployment. Next, for each of the three clouds, we’ll design several deployment strategies by mapping these Airflow components to appropriate cloud services. We’ll then discuss and demonstrate how cloud-specific operators can be used in Airflow to leverage other cloud services from within DAGs. Finally, we’ll close with a short overview of some managed cloud services, which provide an easier way to roll out Airflow deployments without having to manage all the underlying cloud services ourselves.

13.1  Designing (cloud) deployment strategies

13.2  AWS

13.2.1    Deploying Airflow in AWS

13.2.2    AWS-specific hooks and operators

13.2.3    Example: serverless movie ranking with AWS Athena

13.3  Azure

13.3.1    Deploying Airflow in Azure

13.3.2    Azure-specific hooks and operators

13.3.3    Example: serverless movie ranking with Azure Synapse

13.4  Google Cloud Platform

13.4.1    Deploying Airflow in GCP

13.4.2    GCP-specific hooks and operators

13.4.3    Example: serverless movie ranking on GCP

13.5  Managed services

13.6  Summary