18 Airflow in GCP


This chapter covers

  • Designing a deployment strategy for GCP
  • An overview of several GCP-specific hooks and operators
  • Demonstrating how to use GCP-specific hooks and operators

The last major cloud provider we cover, Google Cloud Platform (GCP), is the best-supported cloud platform in Airflow in terms of the number of hooks and operators. Almost all Google services can be controlled with Airflow. In this chapter, we’ll dive into setting up Airflow on GCP (section 18.1), operators and hooks for GCP services (section 18.2), and the same use case demonstrated on AWS and Azure, applied to GCP (section 18.3).

We should also note that GCP features a managed Airflow service named “Cloud Composer,” which is discussed in more detail in section 15.3.2. This chapter covers a DIY Airflow setup on GCP, not Cloud Composer.

18.1 Deploying Airflow in GCP

GCP provides various services for running software. There is no one-size-fits-all approach, which is why Google (like all other cloud vendors) offers several of them.

18.1.1 Picking services

GCP's compute services can be placed on a scale ranging from fully self-managed, with the most flexibility, to completely managed by GCP, with no maintenance required (figure 18.1).

Figure 18.1 Overview of the different compute services available in the Google Cloud Platform

18.1.2 Deploying on GKE with Helm

18.1.3 Integrating with Google services

18.1.4 Designing the network

18.1.5 Scaling with the CeleryExecutor

18.2 GCP-specific hooks and operators

18.3 Use case: Serverless movie ranking on GCP

18.3.1 Uploading to GCS

18.3.2 Getting data into BigQuery

18.3.3 Extracting top ratings

