18 Airflow in GCP

 

This chapter covers:

  • Designing a deployment strategy for GCP using GKE, Cloud Storage, and Google BigQuery.
  • Reviewing several GCP-specific hooks and operators that integrate with commonly used GCP services.
  • Using GCP-specific hooks and operators to build a simple serverless recommender system.

The last major cloud provider we cover, Google Cloud Platform (GCP), is the best-supported cloud platform in Airflow in terms of the number of hooks and operators: almost all Google services can be controlled from Airflow. In this chapter, we’ll cover setting up Airflow on GCP (section 18.1), the operators and hooks for GCP services (section 18.2), and the same use case demonstrated on AWS and Azure, now applied to GCP (section 18.3).

Note that GCP also offers a managed Airflow service named “Cloud Composer”, which is discussed in more detail in Section 15.3.2. This chapter covers a DIY Airflow setup on GCP, not Cloud Composer.

18.1  Deploying Airflow in GCP

GCP offers several services for running software because no single service fits every workload. The same is true of all other cloud vendors.

18.1.1    Picking services

These services can be placed on a scale, ranging from fully self-managed, which offers the most flexibility, to fully managed by GCP, which requires no maintenance:

Figure 18.1 Overview of the different compute services available in the Google Cloud Platform.

18.1.2    Deploying on GKE with Helm

18.1.3    Integrating with Google services

18.1.4    Designing the network

18.1.5    Scaling with the CeleryExecutor

18.2  GCP-specific hooks and operators

18.3  Use case: serverless movie ranking on GCP

18.3.1    Uploading to GCS

18.3.2    Getting data into BigQuery

18.3.3    Extracting top ratings

18.4  Summary
