chapter sixteen

16 Airflow on AWS

 

This chapter covers:

  • Designing a deployment strategy for AWS using the ECS, S3, EFS and RDS services.
  • Exploring several AWS-specific hooks and operators for integrating with commonly used AWS services.
  • Using these AWS-specific hooks and operators to build a simple serverless recommender system.

After our brief introduction in the previous chapter, this chapter dives further into how to deploy Airflow on Amazon Web Services (AWS) and integrate it with AWS cloud services. First, we’ll design an Airflow deployment by mapping the different components of Airflow to AWS services. Next, we’ll explore some of the hooks and operators that Airflow provides for integrating with several key AWS services. Finally, we’ll use these AWS-specific operators and hooks to implement a real use case: generating movie recommendations.
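AWS-specific hooks and operators authenticate through an Airflow connection, and Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` that holds a connection URI. The sketch below builds such a variable for the `aws_default` connection ID. It is a minimal illustration under stated assumptions: the helper `aws_connection_env_var` is hypothetical, the credentials are placeholders, and it assumes the `aws://<access key>:<secret key>@/?region_name=<region>` URI form accepted by the Amazon provider, so check the provider documentation for your Airflow version before relying on it.

```python
from typing import Tuple
from urllib.parse import quote, urlencode


def aws_connection_env_var(
    access_key_id: str,
    secret_access_key: str,
    region_name: str,
    conn_id: str = "aws_default",
) -> Tuple[str, str]:
    """Return (name, value) of an environment variable defining an AWS connection.

    Hypothetical helper: assumes the aws://<key>:<secret>@/?region_name=... URI form.
    """
    # Airflow looks up environment-defined connections as AIRFLOW_CONN_<CONN_ID>.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    # Secret keys may contain '/' or '+', so percent-encode all reserved characters.
    login = quote(access_key_id, safe="")
    password = quote(secret_access_key, safe="")
    # Extra settings such as the region are passed as query parameters.
    query = urlencode({"region_name": region_name})
    return name, f"aws://{login}:{password}@/?{query}"


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    name, value = aws_connection_env_var("AKIAEXAMPLE", "abc/def+ghi", "eu-west-1")
    print(f'export {name}="{value}"')
```

Exporting the printed line before starting Airflow would make the connection available under the `aws_default` ID, which is the default connection ID used by Amazon provider hooks such as `S3Hook`.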

16.1  Deploying Airflow in AWS

In the previous chapter, we described the different components that make up an Airflow deployment. In this section, we’ll design a few deployment patterns for AWS by mapping these components to specific AWS cloud services. This should give you a good idea of the process involved in designing an Airflow deployment for AWS, and provide a good starting point for implementing one yourself.
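One way to make this mapping step concrete is to write the candidate design down as data and check that every Airflow component has been assigned a service. The sketch below assumes that a basic deployment consists of a webserver, scheduler, metastore, DAG storage and log storage, and uses one plausible assignment based on the services named at the start of this chapter: ECS containers for the webserver and scheduler, an RDS database for the metastore, EFS for DAGs and S3 for logs. The names `REQUIRED_COMPONENTS`, `DEPLOYMENT` and `unmapped_components` are hypothetical and only illustrate the design step.

```python
from typing import Dict, List

# Assumed set of components that make up a basic Airflow deployment.
REQUIRED_COMPONENTS = ["webserver", "scheduler", "metastore", "dag_storage", "log_storage"]

# One candidate mapping of Airflow components to AWS services (illustrative assumption).
DEPLOYMENT: Dict[str, str] = {
    "webserver": "ECS (container)",
    "scheduler": "ECS (container)",
    "metastore": "RDS (e.g. PostgreSQL)",
    "dag_storage": "EFS",
    "log_storage": "S3",
}


def unmapped_components(mapping: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required components that have no AWS service assigned yet."""
    return [component for component in required if not mapping.get(component)]


if __name__ == "__main__":
    # Print the design as a simple component -> service table.
    for component in REQUIRED_COMPONENTS:
        print(f"{component:<12} -> {DEPLOYMENT.get(component, '<unassigned>')}")
    missing = unmapped_components(DEPLOYMENT, REQUIRED_COMPONENTS)
    print("Missing:", missing or "none")
```

Extending `REQUIRED_COMPONENTS` with extra pieces, for example Celery workers, immediately flags them as unassigned, which mirrors how the design has to grow when the deployment is scaled out.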

16.1.1    Picking cloud services

16.1.2    Designing the network

16.1.3    Adding DAG syncing

16.1.4    Scaling with the CeleryExecutor

16.1.5    Further steps

16.2  AWS-specific hooks and operators

16.3  Use case: serverless movie ranking with AWS Athena

16.3.1    Overview

16.3.2    Setting up resources

16.3.3    Building the DAG

16.3.4    Cleaning up

16.4  Summary