8 Building custom components


This chapter covers

  • Making your DAGs more modular and succinct with custom components
  • Designing and implementing a custom hook
  • Designing and implementing a custom operator
  • Designing and implementing a custom sensor
  • Distributing your custom components as a basic Python library

One strong feature of Airflow is that it can be easily extended to coordinate jobs across many different types of systems. We have already seen some of this functionality in earlier chapters, where we were able to execute a job for training a machine learning model on Amazon’s SageMaker service using the SageMakerTrainingOperator. You can also (for example) use Airflow to run jobs on an ECS (Elastic Container Service) cluster in AWS using the ECSOperator, perform queries on a Postgres database with the PostgresOperator, and much more.

However, at some point, you may want to execute a task on a system that Airflow does not support out of the box. Or you may have a task that you can implement with the PythonOperator, but only with a lot of boilerplate code, which prevents others from easily reusing your code across different DAGs. How should you go about this?

8.1 Starting with a PythonOperator

8.1.1 Simulating a movie rating API

8.1.2 Fetching ratings from the API

8.1.3 Building the actual DAG

8.2 Building a custom hook

8.2.1 Designing a custom hook

8.2.2 Building our DAG with the MovielensHook

8.3 Building a custom operator

8.3.1 Defining a custom operator

8.3.2 Building an operator for fetching ratings

8.4 Building custom sensors

8.5 Packaging your components

8.5.1 Bootstrapping a Python package

8.5.2 Installing your package