9 Extending Airflow with custom operators and sensors

 

This chapter covers

  • Making DAGs more modular and concise with custom components
  • Designing and implementing custom hooks, operators, sensors, and deferrable sensors
  • Distributing custom components as a basic Python library

As we’ve seen, one of Airflow’s strengths is its ecosystem of operators, which lets you coordinate jobs across many types of systems. At some point, however, you may want to execute a task on a system that Airflow doesn’t support. Or you may have a task that you can implement with the PythonOperator but that requires so much boilerplate code that others can’t easily reuse it across DAGs. How do you handle these cases?

Fortunately, Airflow makes it easy to create new operators to implement your custom operations so you can run jobs on otherwise unsupported systems or make common operations easy to apply across DAGs. In fact, many of the operators in Airflow were implemented because someone had to run a job on a certain system and built an operator to do it.

In this chapter, we’ll show you how to build your own hooks, operators, and sensors and use them in DAGs. We’ll also explore how to bundle your custom components into a Python package, making them easy to install and reuse across environments.
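To give a first impression of what a custom operator looks like, here is a minimal sketch. The GreetOperator class and its recipient argument are hypothetical names made up for illustration; they are not part of Airflow or of the examples developed later in this chapter. The sketch assumes the usual pattern of subclassing BaseOperator and overriding execute, and it falls back to a tiny stand-in base class so that it also runs in an environment where Airflow isn't installed.

```python
# Minimal sketch of a custom operator. GreetOperator is a hypothetical example.
try:
    # Real base class, used when Airflow is installed.
    from airflow.models import BaseOperator
except ImportError:
    # Stand-in so the sketch runs without Airflow. This is NOT the real
    # base class; it only mimics the task_id argument.
    class BaseOperator:
        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


class GreetOperator(BaseOperator):
    """Hypothetical operator that builds a greeting for a recipient."""

    def __init__(self, recipient, **kwargs):
        # Pass generic arguments such as task_id on to the base class.
        super().__init__(**kwargs)
        self.recipient = recipient

    def execute(self, context):
        # Airflow calls execute() when the task runs and passes in the
        # task context. The custom logic of the operator lives here.
        return f"Hello, {self.recipient}!"


# Calling execute() directly with an empty context, for illustration only.
# In a DAG, Airflow invokes execute() for you.
greeting = GreetOperator(task_id="greet", recipient="Airflow").execute({})
```

In a DAG, you would instantiate such an operator like any built-in one, which is what keeps the DAG code short: the boilerplate lives inside the class rather than in every DAG that needs it.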

9.1 Starting with a PythonOperator

9.1.1 Simulating a movie-rating API

9.1.2 Fetching ratings from the API

9.1.3 Building the actual DAG

9.2 Building a custom hook

9.2.1 Designing a custom hook

9.2.2 Building a DAG with the MovielensHook

9.3 Building a custom operator

9.3.1 Defining a custom operator

9.3.2 Building an operator to fetch ratings

9.4 Building custom sensors

9.5 Building a custom deferrable operator

9.5.1 Executing asynchronous tasks using the triggerer

9.5.2 Running the Movielens sensor asynchronously

9.6 Packaging the components

9.6.1 Bootstrapping a Python package