One powerful feature of Airflow is that it can easily be extended to coordinate jobs across many different types of systems. We have already seen some of this functionality in earlier chapters, where we were able to execute a job for training a machine learning model on Amazon’s SageMaker service using the S3CopyObjectOperator, but you can (for example) also use Airflow to run jobs on an ECS (Elastic Container Service) cluster in AWS using the ECSOperator, to perform queries on a Postgres database with the PostgresOperator, and much more.
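To make this concrete, the sketch below shows what using one of these built-in operators typically looks like in a DAG. It assumes the Postgres provider package is installed and that a connection named "my_postgres" has been configured in Airflow; both the connection ID and the SQL statement are purely illustrative.

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="builtin_operator_example",
    start_date=days_ago(1),
    schedule_interval="@daily",
) as dag:
    # Run a query against a Postgres database. "my_postgres" is a
    # hypothetical connection ID that must be configured in Airflow.
    create_table = PostgresOperator(
        task_id="create_table",
        postgres_conn_id="my_postgres",
        sql="CREATE TABLE IF NOT EXISTS ratings (movie_id INT, rating FLOAT);",
    )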
However, at some point you may want to execute a task on a system that is not supported by Airflow, or you may have a task that you can implement using the PythonOperator but that requires a lot of boilerplate code, which prevents others from easily reusing your code across different DAGs. How should you go about this?
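As a rough illustration of the boilerplate problem, consider the following sketch of a PythonOperator task that pulls data from a hypothetical REST API. The API endpoint, the output path, and the _fetch_ratings helper are all made up for the example; the point is that the request and serialization logic lives inline in the DAG, so any other DAG that needs the same data has to copy it.

import json

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago


def _fetch_ratings(api_url, output_path, **_):
    # All the request handling and serialization logic is written inline
    # here, so reusing it elsewhere means copy-pasting this function.
    response = requests.get(f"{api_url}/ratings")
    response.raise_for_status()
    with open(output_path, "w") as file_:
        json.dump(response.json(), file_)


with DAG(
    dag_id="python_operator_boilerplate",
    start_date=days_ago(1),
    schedule_interval="@daily",
) as dag:
    fetch_ratings = PythonOperator(
        task_id="fetch_ratings",
        python_callable=_fetch_ratings,
        op_kwargs={
            "api_url": "http://localhost:5000",  # hypothetical API endpoint
            "output_path": "/tmp/ratings.json",
        },
    )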