7 Communicating with external systems


This chapter covers

  • Working with operators that perform actions on systems outside Airflow
  • Applying operators specific to external systems
  • Implementing operators that perform A-to-B (transfer) operations between systems
  • Testing tasks that connect to external systems

In all previous chapters, we focused on various aspects of writing Airflow code, mostly demonstrated with generic operators such as the BashOperator and PythonOperator. While these operators can run arbitrary code and could therefore run any workload, the Airflow project also provides operators for more specific use cases, for example, running a query on a Postgres database. Such an operator serves one and only one use case, such as running a query. As a result, it is easy to use: you simply hand the query to the operator, and the operator internally handles the querying logic. With a PythonOperator, you would have to write that querying logic yourself.
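To make the contrast concrete, here is a sketch of the hand-rolled querying logic you would supply as a PythonOperator callable. It uses the standard library's sqlite3 purely for illustration (the function name is ours); a task-specific operator wraps exactly this kind of boilerplate so that you only pass in the SQL.

```python
import sqlite3


def run_query(sql: str, db_path: str = ":memory:") -> list:
    """Hand-rolled querying logic, as you might write it for a
    PythonOperator callable. A task-specific operator hides this
    pattern: you provide only the SQL, and the operator manages
    opening the connection, executing, and cleaning up."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```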

To be clear, by the phrase external system we mean any technology other than Airflow and the machine Airflow is running on. This could be, for example, Microsoft Azure Blob Storage, an Apache Spark cluster, or a Google BigQuery data warehouse.

7.1 Connecting to cloud services

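A note on credentials: Airflow hooks look up credentials for external services by connection id in Airflow's Connections mechanism, and one way to define a connection is through an environment variable of the form AIRFLOW_CONN_<ID>. A minimal sketch, assuming a hypothetical connection id my_aws (the key and secret are placeholders, not real values):

```shell
# Define an AWS connection via environment variable. The connection id
# ("my_aws") and the credential values are placeholders; real secrets
# belong in a secrets backend, not in plain shell history.
export AIRFLOW_CONN_MY_AWS='aws://AKIAEXAMPLE:secret@'
```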

7.1.1 Installing extra dependencies

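Operators and hooks for specific external systems ship in separate provider packages that are installed on top of core Airflow. As an example, AWS support comes from the Amazon provider package (shown here with pip; version pins are omitted):

```shell
# Install the Amazon provider package, which supplies AWS hooks and
# operators such as S3Hook.
pip install apache-airflow-providers-amazon

# Equivalent extras syntax when installing Airflow itself:
pip install "apache-airflow[amazon]"
```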

7.1.2 Developing a machine learning model

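Developing a model inside a workflow usually means splitting the work into tasks such as fetching data, preprocessing, and training. To make the shape of a training task concrete, here is a deliberately tiny, pure-Python stand-in (a nearest-centroid classifier; in a real pipeline this step would hand off to a cloud training service, and all names here are ours):

```python
from statistics import mean


def train(samples: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Toy 'training' step: compute one centroid per class label.
    In a real pipeline, a task like this would call out to an external
    training service rather than crunch the data on the worker."""
    return {
        label: [mean(col) for col in zip(*vectors)]
        for label, vectors in samples.items()
    }


def predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Classify x by the nearest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )
```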

7.1.3 Developing locally with external systems

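While developing locally, you don't want every run to hit a real cloud service. One approach that needs only the standard library is to pass the storage client into the task logic and substitute a unittest.mock.Mock in tests. A minimal sketch (the upload_file method name mirrors common storage-client APIs and is an assumption here, as is upload_report):

```python
from unittest import mock


def upload_report(client, path: str, bucket: str, key: str) -> None:
    """Task logic that pushes a local file to object storage via a
    client object. Accepting the client as an argument makes the
    function testable without touching any real external system."""
    client.upload_file(path, bucket, key)


# In a test, replace the real client with a Mock and assert on the call:
fake = mock.Mock()
upload_report(fake, "/tmp/report.csv", "my-bucket", "reports/report.csv")
fake.upload_file.assert_called_once_with(
    "/tmp/report.csv", "my-bucket", "reports/report.csv"
)
```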

7.2 Moving data between systems


7.2.1 Implementing a PostgresToS3Operator

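An A-to-B operator of this kind typically fetches rows with one hook and uploads a serialized buffer with another. The serialization step in the middle can be sketched with the standard library alone; the hook calls are only referenced in comments, since they require the provider packages (and the function name rows_to_csv_buffer is ours):

```python
import csv
import io


def rows_to_csv_buffer(headers, rows) -> io.BytesIO:
    """Serialize query results into an in-memory CSV buffer.
    In a PostgresToS3Operator.execute(), the rows would come from
    PostgresHook.get_records() and the resulting buffer would be
    handed to S3Hook.load_file_obj() -- both calls omitted here."""
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(headers)
    writer.writerows(rows)
    buffer = io.BytesIO(text.getvalue().encode("utf-8"))
    buffer.seek(0)
    return buffer
```

Keeping the data in an in-memory buffer avoids writing temporary files on the worker, at the cost of holding the full result set in memory.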

7.2.2 Outsourcing the heavy work

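Rather than funneling the data through the Airflow worker, the task can ask the external system to perform the heavy work itself and merely wait for completion. The submit-and-poll skeleton below captures that pattern; submit_fn and status_fn are hypothetical callables standing in for some cloud service's API, and the status strings are placeholders:

```python
import time


def run_remote_job(submit_fn, status_fn, poll_interval: float = 1.0) -> str:
    """Submit a job to an external system and block until it finishes.
    The worker stays mostly idle while the external system does the
    heavy lifting; only a job id and status strings pass through
    Airflow."""
    job_id = submit_fn()
    while True:
        status = status_fn(job_id)
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(poll_interval)
```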

Summary
