8 Communicating with external systems


This chapter covers

  • Working with Airflow operators that perform actions on systems outside Airflow
  • Applying operators specific to external systems
  • Implementing Airflow operators to perform A-to-B operations
  • Testing tasks connecting to external systems

In previous chapters, we’ve mainly used generic operators such as the BashOperator and PythonOperator to keep the focus on understanding the basics of Airflow. However, this is hardly the best use of Airflow, as its main power lies in the ability to connect to a broad variety of different systems (e.g., a Spark cluster, a BigQuery data warehouse, a Postgres database) and orchestrate workloads between them.

To demonstrate this, we’ll explore how to install and use additional operators from the Airflow ecosystem to integrate with external systems without having to write our own custom integration logic. For illustration, we’ll develop two use cases connecting to different external systems and see how specific operators help us move and transform data between these systems.

Tip

Operators are continuously being developed. By the time you read this, new operators may be available that suit your use case but are not described in this chapter.

8.1 Installing additional operators

8.2 Developing a machine learning model

8.2.1 Use case: classifying handwritten digits

8.2.2 Setting up the pipeline

8.2.3 Developing locally with external systems

8.3 Moving data between systems

8.3.1 Use case: analyzing Airbnb listings

8.3.2 Implementing a PostgresToS3Operator

8.3.3 Outsourcing the heavy work

8.4 Summary