8 Communicating with external systems


This chapter covers

  • Working with Airflow operators that perform actions on systems outside Airflow
  • Applying operators specific to external systems
  • Implementing Airflow operators to perform A-to-B operations
  • Testing tasks connecting to external systems

In previous chapters, we used mainly generic operators such as the BashOperator and the PythonOperator to keep the focus on understanding the basics of Airflow. This is hardly the best use of Airflow, however. Airflow’s main power lies in its capability to connect to a broad variety of systems (e.g., an Apache Spark cluster, a Google BigQuery data warehouse, and a PostgreSQL database) and orchestrate workloads between them.

To demonstrate, this chapter explores how to install and use additional operators from the Airflow ecosystem to integrate with external systems without having to write custom integration logic. For illustration, we’ll develop two use cases connecting to different external systems and see how specific operators help us move and transform data between these systems.
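As a quick preview of what installing such additional operators looks like, the sketch below installs two provider packages from the Airflow ecosystem. This assumes Airflow 2's provider packaging scheme; the exact package names for the systems used in this chapter are introduced in section 8.1.

```shell
# Sketch: install operators for external systems as separate provider
# packages (names assume Airflow 2's apache-airflow-providers-* scheme).
pip install apache-airflow-providers-postgres   # e.g., PostgreSQL hooks/operators
pip install apache-airflow-providers-amazon     # e.g., AWS S3 hooks/operators
```

Once installed, the operators and hooks in these packages become importable and can be used in DAGs like any built-in operator.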

8.1 Installing additional operators

8.2 Developing a machine learning model

8.2.1 Use case: Classifying handwritten digits

8.2.2 Setting up the pipeline

8.2.3 Developing locally with external systems

8.3 Moving data between systems

8.3.1 Use case: Analyzing Airbnb listings

8.3.2 Implementing a PostgresToS3Operator

8.3.3 Outsourcing the heavy work

Summary