chapter ten

10 Testing

This chapter covers

  • Testing Airflow tasks in a CI/CD pipeline
  • Structuring a project for testing with pytest
  • Testing individual operators
  • Faking external system events with mocking
  • Using containers to test behavior in external systems

Up to now, we’ve focused on building data pipelines with Airflow. But how do you ensure that the code you’ve written is valid before deploying it into a production system? As in any software development process, testing your DAGs is a crucial step toward ensuring that they function correctly in both normal situations (the “happy” flow) and edge cases.
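To make the distinction between the happy flow and edge cases concrete, here is a minimal sketch that tests a piece of task logic in isolation. The function `average_rating` is hypothetical and not part of Airflow; it stands in for the Python callable a task might run. The test functions follow pytest's naming convention (`test_` prefix) and use plain `assert` statements, so pytest can discover and run them.

```python
from typing import Optional


def average_rating(ratings: list[float]) -> Optional[float]:
    """Hypothetical task logic: average a batch of ratings.

    Returns None for an empty batch instead of raising ZeroDivisionError.
    """
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


def test_average_rating_happy_flow():
    # Happy flow: a normal batch of ratings produces their mean.
    assert average_rating([4.0, 5.0, 3.0]) == 4.0


def test_average_rating_empty_batch():
    # Edge case: a run that receives no data should not crash.
    assert average_rating([]) is None
```

Keeping the logic in a plain function, separate from the DAG definition, lets us test it without starting any Airflow components.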

In this chapter, we explore testing in Airflow. Testing is often regarded as tricky in Airflow because data pipelines are closely interconnected with external systems, and those systems are hard to reproduce inside a test. That’s no excuse to skip writing tests, however. We’ll show you how to write effective tests for your pipelines, resulting in more reliable systems.
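One common way around external dependencies is to fake them with Python's built-in `unittest.mock`. The sketch below assumes a hypothetical task function, `fetch_and_count`, that receives an API client as an argument; the client and its `get_ratings` method are illustrative names, not a real Airflow or third-party API. In real DAGs you may instead patch a hook's method with `mock.patch`, but the idea is the same: the test controls what the "external system" returns and checks how it was called.

```python
from unittest import mock


def fetch_and_count(client, start_date: str, end_date: str) -> int:
    """Hypothetical task logic: fetch records from an external API and count them."""
    records = client.get_ratings(start_date=start_date, end_date=end_date)
    return len(records)


def test_fetch_and_count_with_mocked_client():
    # Replace the real API client with a mock so the test never touches the network.
    fake_client = mock.Mock()
    fake_client.get_ratings.return_value = [{"rating": 5}, {"rating": 3}]

    result = fetch_and_count(fake_client, "2024-01-01", "2024-01-02")

    assert result == 2
    # Verify the task called the external system with the expected arguments.
    fake_client.get_ratings.assert_called_once_with(
        start_date="2024-01-01", end_date="2024-01-02"
    )
```

Because the mock records every call, the test verifies both the task's output and its interaction with the external system.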

10.1 Getting started with testing

10.1.1 Integrity testing all DAGs

10.1.2 Setting up a CI/CD pipeline

10.1.3 Writing unit tests

10.1.4 Creating the pytest project structure

10.1.5 Testing with files on disk

10.2 Working with external systems

10.3 Using tests for development

10.4 Testing complete DAGs

10.4.1 Using dag.test() to test the whole DAG

10.4.2 Emulating production environments with Whirl

10.4.3 Creating DTAP environments

Summary