chapter twelve

12 Best practices

 

This chapter covers

  • Writing clean, understandable DAGs
  • Generating DAGs and tasks with factory functions
  • Designing idempotent and deterministic DAGs
  • Handling data efficiently in your DAGs
  • Managing concurrency with resource pools

By now, we’ve described most of the basic elements that go into building and designing data processes using Airflow DAGs. In this chapter, we’ll dive a bit deeper into some best practices that can help you write well-architected DAGs that are both easy to understand and efficient in how they handle your data and resources.

12.1 Writing clean DAGs

Writing DAGs can easily become a messy business. DAG code can quickly become overly complicated or difficult to read, especially if the DAGs were written by team members with different programming styles. In this section, we offer some tips to help you structure and style your DAG code, in the hope of bringing some often-needed clarity to your intricate data processes.

12.1.1 Using style conventions

As in all programming exercises, one of the first steps in writing clean, consistent DAGs is adopting a common programming style and applying it across all your DAGs. Although a thorough exploration of clean coding practices is well beyond the scope of this book, we can provide several tips as starting points.

Following style guides

12.1.2 Managing credentials centrally

12.1.3 Specifying configuration details consistently

12.1.4 Avoiding computation in your DAG definition

12.1.5 Using factories to generate common patterns

12.1.6 Grouping related tasks with task groups

12.1.7 Being explicit when specifying your DAG schedule

12.1.8 Using Dynamic Task Mapping to generate tasks dynamically

12.2 Designing reproducible tasks

12.2.1 Requiring tasks to be idempotent

12.2.2 Ensuring that task results are deterministic

12.2.3 Designing tasks using functional paradigms