12 Best practices
This chapter covers
- Writing clean, understandable DAGs
- Generating DAGs and tasks with factory functions
- Designing idempotent and deterministic DAGs
- Handling data efficiently in your DAGs
- Managing concurrency with resource pools
By now, we’ve described most of the basic elements that go into building and designing data processes using Airflow DAGs. In this chapter, we’ll dive a bit deeper into some best practices that can help you write well-architected DAGs that are both easy to understand and efficient in terms of how they handle your data and resources.
12.1 Writing clean DAGs
Writing DAGs can easily become a messy business. DAG code can quickly become overly complicated or difficult to read, especially if the DAGs were written by team members with different programming styles. In this section, we touch on some tips to help you structure and style your DAG code, which we hope will bring some often-needed clarity to your intricate data processes.
12.1.1 Using style conventions
As with any programming project, one of the first steps toward clean, consistent DAGs is to adopt a common programming style and apply it across all your DAGs. Although a thorough exploration of clean coding practices is well beyond the scope of this book, we can provide several tips as starting points.