10 Best Practices
This chapter covers:
- Writing clean, understandable DAGs using style conventions
- Creating consistent approaches for managing credentials and configuration options
- Generating repeated DAGs and task structures using factory functions and DAG/task configurations
- Designing reproducible tasks by enforcing idempotency and determinism constraints, optionally using approaches inspired by functional programming
- Handling data efficiently by limiting the amount of data processed in your DAG and by using efficient approaches for handling and storing (intermediate) datasets
- Managing the resources of your (big) data processes by processing data in the most appropriate systems, while managing concurrency using resource pools
In previous chapters, we described most of the basic elements that go into building and designing data processes using Airflow DAGs. In this chapter, we dive deeper into best practices that can help you write well-architected DAGs that are both easy to understand and efficient in how they handle your data and resources.
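To make the idea of a reproducible task concrete before we get into the details, here is a minimal sketch in plain Python, with no Airflow involved. The function name `write_daily_summary`, the record layout, and the output file naming are all hypothetical and chosen only for illustration. The sketch assumes that a task processes one date partition at a time and overwrites its result rather than appending to it, so that rerunning the task for the same date leaves the same output behind.

```python
import json
import tempfile
from pathlib import Path


def write_daily_summary(records, execution_date, output_dir):
    """Summarize the records of one date and write them to a date-specific file.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    # Only look at the partition for the given date, so the result does not
    # depend on when the function happens to run.
    values = [r["value"] for r in records if r["date"] == execution_date]
    summary = {"date": execution_date, "count": len(values), "total": sum(values)}

    # The output path depends only on the date, and write_text replaces any
    # existing file, so a rerun overwrites the earlier result instead of
    # adding a duplicate.
    output_path = Path(output_dir) / f"summary_{execution_date}.json"
    output_path.write_text(json.dumps(summary, sort_keys=True))
    return output_path


if __name__ == "__main__":
    example_records = [
        {"date": "2024-01-01", "value": 2},
        {"date": "2024-01-01", "value": 3},
        {"date": "2024-01-02", "value": 5},
    ]
    with tempfile.TemporaryDirectory() as tmp_dir:
        first = write_daily_summary(example_records, "2024-01-01", tmp_dir)
        first_content = first.read_text()
        second = write_daily_summary(example_records, "2024-01-01", tmp_dir)
        # Running the function twice yields the same file with the same content.
        print(first == second and second.read_text() == first_content)
```

Calling the function twice with the same inputs produces one file with identical content, which is the property we want from a task that may be retried or rerun.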