4 Asset-aware scheduling
This chapter covers
- Splitting DAGs into producer DAGs and consumer DAGs
- Defining dependencies between DAGs using assets
- Updating assets in producer DAGs and triggering consumer DAGs
- Passing information between producers and consumers
- Defining complex dependencies on multiple assets
In the previous chapter, we focused on time-based scheduling, where tasks are executed at predefined times or intervals. This method works well in many situations but becomes harder to manage once a pipeline spans several DAGs that depend on each other's data. In this chapter, we'll dive into an alternative, event-driven approach called “asset-aware scheduling.” It explicitly models the dependencies between DAGs as “assets” and triggers a DAG whenever the assets it depends on are updated.
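The core mechanism described above can be illustrated with a short piece of plain Python. This is a toy model written under our own assumptions. It is not Airflow's API or implementation. The `Asset` and `ToyAssetScheduler` classes, the file URIs, and the DAG IDs are all hypothetical. Each consumer DAG declares the assets it depends on, producers report when they have updated an asset, and a consumer is triggered once all of its assets have been updated since it last ran.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a data asset, identified only by its URI."""
    uri: str


class ToyAssetScheduler:
    """Toy model of asset-aware scheduling; NOT Airflow's implementation.

    A consumer DAG declares the assets it depends on. It is triggered once
    every one of those assets has been updated since the consumer last ran.
    """

    def __init__(self):
        self.consumers = {}              # dag_id -> set of required assets
        self.pending = defaultdict(set)  # dag_id -> assets updated since its last run
        self.triggered = []              # order in which consumer DAGs were triggered

    def register_consumer(self, dag_id, assets):
        self.consumers[dag_id] = set(assets)

    def asset_updated(self, asset):
        """Record that a producer finished updating `asset`; trigger ready consumers."""
        for dag_id, required in self.consumers.items():
            if asset not in required:
                continue  # this consumer does not depend on the updated asset
            self.pending[dag_id].add(asset)
            if self.pending[dag_id] == required:
                self.triggered.append(dag_id)
                self.pending[dag_id].clear()  # wait for a fresh round of updates


# Hypothetical assets and DAG IDs, for illustration only.
events = Asset("file:///data/events.json")
stats = Asset("file:///data/stats.csv")

scheduler = ToyAssetScheduler()
scheduler.register_consumer("compute_stats", [events])
scheduler.register_consumer("weekly_report", [events, stats])

scheduler.asset_updated(events)  # compute_stats now has everything it needs
scheduler.asset_updated(stats)   # weekly_report now has both of its assets
print(scheduler.triggered)       # ['compute_stats', 'weekly_report']
```

This sketch applies a single rule: a consumer with several upstream assets waits until each of them has been updated at least once, runs, and then starts waiting again. Other rules, such as triggering when any one of several assets changes, are equally possible; the toy model simply shows the most basic case.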
4.1 Challenges of scaling time-based schedules
Previously, we scheduled a DAG to ingest user events from an API at a regular time interval. But what happens when several teams (say, analytics, marketing, and performance monitoring) all need access to the same data?