4 Asset-aware scheduling
This chapter covers
- Splitting DAGs into producer and consumer DAGs
- Defining dependencies between DAGs using assets
- Updating assets in producer DAGs and triggering consumer DAGs
- Passing information between producers and consumers
- Defining complex dependencies on multiple assets
In chapter 3, we focused on time-based scheduling for tasks that are executed at predefined times or intervals. This approach works well in many situations, but it becomes hard to manage once pipelines grow beyond individual directed acyclic graphs (DAGs) into several DAGs that depend on one another. In this chapter, we'll dive into an alternative, event-driven approach called asset-aware scheduling, which explicitly models dependencies between DAGs as assets and triggers DAGs whenever the assets they depend on are updated.
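To make the triggering idea concrete, here is a minimal, in-memory sketch in plain Python. It is not Airflow's implementation or API: the class `ToyAssetScheduler`, its methods, and the DAG and asset names are hypothetical. It assumes that a consumer DAG runs once every asset it depends on has been updated since its last run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyAssetScheduler:
    """Hypothetical in-memory model of asset-aware scheduling (not Airflow's API)."""

    # Consumer DAG name -> set of asset names it depends on.
    dependencies: dict[str, set[str]] = field(default_factory=dict)
    # Consumer DAG name -> assets updated since that consumer last ran.
    updated: dict[str, set[str]] = field(default_factory=dict)
    # Order in which consumer DAGs were triggered, kept for inspection.
    runs: list[str] = field(default_factory=list)

    def add_consumer(self, dag_name: str, assets: set[str]) -> None:
        """Register a consumer DAG and the assets it depends on."""
        self.dependencies[dag_name] = set(assets)
        self.updated[dag_name] = set()

    def asset_updated(self, asset: str) -> None:
        """Called when a producer DAG finishes updating an asset."""
        for dag_name, needed in self.dependencies.items():
            if asset not in needed:
                continue
            self.updated[dag_name].add(asset)
            # Trigger the consumer only once all of its assets have been updated.
            if self.updated[dag_name] == needed:
                self.runs.append(dag_name)
                self.updated[dag_name].clear()


scheduler = ToyAssetScheduler()
scheduler.add_consumer("analytics", {"user_events"})
scheduler.add_consumer("marketing", {"user_events"})
scheduler.add_consumer("performance_report", {"user_events", "server_metrics"})

# A producer DAG updates the shared asset; both single-asset consumers run.
scheduler.asset_updated("user_events")
print(scheduler.runs)  # ['analytics', 'marketing']

# The second asset arrives; the consumer waiting on both assets now runs.
scheduler.asset_updated("server_metrics")
print(scheduler.runs)  # ['analytics', 'marketing', 'performance_report']
```

Note that no consumer in this sketch knows when or how often the producer runs; each one reacts only to updates of the assets it declared, which is the decoupling that time-based schedules lack.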
4.1 Challenges of scaling time-based schedules
In chapter 3, we scheduled a DAG to ingest user events from an API at regular time intervals. But what happens when multiple teams—perhaps for analytics, marketing, and performance monitoring—need access to the same data?
We could allow each team to build its own pipeline to fetch the data (figure 4.1A). But this approach introduces several problems: