3 Time-based scheduling in Airflow
This chapter covers
- Running DAGs at regular or irregular points in time
- Processing data incrementally using data intervals
- Loading and reprocessing previously processed data using backfilling
- Applying best practices to enhance task reliability
- Triggering DAGs based on data updates with Data Assets
Previously, we explored Airflow’s UI and showed you how to define a basic Airflow DAG and run it every day by defining a schedule interval. Now, we’ll dive deeper into scheduling in Airflow and explore how it allows you to process data incrementally at regular intervals. First, we’ll introduce a small use case focused on analyzing user events from our website and build a DAG that analyzes these events at regular points in time. Next, we’ll make this process more efficient by analyzing our data incrementally and see how this approach ties into Airflow’s concept of schedule intervals. We’ll also look at a scheduling option based on specific event times. Finally, we’ll show how to fill in past gaps in our dataset using backfilling and discuss some important properties of proper Airflow tasks.