3 Time-based scheduling
This chapter covers
- Running DAGs at regular or irregular points in time
- Processing data incrementally using data intervals
- Loading and reprocessing previously processed data using backfilling
- Applying best practices to enhance task reliability
In the first two chapters, we explored Airflow’s UI and learned how to define a basic Airflow directed acyclic graph (DAG) and run it every day by defining a schedule interval. In this chapter, we’ll dive deeper into scheduling in Airflow and explore how it allows us to process data incrementally at regular intervals. First, we’ll introduce a small use case focused on analyzing user events from our website and build a DAG that analyzes these events at regular points in time. Next, we’ll make this process more efficient by analyzing our data incrementally and see how this approach ties into Airflow’s concept of schedule intervals. We’ll also look at a scheduling option based on specific event times. Finally, we’ll show how to fill gaps in our data set by using backfilling and discuss the properties that make Airflow tasks reliable.