13 Orchestrating Data Pipelines
This chapter covers
- Orchestrating data pipelines with Snowflake tasks
- Sending notifications from tasks
- Orchestrating with task graphs
- Monitoring data pipeline execution
- Troubleshooting data pipeline failures
A data pipeline is a series of steps that perform data ingestion and transformation. Pipelines are usually scheduled to run at predefined times, often at night, so that business users have fresh data every morning. If users need more recent data, data engineers can schedule the pipelines to run more frequently, such as every hour or every few minutes.
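In Snowflake, this kind of scheduling is done with tasks, which this chapter covers in detail. As a minimal sketch (the task name, warehouse, and table names here are hypothetical placeholders), a task that refreshes a reporting table nightly might look like this:

```sql
-- A hypothetical task that refreshes a reporting table every night at 2 AM UTC.
-- The task name, warehouse, and tables are placeholders for illustration.
CREATE OR REPLACE TASK nightly_load
  WAREHOUSE = transform_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  INSERT INTO reporting.daily_sales
  SELECT * FROM staging.sales_raw;

-- Tasks are created in a suspended state; resume the task to start the schedule.
ALTER TASK nightly_load RESUME;
```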
Because data pipelines involve many steps, data engineers must ensure the steps execute in the correct sequence. They also need visibility into pipeline execution: how long it took, how much data it ingested, and whether it finished successfully. The process of scheduling, defining dependencies, handling errors, and sending notifications to ensure that data pipeline steps execute efficiently is called data pipeline orchestration.
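Snowflake expresses such dependencies with task graphs, introduced later in this chapter. As a sketch under the same hypothetical names as above, a child task declared with AFTER runs only once its predecessor finishes:

```sql
-- Hypothetical parent task that ingests raw data on a schedule.
CREATE OR REPLACE TASK ingest_task
  WAREHOUSE = transform_wh
  SCHEDULE = '60 MINUTE'
AS
  COPY INTO staging.sales_raw FROM @sales_stage;

-- Hypothetical child task: AFTER makes it run only when ingest_task completes,
-- enforcing the correct sequence without a separate scheduler.
CREATE OR REPLACE TASK transform_task
  WAREHOUSE = transform_wh
  AFTER ingest_task
AS
  INSERT INTO reporting.daily_sales
  SELECT * FROM staging.sales_raw;
```

Note that child tasks must be resumed before the root task for the graph to run.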