13 Orchestrating Data Pipelines

 

This chapter covers

  • Orchestrating data pipelines with Snowflake tasks
  • Sending notifications from tasks
  • Orchestrating with task graphs
  • Monitoring data pipeline execution
  • Troubleshooting data pipeline failures

A data pipeline is a series of steps that ingest and transform data. Pipelines are usually scheduled to run at predefined times, often overnight, so that business users have fresh data every morning. When users need more recent data, data engineers can schedule the pipelines to run more frequently, for example every hour or every few minutes.

Because data pipelines involve many steps, data engineers must ensure that the steps execute in the correct sequence. They also need visibility into pipeline execution: how long it took, how much data it ingested, and whether it finished successfully. The process of scheduling, defining dependencies, handling errors, and sending notifications to ensure that pipeline steps execute efficiently is called data pipeline orchestration.

13.1 Orchestrating with Snowflake Tasks
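
Snowflake tasks are schema-level objects that execute a SQL statement, a call to a stored procedure, or procedural logic, either on a schedule or after another task. As a minimal sketch (the warehouse demo_wh, the database demo_db, and the table and stage names are all hypothetical), a standalone task that ingests data every night at 2:00 AM UTC could look like this:

CREATE OR REPLACE TASK demo_db.orchestration.load_orders_task
  WAREHOUSE = demo_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  COPY INTO demo_db.raw.orders
  FROM @demo_db.raw.orders_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Tasks are created suspended; resume the task so the schedule takes effect
ALTER TASK demo_db.orchestration.load_orders_task RESUME;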

 
 
 
 

13.1.1 Creating a Schema to Store the Orchestration Objects
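
Keeping all tasks and supporting objects in one dedicated schema makes them easier to find, secure, and monitor as a group. A minimal sketch, assuming a hypothetical database named demo_db:

USE DATABASE demo_db;

CREATE SCHEMA IF NOT EXISTS orchestration
  COMMENT = 'Tasks and other data pipeline orchestration objects';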

 
 

13.1.2 Designing the Orchestration Tasks
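
Designing the tasks starts from the discrete statements the pipeline must run and the ordering between them. As an illustrative sketch (the tables, stage, and columns are hypothetical), a two-step pipeline might first load raw files and then aggregate them into a reporting table, where the second step may run only after the first succeeds:

-- Step 1: ingest raw files from a stage
COPY INTO demo_db.raw.orders
FROM @demo_db.raw.orders_stage;

-- Step 2: aggregate the raw data into a reporting table
INSERT INTO demo_db.reporting.daily_orders
SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM demo_db.raw.orders
GROUP BY order_date;

Each statement becomes the body of its own task, and the required ordering becomes a task dependency.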

 
 
 

13.1.3 Creating Tasks with Dependencies
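
A task declares its predecessor with the AFTER clause, and Snowflake then starts it only when the predecessor finishes. A hedged sketch that wraps the transformation statement from the previous section in a task depending on the hypothetical load task:

-- A task's predecessor must be suspended while dependent tasks are added
ALTER TASK demo_db.orchestration.load_orders_task SUSPEND;

CREATE OR REPLACE TASK demo_db.orchestration.transform_orders_task
  WAREHOUSE = demo_wh
  AFTER demo_db.orchestration.load_orders_task
AS
  INSERT INTO demo_db.reporting.daily_orders
  SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
  FROM demo_db.raw.orders
  GROUP BY order_date;

-- Resume the root task and all of its dependents in a single call
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('demo_db.orchestration.load_orders_task');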

 

13.2 Sending Email Notifications
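
Snowflake sends email through a notification integration, an account-level object that an administrator creates once and grants to the roles that need it. Recipients must be users whose email addresses are verified in the Snowflake account. A minimal sketch with a hypothetical integration name and address:

CREATE NOTIFICATION INTEGRATION pipeline_email_int
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('data.engineer@example.com');

-- Send a test message through the integration
CALL SYSTEM$SEND_EMAIL(
  'pipeline_email_int',
  'data.engineer@example.com',
  'Pipeline notification test',
  'The orchestration pipeline can now send email notifications.');

A task can execute the same SYSTEM$SEND_EMAIL call in its body to notify engineers about pipeline events.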

 
 

13.3 Orchestrating with Task Graphs

 
 
 
 

13.3.1 Designing the Task Graph
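
A task graph has exactly one root task; every other task names one or more predecessors, and tasks that do not depend on each other can run in parallel. As an illustrative sketch (the root task itself is created in the next section), two independent load tasks can both name the hypothetical root as their predecessor:

-- Both tasks depend only on the root, so Snowflake can run them in parallel
CREATE OR REPLACE TASK demo_db.orchestration.load_orders_task
  WAREHOUSE = demo_wh
  AFTER demo_db.orchestration.pipeline_root_task
AS
  COPY INTO demo_db.raw.orders FROM @demo_db.raw.orders_stage;

CREATE OR REPLACE TASK demo_db.orchestration.load_customers_task
  WAREHOUSE = demo_wh
  AFTER demo_db.orchestration.pipeline_root_task
AS
  COPY INTO demo_db.raw.customers FROM @demo_db.raw.customers_stage;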

 
 
 

13.3.2 Creating the Root Task
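
The root task is the only task in the graph that carries a schedule; the rest of the graph hangs off it through AFTER and FINALIZE clauses. A hedged sketch of a hypothetical root task that does no real work itself and exists only to trigger its children:

CREATE OR REPLACE TASK demo_db.orchestration.pipeline_root_task
  WAREHOUSE = demo_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
  SUSPEND_TASK_AFTER_NUM_FAILURES = 3  -- stop scheduling after repeated failures
AS
  SELECT 'pipeline run started';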

 
 
 
 

13.3.3 Creating the Finalizer Task
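
A finalizer task attaches to the root with the FINALIZE keyword and runs at the end of every graph execution, whether the graph succeeded or failed, which makes it a natural place for cleanup and notifications. A minimal sketch, reusing the hypothetical names from earlier:

-- The root task must be suspended while the graph is being modified
ALTER TASK demo_db.orchestration.pipeline_root_task SUSPEND;

CREATE OR REPLACE TASK demo_db.orchestration.pipeline_finalizer_task
  WAREHOUSE = demo_wh
  FINALIZE = demo_db.orchestration.pipeline_root_task
AS
  CALL SYSTEM$SEND_EMAIL(
    'pipeline_email_int',
    'data.engineer@example.com',
    'Pipeline run finished',
    'The task graph rooted at pipeline_root_task has finished running.');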

 
 

13.3.4 Viewing the Task Graph
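
Snowsight renders the graph visually: selecting the root task and opening its Graph tab shows every task and dependency. The same structure can be queried with the TASK_DEPENDENTS table function, as in this sketch with the hypothetical root task:

SELECT name, predecessors, state
FROM TABLE(demo_db.INFORMATION_SCHEMA.TASK_DEPENDENTS(
  TASK_NAME => 'demo_db.orchestration.pipeline_root_task',
  RECURSIVE => TRUE));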

 
 

13.4 Monitoring Data Pipeline Execution
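
The TASK_HISTORY table function in INFORMATION_SCHEMA returns one row per task run with its state, timing, and any error. A sketch that lists the runs from the last 24 hours (the time window and row limit are arbitrary choices):

SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(demo_db.INFORMATION_SCHEMA.TASK_HISTORY(
  SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
  RESULT_LIMIT => 100))
ORDER BY scheduled_time DESC;

The SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY view exposes the same information with a much longer retention period, at the cost of some data latency.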

 
 
 
 

13.4.1 Adding Logging Functionality to Tasks
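
One straightforward way to add logging is to have every task insert a row into a dedicated log table as it runs; the table and message text below are hypothetical. A sketch that prepends a logging statement to a task body using a Snowflake Scripting block:

CREATE TABLE IF NOT EXISTS demo_db.orchestration.pipeline_log (
  log_time  TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
  task_name STRING,
  message   STRING
);

CREATE OR REPLACE TASK demo_db.orchestration.transform_orders_task
  WAREHOUSE = demo_wh
  AFTER demo_db.orchestration.load_orders_task
AS
BEGIN
  -- Record that the step started before doing the actual work
  INSERT INTO demo_db.orchestration.pipeline_log (task_name, message)
    VALUES ('transform_orders_task', 'transformation started');
  INSERT INTO demo_db.reporting.daily_orders
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM demo_db.raw.orders
    GROUP BY order_date;
END;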

 
 

13.4.2 Summarizing Logging Information in an Email Notification
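
The per-task log rows can be collapsed into a single message and emailed once per run, typically from the finalizer task. A hedged sketch of a stored procedure that aggregates the hypothetical pipeline_log table and sends the result through the integration created earlier:

CREATE OR REPLACE PROCEDURE demo_db.orchestration.send_run_summary()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  summary STRING;
BEGIN
  -- Collapse the last day of log rows into one newline-separated message
  SELECT LISTAGG(log_time || ' ' || task_name || ': ' || message, '\n')
           WITHIN GROUP (ORDER BY log_time)
    INTO :summary
    FROM demo_db.orchestration.pipeline_log
    WHERE log_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP());

  CALL SYSTEM$SEND_EMAIL(
    'pipeline_email_int',
    'data.engineer@example.com',
    'Pipeline run summary',
    :summary);

  RETURN 'summary sent';
END;
$$;

The finalizer task can then consist of a single CALL demo_db.orchestration.send_run_summary(); statement.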

 
 
 
 

13.5 Troubleshooting Data Pipeline Failures
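
When a run fails, the first place to look is the task history filtered to errors; the error_code and error_message columns usually identify the failing statement. After fixing the cause, the task (or the whole graph, when pointed at the root) can be re-run on demand. A sketch with the hypothetical names from this chapter:

-- List only failed task runs, most recent first
SELECT name, scheduled_time, error_code, error_message
FROM TABLE(demo_db.INFORMATION_SCHEMA.TASK_HISTORY(
  ERROR_ONLY => TRUE,
  RESULT_LIMIT => 100))
ORDER BY scheduled_time DESC;

-- Trigger a single ad hoc run without waiting for the schedule
EXECUTE TASK demo_db.orchestration.pipeline_root_task;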

 
 
 

13.6 Summary

 
 