12 Ingesting Data Incrementally

 

This chapter covers

  • Comparing data ingestion approaches
  • Preserving history with slowly changing dimensions
  • Detecting changes with Snowflake streams
  • Maintaining data with dynamic tables
  • Querying historical data

In previous chapters, we built data pipelines that handled small amounts of data, without worrying about performance or scheduled execution. In real-world scenarios, however, data engineers usually deal with large data volumes, which calls for additional pipeline design considerations, such as avoiding reprocessing all the data on every run. One way to limit the volume of data processed during pipeline execution is to ingest data incrementally.

Incremental data ingestion is faster than full ingestion because it moves less data, which in turn lowers storage and compute costs: virtual warehouses need less time to process the data and consume fewer credits, and the intermediate data pipeline layers store less data.

12.1 Comparing Data Ingestion Approaches

 
 
 

12.1.1 Full Ingestion
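As a point of comparison, full ingestion replaces the entire target table on every run. A minimal sketch, assuming a source table `src.orders` and a target table `tgt.orders` (both names are hypothetical):

```sql
-- Full ingestion: rebuild the target from scratch on every pipeline run.
-- Every row is copied, regardless of whether it changed since the last run.
CREATE OR REPLACE TABLE tgt.orders AS
SELECT * FROM src.orders;
```

This is simple and naturally idempotent, but its cost grows with the size of the source, not with the size of the change.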

 
 
 
 

12.1.2 Incremental Ingestion
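One common incremental pattern is a high-water mark: each run loads only the rows modified after the newest timestamp already present in the target. A sketch under the same assumed table names, with an assumed `updated_at` change-tracking column:

```sql
-- Incremental ingestion: copy only rows newer than the high-water mark.
INSERT INTO tgt.orders
SELECT *
FROM src.orders
WHERE updated_at > (
  -- High-water mark: the newest timestamp already loaded into the target.
  SELECT COALESCE(MAX(updated_at), '1970-01-01'::TIMESTAMP)
  FROM tgt.orders
);
```

This moves data in proportion to the change volume, but as written it captures only inserts and new versions of rows, not deletes — one reason dedicated change-detection mechanisms such as streams are useful.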

 
 
 

12.2 Preserving History with Slowly Changing Dimensions

 
 

12.2.1 Slowly Changing Dimensions Type 2
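A Type 2 dimension keeps one row per version of an entity, with validity columns marking when each version was current. A sketch of the bookkeeping for a single changed customer, with hypothetical table and column names:

```sql
-- Close out the currently valid version of the row...
UPDATE dim_customer
SET valid_to = CURRENT_TIMESTAMP(), is_current = FALSE
WHERE customer_id = 42 AND is_current;

-- ...and insert the new version as the current one.
INSERT INTO dim_customer (customer_id, address, valid_from, valid_to, is_current)
VALUES (42, '221B Baker Street', CURRENT_TIMESTAMP(), NULL, TRUE);
```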

 

12.2.2 Append-Only Strategy
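An append-only strategy sidesteps updates entirely: every observed version of a row is inserted as a new record stamped with its load time, and the current version is derived at query time. A sketch with assumed table and column names:

```sql
-- Record every observed version; never update or delete existing rows.
INSERT INTO customer_history
SELECT s.*, CURRENT_TIMESTAMP() AS loaded_at
FROM staged_customers s;

-- The latest version per customer is then derived at query time.
SELECT *
FROM customer_history
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id ORDER BY loaded_at DESC
) = 1;
```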

 
 

12.2.3 Designing Idempotent Data Pipelines
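An idempotent pipeline produces the same target state no matter how many times a run is repeated. Replacing a plain INSERT with a MERGE keyed on the business key is one way to get there; the table and column names below are assumptions:

```sql
-- MERGE makes the load idempotent: rerunning it with the same staged
-- data updates existing rows instead of inserting duplicates.
MERGE INTO tgt.orders t
USING staged_orders s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, updated_at)
  VALUES (s.order_id, s.amount, s.updated_at);
```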

 
 

12.3 Detecting Changes with Snowflake Streams
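A stream is a Snowflake object that records which rows of a table have changed since the stream was last consumed. A minimal sketch, assuming a source table `src.orders`:

```sql
-- Create a stream that tracks changes to the source table.
CREATE OR REPLACE STREAM orders_stream ON TABLE src.orders;

-- Reading the stream returns only rows changed since it was last
-- consumed, plus metadata columns such as METADATA$ACTION.
SELECT *, METADATA$ACTION FROM orders_stream;
```

Consuming a stream in a DML statement (for example, as the source of an INSERT or MERGE) advances its offset, so the next read sees only changes made after that point.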

 
 
 
 

12.3.1 Ingesting Files from Cloud Storage Incrementally
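COPY INTO keeps load metadata per file, so rerunning the same command skips files it has already loaded — file-level incremental ingestion comes essentially for free. A sketch with a hypothetical stage URL (authentication options omitted):

```sql
-- External stage pointing at the cloud storage location (URL is an assumption).
CREATE OR REPLACE STAGE orders_stage URL = 's3://my-bucket/orders/';

-- COPY INTO records which files it has loaded; rerunning it ingests
-- only files that have appeared in the stage since the last run.
COPY INTO src.orders
FROM @orders_stage
FILE_FORMAT = (TYPE = CSV);
```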

 
 
 

12.3.2 Preserving History When Ingesting Data Incrementally

 
 
 
 

12.4 Maintaining Data with Dynamic Tables
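A dynamic table is defined by a query and refreshed automatically by Snowflake, which computes incremental refreshes where the query shape allows it. A sketch in which the warehouse name and lag target are assumptions:

```sql
-- Snowflake keeps this table within TARGET_LAG of the data in
-- src.orders; where possible, the refresh is incremental rather
-- than a full recomputation of the query.
CREATE OR REPLACE DYNAMIC TABLE orders_summary
  TARGET_LAG = '5 minutes'
  WAREHOUSE = my_wh
AS
SELECT customer_id, SUM(amount) AS total_amount
FROM src.orders
GROUP BY customer_id;
```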

 
 
 

12.4.1 Deciding When to Use Dynamic Tables

 

12.4.2 Querying Historical Data
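Snowflake's Time Travel lets a query read a table as it was at an earlier point, within the table's data retention period. Two sketches against the assumed `tgt.orders` table:

```sql
-- The table as it was one hour (3,600 seconds) ago.
SELECT * FROM tgt.orders AT (OFFSET => -3600);

-- The table as it was just before a specific statement ran
-- (replace the placeholder with a real query ID).
SELECT * FROM tgt.orders BEFORE (STATEMENT => '<query_id>');
```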

 
 
 

12.5 Summary

 
 