12 Ingesting Data Incrementally
This chapter covers
- Comparing data ingestion approaches
- Preserving history with slowly changing dimensions
- Detecting changes with Snowflake streams
- Maintaining data with dynamic tables
- Querying historical data
In previous chapters, we built data pipelines that handled small amounts of data, without considering performance or regular pipeline execution. In real-world scenarios, however, data engineers usually deal with large data volumes that require additional considerations in pipeline design, such as avoiding reprocessing all data on every pipeline run. One way to limit the volume of data processed during pipeline execution is to ingest data incrementally.
Incremental data ingestion is faster than full ingestion because it moves less data, which lowers both storage and compute costs: virtual warehouses need less time to process the data and consume fewer credits, and the intermediate data pipeline layers store less data.
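The core idea can be sketched with a high-water-mark pattern: each run loads only rows newer than the timestamp recorded by the previous run. The Python sketch below is purely illustrative, not a Snowflake API; the `ingest_incrementally` function, the row shape, and the `updated_at` column are assumptions for the example.

```python
from datetime import datetime

def ingest_incrementally(source_rows, target_rows, last_loaded_at):
    """Append only source rows newer than the high-water-mark timestamp."""
    new_rows = [row for row in source_rows if row["updated_at"] > last_loaded_at]
    target_rows.extend(new_rows)
    # Advance the high-water mark so the next run skips already-loaded rows.
    if new_rows:
        last_loaded_at = max(row["updated_at"] for row in new_rows)
    return last_loaded_at

# First run: everything after the initial watermark is new.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
]
target = []
watermark = ingest_incrementally(source, target, datetime(2023, 12, 31))
print(len(target))  # 2 rows loaded

# Second run: nothing new arrived, so no rows are reprocessed.
watermark = ingest_incrementally(source, target, watermark)
print(len(target))  # still 2 rows
```

Later sections show how Snowflake streams and dynamic tables handle this change tracking for you, so you don't have to maintain the watermark yourself.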