Chapter 49. Late-arriving dimensions in SSIS

 

John Welch

SQL Server Integration Services (SSIS) is a great tool for populating data warehouses. Straightforward, common approaches in data warehousing are easy to implement. But certain scenarios can be more complex to implement using SSIS, particularly if you’re moving a lot of data and have aggressive performance requirements. Late-arriving dimensions are one of those scenarios.

Late-arriving dimensions (sometimes called early-arriving facts) occur when you have dimension data arriving in the data warehouse later than the fact data that references that dimension. This chapter discusses several options for handling late-arriving dimensions in SSIS.

A late-arriving dimension scenario

In the typical case for a data warehouse, extract, transform, and load (ETL) processes are written so that dimensions are fully processed first. Once the dimension processing has been completed, the facts are loaded, with the assumption that all required dimension data is already in place. Late-arriving dimensions break this pattern, because the fact data is processed first, before the associated dimension information. The ETL processes can encounter fact records that are related to new dimension members—members that have yet to be processed.

Natural keys and surrogate keys

 
 
 

The example data structure

 
 

Working around late-arriving dimensions

 
 

Handling late-arriving dimension members in fact processing

 
 
 
 

Processing the dimension update

 
 
 

Summary

 
 
 

About the author

 
sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest