Chapter 49. Late-arriving dimensions in SSIS
SQL Server Integration Services (SSIS) is a great tool for populating data warehouses. Straightforward, common approaches in data warehousing are easy to implement. But certain scenarios can be more complex to implement using SSIS, particularly if you’re moving a lot of data and have aggressive performance requirements. Late-arriving dimensions are one of those scenarios.
Late-arriving dimensions (sometimes called early-arriving facts) occur when you have dimension data arriving in the data warehouse later than the fact data that references that dimension. This chapter discusses several options for handling late-arriving dimensions in SSIS.
In the typical case for a data warehouse, extract, transform, and load (ETL) processes are written so that dimensions are fully processed first. Once the dimension processing has been completed, the facts are loaded, with the assumption that all required dimension data is already in place. Late-arriving dimensions break this pattern, because the fact data is processed first, before the associated dimension information. The ETL processes can encounter fact records that are related to new dimension members—members that have yet to be processed.