Chapter 56. Incorporating data profiling in the ETL process
When you work with data on a regular basis, it’s not uncommon to need to explore it. This is particularly true in the business intelligence and data warehousing field, because you may be dealing with new or changing data sources quite often. To work effectively with the data, you need to understand its profile—both what the data looks like at the detail level and from a higher aggregate level. In the case of small sets of data, you may be able to get a picture of the data profile by reviewing the detail rows. For larger sets of data, this isn’t practical, as there is too much information to easily hold in your head. Fortunately, SQL Server 2008 includes new functionality that makes this easier.
The data profiling tools in SQL Server Integration Services (SSIS) 2008 include a Data Profiling task and a Data Profile Viewer. The Data Profiling task is a new task for SSIS 2008. It can help you understand large sets of data by offering a set of commonly needed data profiling options. The Data Profile Viewer is an application that can be used to review the output of the Data Profiling task. Over the course of this chapter, I’ll introduce the Data Profiling task in the context of data warehousing and explain how it can be used to explore source data and how you can automate the use of the profile information in your extract, transform, and load (ETL) processes to make decisions about your data.