Chapter 46. Using correlation to improve query performance
SQL Server doesn’t keep statistics on the correlation between nonclustered indexes and the clustered index (with the exception of correlation information between datetime columns, if the DATE_CORRELATION_OPTIMIZATION setting is turned on). Instead, the optimizer assumes it has a low correlation; it assumes that a range of nonclustered index values is scattered all over the clustered index.
This assumption affects the optimizer’s decision whether or not to use the nonclustered index. If there is a high correlation, the optimizer will overestimate the cost of using the nonclustered index, which can cause it to disqualify the index from the query plan evaluation, resulting in a suboptimal query plan. The performance difference can be big, even by orders of magnitude.
This chapter explains what it means to have a high correlation with the clustered index, why the optimizer can misjudge such situations, how to determine the correlation for your situation, and how to optimize your queries accordingly.
Note
In this chapter, I’ll assume that you’re familiar with indexes, and more specifically with the differences between clustered and nonclustered indexes. I’ll also assume that you have some experience with query tuning.
It’s the optimizer’s job to find the best query plan for a query. What is best? For the optimizer, it’s the one that results in the shortest execution time.