chapter forty six

Chapter 46. Using correlation to improve query performance

Gert-Jan Strik

SQL Server doesn’t keep statistics on the correlation between nonclustered indexes and the clustered index (with the exception of correlation information between datetime columns, if the DATE_CORRELATION_OPTIMIZATION setting is turned on). Instead, the optimizer assumes it has a low correlation; it assumes that a range of nonclustered index values is scattered all over the clustered index.

This assumption affects the optimizer’s decision whether or not to use the nonclustered index. If there is a high correlation, the optimizer will overestimate the cost of using the nonclustered index, which can cause it to disqualify the index from the query plan evaluation, resulting in a suboptimal query plan. The performance difference can be big, even by orders of magnitude.

This chapter explains what it means to have a high correlation with the clustered index, why the optimizer can misjudge such situations, how to determine the correlation for your situation, and how to optimize your queries accordingly.

Note

In this chapter, I’ll assume that you’re familiar with indexes, and more specifically with the differences between clustered and nonclustered indexes. I’ll also assume that you have some experience with query tuning.

The purpose of the optimizer

It’s the optimizer’s job to find the best query plan for a query. What is best? For the optimizer, it’s the one that results in the shortest execution time.

Chapter 46. Using correlation to improve query performance

Note

The purpose of the optimizer

Correlation with the clustered index

When the optimizer does it right

When the optimizer does it right again

When the optimizer gets it wrong

Correcting the optimizer

When to expect correlation

Determining correlation

Summary

About the author