In chapter 2, we learned that the mean and covariance functions of a Gaussian process (GP) act as prior information that we'd like to incorporate into the model when making predictions. For this reason, the choice of these functions greatly affects how the trained GP behaves. Consequently, if the mean and covariance functions are misspecified or inappropriate for the task at hand, the resulting predictions won't be useful.
As an example, remember that a covariance function, or kernel, expresses the correlation (that is, the similarity) between two points. The more similar two points are, the more likely they are to have similar values for the labels we're trying to predict. In our housing price prediction example, similar houses are likely to sell for similar prices.
How exactly does a kernel compute the similarity between two given houses? Let's consider two cases. In the first, a kernel considers only the color of the front door and outputs 1 for any two houses with the same door color and 0 otherwise. In other words, this kernel thinks two houses are similar if and only if their front doors are the same color.
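To make this first case concrete, here is a minimal Python sketch of the door-color kernel and the covariance matrix it produces for a handful of houses. The function names `door_color_kernel` and `covariance_matrix`, and the sample list of door colors, are hypothetical and chosen only for illustration; each house is assumed to be described solely by its door color, as in the case above.

```python
def door_color_kernel(color_a: str, color_b: str) -> float:
    """Hypothetical kernel: 1.0 if the two front doors share a color, else 0.0."""
    return 1.0 if color_a == color_b else 0.0


def covariance_matrix(colors: list[str]) -> list[list[float]]:
    """Evaluate the kernel on every pair of houses (each house = its door color)."""
    return [[door_color_kernel(a, b) for b in colors] for a in colors]


# Hypothetical sample: four houses identified only by front-door color.
houses = ["red", "blue", "red", "green"]
K = covariance_matrix(houses)
for row in K:
    print(row)
# [1.0, 0.0, 1.0, 0.0]
# [0.0, 1.0, 0.0, 0.0]
# [1.0, 0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0, 1.0]
```

Houses 0 and 2 share a red door and so receive a similarity of 1, while every other pair of distinct houses scores 0. No other attribute of a house enters the computation.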