2 Gaussian processes as distributions over functions
This chapter covers
- A crash course on multivariate Gaussian distributions and their properties
- Understanding Gaussian processes as multivariate Gaussian distributions in infinite dimensions
- Implementing Gaussian processes in Python
Having seen what Bayesian optimization can help us do, we are now ready to embark on our journey toward mastering it. As we saw in Chapter 1, a Bayesian optimization workflow consists of two main parts: a Gaussian process (GP) as a predictive, or surrogate, model, and a policy for decision-making. With a GP, we obtain not just a point estimate as the prediction for a test data point but an entire probability distribution representing our belief about the prediction.
At a high level, a GP, like any other machine learning model, operates under the tenet that similar data points produce similar predictions. For example, in weather forecasting, when estimating today’s temperature, a GP will look at climate data from days that are similar to today: either the last few days or this exact day a year ago. Days in another season wouldn’t inform the GP in this prediction. Similarly, when predicting the price of a house, a GP will treat similar houses in the same neighborhood as the prediction target as more informative than houses in another state.
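Both ideas above can be sketched in a few lines of NumPy: a kernel function encodes similarity between points, and conditioning on observed data yields a full posterior distribution (a mean and a standard deviation) at each test point, rather than a single number. The training data and the RBF kernel choice here are illustrative assumptions, not part of the chapter's text; later sections develop the full implementation.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared-exponential (RBF) covariance: nearby points are highly
    # correlated, distant points are nearly independent
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

# Hypothetical observations of an unknown function (here, sine)
train_x = np.array([-2.0, 0.0, 1.5])
train_y = np.sin(train_x)

# Test locations where we want predictions
test_x = np.linspace(-3, 3, 5)

jitter = 1e-6  # small diagonal term for numerical stability
K = rbf_kernel(train_x, train_x) + jitter * np.eye(len(train_x))
K_s = rbf_kernel(test_x, train_x)
K_ss = rbf_kernel(test_x, test_x)

# GP posterior: a mean vector and a covariance matrix over test_x
K_inv = np.linalg.inv(K)
mean = K_s @ K_inv @ train_y
cov = K_ss - K_s @ K_inv @ K_s.T
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

At a training point such as `x = 0`, the posterior mean matches the observed value and the standard deviation collapses to nearly zero; far from the data, the standard deviation grows, quantifying how uncertain the belief is there.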