So far, we have seen that GPs offer great modeling flexibility. In chapter 3, we learned that we can model high-level trends with the GP's mean function and variability with its covariance function. A GP also provides calibrated uncertainty quantification. That is, predictions for data points near observations in the training dataset have lower uncertainty than those for points far away. This calibrated uncertainty sets the GP apart from ML models that produce only point estimates, such as neural networks. However, it comes at a cost: speed.
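To see this calibration concretely, here is a minimal NumPy sketch of the GP posterior under an RBF kernel. The data, test points, and length scale are illustrative choices, not anything from this book's code:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared exponential (RBF) covariance between two sets of 1-D points
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Toy training data: three noiseless observations of sin(x)
X_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(X_train)

# Two test points: one near the training data, one far away
X_test = np.array([0.1, 10.0])

K = rbf_kernel(X_train, X_train) + 1e-6 * np.eye(len(X_train))  # jitter for stability
K_star = rbf_kernel(X_test, X_train)
K_ss = rbf_kernel(X_test, X_test)

# Standard GP posterior mean and covariance
K_inv = np.linalg.inv(K)
mean = K_star @ K_inv @ y_train
cov = K_ss - K_star @ K_inv @ K_star.T

# The point near the data has a much smaller predictive standard
# deviation; the far-away point reverts to the prior uncertainty.
print(np.sqrt(np.diag(cov)))
```

Running this shows a near-zero standard deviation at x = 0.1, right next to a training observation, and a standard deviation of roughly 1 (the prior's) at x = 10, far from all the data.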
Training and making predictions with a GP (specifically, computing the inverse of the covariance matrix) scales cubically with the size of the training data. That is, if our dataset doubles in size, a GP will take eight times as long to train and predict. If the dataset grows tenfold, it will take a GP 1,000 times longer, as the sketch below illustrates.
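The following rough timing sketch makes the cubic growth visible; the matrix sizes and the random positive-definite stand-in for a covariance matrix are arbitrary choices for illustration, and exact times will vary with hardware and the underlying BLAS library:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

for n in [500, 1000, 2000]:
    # A random symmetric positive-definite matrix standing in
    # for an n-by-n covariance matrix
    A = rng.standard_normal((n, n))
    K = A @ A.T + n * np.eye(n)

    start = time.perf_counter()
    np.linalg.inv(K)  # the O(n^3) step in exact GP inference
    elapsed = time.perf_counter() - start
    print(f"n = {n:5d}: {elapsed:.3f} s")
```

Each doubling of n should multiply the runtime by roughly eight, matching the cubic scaling described above. This poses a challenge to scaling GPs to large datasets, which are common in many applications: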