Part 2. Making decisions with Bayesian optimization


The GP is only one half of the equation. To fully realize the BayesOpt technique, we need the other half: decision-making policies that dictate how function evaluations should be carried out so that the objective function is optimized as quickly as possible. This part covers the most popular BayesOpt policies, including their motivations, mathematical intuitions, and implementations. While different policies are motivated by different objectives, they all aim to balance the tradeoff between exploration and exploitation, a core challenge in BayesOpt specifically and in decision-making under uncertainty more generally.

Chapter 4 kicks things off by introducing the idea of an acquisition score as a way to quantify the value of making a function evaluation. The chapter also describes the heuristic of seeking to improve on the best point we have seen so far, which leads to two popular BayesOpt policies: Probability of Improvement and Expected Improvement.
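To make the improvement heuristic concrete, here is a minimal sketch (not the book's implementation) of the two acquisition scores, assuming we already have the GP posterior mean `mu` and standard deviation `sigma` at a candidate point, along with the best observed value `best_f`; all names here are illustrative placeholders.

```python
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best_f):
    """P(f(x) > best_f) under a Gaussian posterior with mean mu and std sigma."""
    z = (mu - best_f) / sigma
    return norm.cdf(z)

def expected_improvement(mu, sigma, best_f):
    """E[max(f(x) - best_f, 0)] under the same Gaussian posterior."""
    z = (mu - best_f) / sigma
    return (mu - best_f) * norm.cdf(z) + sigma * norm.pdf(z)

# Example: score a candidate whose posterior prediction slightly exceeds the incumbent.
print(probability_of_improvement(mu=1.2, sigma=0.5, best_f=1.0))
print(expected_improvement(mu=1.2, sigma=0.5, best_f=1.0))
```

Both scores grow with the posterior mean (exploitation), and Expected Improvement also increases with the posterior standard deviation (exploration), which is how these policies trade off the two.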

Chapter 5 connects BayesOpt with a closely related problem: multi-armed bandits. We explore the popular Upper Confidence Bound policy, which uses the optimism under uncertainty heuristic, and the Thompson sampling policy, which leverages the probabilistic nature of the GP to aid decision-making.
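The following is a minimal sketch (again, not the book's code) of these two bandit-inspired policies, assuming posterior means `mu` and standard deviations `sigma` over a discrete set of candidates, plus a posterior covariance matrix `cov` for Thompson sampling; the exploration parameter `beta` is a hypothetical placeholder.

```python
import numpy as np

def upper_confidence_bound(mu, sigma, beta=2.0):
    """Optimism under uncertainty: score = posterior mean + beta * posterior std."""
    return mu + beta * sigma

def thompson_sampling(mu, cov, rng):
    """Draw one function sample from the GP posterior and pick its maximizer."""
    sample = rng.multivariate_normal(mu, cov)
    return int(np.argmax(sample))

# Example over three candidate points with a diagonal (illustrative) covariance.
mu = np.array([0.2, 0.5, 0.4])
sigma = np.array([0.3, 0.1, 0.4])
cov = np.diag(sigma**2)
rng = np.random.default_rng(0)

print(upper_confidence_bound(mu, sigma))  # acquisition scores for each candidate
print(thompson_sampling(mu, cov, rng))    # index of the candidate chosen this round
```

UCB makes the exploration-exploitation tradeoff explicit through `beta`, while Thompson sampling randomizes the choice according to the posterior probability of each candidate being the best.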