4 Refining the best result with improvement-based policies


This chapter covers

  • The BayesOpt loop
  • The tradeoff between exploitation and exploration in a BayesOpt policy
  • Improvement as a criterion for finding new data points
  • BayesOpt policies that use improvement

In this chapter, we first remind ourselves of the iterative nature of BayesOpt: we alternate between training a Gaussian process (GP) on the collected data and using a BayesOpt policy to find the next data point to label. This forms a virtuous cycle in which past data inform future decisions. We then discuss what we look for in a BayesOpt policy, the decision-making algorithm that chooses which data point to label next: a good policy needs to balance exploring the search space sufficiently with zeroing in on high-performing regions. Finally, we develop two policies built around the notion of improvement over the best value seen so far: the Probability of Improvement (PoI) policy and a policy that optimizes the expected value of improvement.
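To make the loop concrete, here is a minimal sketch of it in Python, assuming the GPyTorch and BoTorch libraries are available. The one-dimensional objective function, the bounds, the initial data, and the budget of 10 iterations are placeholder choices for illustration, not the chapter's running example.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import ProbabilityOfImprovement
from botorch.optim import optimize_acqf

torch.set_default_dtype(torch.double)

def objective(x):
    # Hypothetical one-dimensional function we want to maximize.
    return torch.sin(3 * x) - x ** 2 + 0.7 * x

bounds = torch.tensor([[-2.0], [2.0]])          # search space: the interval [-2, 2]
train_x = torch.tensor([[-1.0], [0.0], [1.5]])  # a few initial observations
train_y = objective(train_x)

for i in range(10):
    # Step 1: train a GP on all data collected so far.
    model = SingleTaskGP(train_x, train_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll(mll)  # fit_gpytorch_model in older BoTorch versions

    # Step 2: a BayesOpt policy (here, Probability of Improvement) scores
    # candidate points, and we pick the highest-scoring one to label next.
    policy = ProbabilityOfImprovement(model, best_f=train_y.max())
    next_x, _ = optimize_acqf(
        policy, bounds=bounds, q=1, num_restarts=10, raw_samples=100
    )

    # Step 3: label the chosen point and fold it back into the data set,
    # closing the loop.
    train_x = torch.cat([train_x, next_x])
    train_y = torch.cat([train_y, objective(next_x)])

print("best value found:", train_y.max().item())

Swapping ProbabilityOfImprovement for BoTorch's ExpectedImprovement turns this into the policy discussed in section 4.3; the rest of the loop stays the same.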

4.1 Navigating the search space in BayesOpt

4.1.1 The BayesOpt loop and policies

4.1.2 Balancing exploration and exploitation

4.2 Finding improvement in BayesOpt

4.2.1 Measuring improvement with a GP

4.2.2 Computing the Probability of Improvement

4.2.3 Running the PoI policy

4.3 Optimizing the expected value of improvement

4.4 Exercises

4.4.1 Exercise 1: Encouraging exploration with PoI

4.4.2 Exercise 2: BayesOpt for hyperparameter tuning

Summary