5 Exploring the search space with bandit-style policies
This chapter covers
- The multi-armed bandit problem and how it’s related to Bayesian optimization
- The Upper Confidence Bound policy in Bayesian optimization
- The Thompson sampling policy in Bayesian optimization
Which slot machine should you play at a casino to maximize your winnings? How can you develop a strategy to intelligently try out multiple slot machines and home in on the most profitable one? What does this problem have to do with Bayesian optimization (BayesOpt)? These are the questions this chapter will help us answer.
Chapter 4 introduced us to BayesOpt policies, which decide how the search space should be explored and inspected. A policy's exploration strategy should guide us toward the optimum of the objective function we'd like to optimize. The two policies we learned about, Probability of Improvement and Expected Improvement, both leverage the idea of improving on the best objective value we have seen so far. This improvement-based mindset is only a heuristic, however, and is not the only approach to BayesOpt.
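As a quick reminder of that improvement-based idea, both scores have simple closed forms when the posterior at a candidate point is Gaussian (which is exactly what a Gaussian process gives us). The sketch below is a minimal illustration in plain NumPy/SciPy rather than the BoTorch interface from chapter 4; the function names and the incumbent value best_f are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm


def probability_of_improvement(mean, std, best_f):
    """P(f(x) > best_f) under a Gaussian posterior N(mean, std**2)."""
    z = (mean - best_f) / std
    return norm.cdf(z)


def expected_improvement(mean, std, best_f):
    """E[max(f(x) - best_f, 0)] under a Gaussian posterior N(mean, std**2)."""
    z = (mean - best_f) / std
    return (mean - best_f) * norm.cdf(z) + std * norm.pdf(z)


# Hypothetical example: posterior mean 1.2 and standard deviation 0.5 at a
# candidate point, with a best observed value of 1.0 so far (maximization).
print(probability_of_improvement(1.2, 0.5, 1.0))
print(expected_improvement(1.2, 0.5, 1.0))
```

Both scores depend on how much the posterior mean exceeds the best value seen so far, which is what makes them improvement-based; the bandit-style policies in this chapter take a different route.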