5 Exploring the search space with bandit-style policies


This chapter covers

  • The multi-armed bandit problem and how it’s related to BayesOpt
  • The Upper Confidence Bound policy in BayesOpt
  • The Thompson sampling policy in BayesOpt

Which slot machine should you play at a casino to maximize your winnings? How can you develop a strategy to intelligently try out multiple slot machines and zero in on the most profitable one? What does this problem have to do with BayesOpt? These are the questions this chapter will help us answer.

Chapter 4 introduced BayesOpt policies, which decide how the search space should be explored, guiding us toward the optimum of the objective function we'd like to optimize. The two policies we learned about, Probability of Improvement (PoI) and Expected Improvement (EI), are built on the idea of improving on the best objective value we have seen so far. This improvement-based mindset is only a heuristic, however, and doesn't constitute the only approach to BayesOpt.
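As a quick refresher, the sketch below (a made-up, toy one-dimensional objective, not any specific code from chapter 4) shows how an improvement-based policy such as EI is typically set up in BoTorch: fit a GP surrogate to the observed data, instantiate the acquisition function with the best value seen so far, and maximize it to pick the next query point.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy data: a handful of observations of a hypothetical 1D objective
train_x = torch.rand(5, 1, dtype=torch.double)
train_y = torch.sin(10 * train_x)  # stand-in for the expensive objective

# Fit a GP surrogate to the observations
model = SingleTaskGP(train_x, train_y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)

# Improvement-based policy: EI, anchored at the best value seen so far
ei = ExpectedImprovement(model, best_f=train_y.max())

# Maximize the acquisition function over the unit interval
# to find the next point to evaluate
bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)
next_x, _ = optimize_acqf(ei, bounds=bounds, q=1, num_restarts=10, raw_samples=100)

The UCB and Thompson sampling policies we meet in this chapter plug into the same loop; only the acquisition step changes.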

5.1 Introduction to the MAB problem

5.1.1 Finding the best slot machine at a casino

5.1.2 From MAB to BayesOpt

5.2 Being optimistic under uncertainty with the Upper Confidence Bound policy

5.2.1 Optimism under uncertainty

5.2.2 Balancing exploration and exploitation

5.2.3 Implementation with BoTorch

5.3 Smart sampling with the Thompson sampling policy

5.3.1 One sample to represent the unknown

5.3.2 Implementation with BoTorch

5.4 Exercises

5.4.1 Exercise 1: Setting an exploration schedule for the UCB

5.4.2 Exercise 2: BayesOpt for hyperparameter tuning

Summary