5 Exploring the search space with bandit-style policies
This chapter covers
- The multi-armed bandit problem and how it’s related to Bayesian optimization
- The Upper Confidence Bound policy in Bayesian optimization
- The Thompson sampling policy in Bayesian optimization
Which slot machine should you play at a casino to maximize your winnings? How can you develop a strategy to intelligently try out multiple slot machines and home in on the most profitable one? What does this problem have to do with Bayesian optimization (BayesOpt)? These are the questions this chapter will help us answer.
Chapter 4 introduced us to BayesOpt policies, which decide how the search space should be explored and inspected. A policy's exploration strategy should guide us toward the optimum of the objective function we'd like to optimize. The two policies we learned about, Probability of Improvement and Expected Improvement, both leverage the idea of improving on the best objective value we have seen so far. This improvement-based mindset is only a heuristic, however, and is not the only approach to BayesOpt.
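As a quick reminder of that improvement-based idea, both scores have simple closed forms when the posterior at a candidate point is Gaussian (which is exactly what a Gaussian process gives us). The sketch below is a minimal illustration in plain NumPy/SciPy rather than the BoTorch interface from chapter 4; the function names and the incumbent value best_f are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm


def probability_of_improvement(mean, std, best_f):
    """P(f(x) > best_f) under a Gaussian posterior N(mean, std**2)."""
    z = (mean - best_f) / std
    return norm.cdf(z)


def expected_improvement(mean, std, best_f):
    """E[max(f(x) - best_f, 0)] under a Gaussian posterior N(mean, std**2)."""
    z = (mean - best_f) / std
    return (mean - best_f) * norm.cdf(z) + std * norm.pdf(z)


# Hypothetical example: posterior mean 1.2 and standard deviation 0.5 at a
# candidate point, with a best observed value of 1.0 so far (maximization).
print(probability_of_improvement(1.2, 0.5, 1.0))
print(expected_improvement(1.2, 0.5, 1.0))
```

Both scores depend on how much the posterior mean exceeds the best value seen so far, which is what makes them improvement-based; the bandit-style policies in this chapter take a different route.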