We saw in chapter 4 that, by aiming to improve on the best value achieved so far, we can design improvement-based BayesOpt policies such as Probability of Improvement (POI) and Expected Improvement (EI). In chapter 5, we used multi-armed bandit (MAB) policies to obtain Upper Confidence Bound (UCB) and Thompson sampling (TS), each of which uses a unique heuristic to balance exploration and exploitation in the search for the global optimum of the objective function.
In this chapter, we learn about another approach to decision-making, this time using information theory to design BayesOpt policies for our optimization pipeline. Unlike the heuristics we have seen so far (seeking improvement, optimism in the face of uncertainty, and random sampling), which might seem specific to optimization-related tasks, information theory is a major subfield of mathematics with applications across a wide range of topics. As we discuss in this chapter, by appealing to information theory, or more specifically to entropy, a quantity that measures uncertainty in terms of information, we can design BayesOpt policies that seek to reduce our uncertainty about the objective function in a principled and mathematically elegant manner.
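To make the notion of entropy concrete before we use it to design policies, the following minimal sketch computes the Shannon entropy H(p) = -Σ p log p of a few discrete distributions with NumPy. The distributions themselves are illustrative assumptions, not examples from the text; the point is simply that a uniform (maximally uncertain) distribution has the highest entropy, while a near-deterministic one has entropy close to zero.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum(p * log p), in nats.
    Outcomes with p = 0 contribute 0 by convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log(0) is taken to be 0
    return -np.sum(p * np.log(p))

# Illustrative distributions over four outcomes (our own examples)
uniform = [0.25, 0.25, 0.25, 0.25]       # maximum uncertainty
skewed = [0.7, 0.1, 0.1, 0.1]            # moderately uncertain
near_certain = [0.97, 0.01, 0.01, 0.01]  # almost no uncertainty

for name, dist in [("uniform", uniform), ("skewed", skewed),
                   ("near-certain", near_certain)]:
    print(f"{name}: H = {shannon_entropy(dist):.3f} nats")
# uniform: H = 1.386 nats  (= log 4, the maximum for four outcomes)
# skewed: H = 0.940 nats
# near-certain: H = 0.168 nats
```

The entropy-based policies we develop in this chapter apply this same idea to our belief about the objective function: actions that most reduce this uncertainty are the most informative ones to take.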