Chapter 7. Making the right choice


This chapter covers

  • A/B testing
  • Complexities when making the right choice
  • Multi-armed bandits

Given a number of options, how do we choose the one that maximizes our reward (or, equivalently, minimizes our cost)? For example, if we have two possible routes to work, how might we choose the one that minimizes the time spent traveling? In this example, the reward is based on travel time, but it could equally be based on the cost of fuel or the time spent in traffic.

Any problem in which an option can be tested repeatedly, with each choice returning a reward, can be optimized using the techniques in this chapter. In our example, the route to work is decided every day, and the length of the commute can be recorded in a ledger. Over time, the commuter may discover patterns in the data (say, route A takes less time than route B) and choose that route consistently. What, then, is a good strategy for the commuter: to consistently take route A or route B? When do they have enough data to decide which route is best? What is the optimal strategy for testing these routes? These questions are the focus of this chapter. Figure 7.1 provides a graphical overview of the problem definition.
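To make the commuting example concrete, here is a minimal sketch of one simple testing strategy, epsilon-greedy: with a small probability we explore a random route, and otherwise we exploit the route with the best average reward observed so far. The `commute` function, the route travel-time means, and all parameter values are assumptions made purely for illustration, not data from the chapter.

```python
import random

def epsilon_greedy(pull, n_rounds=1000, epsilon=0.1, n_arms=2):
    """Pick an arm (route) each round: explore a random arm with
    probability epsilon, otherwise exploit the best average so far."""
    counts = [0] * n_arms    # times each arm was chosen
    totals = [0.0] * n_arms  # summed reward per arm
    for _ in range(n_rounds):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(n_arms)  # explore (or sample untried arms)
        else:
            # exploit: highest average reward to date
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        counts[arm] += 1
        totals[arm] += pull(arm)
    return counts, totals

# Hypothetical commute: reward is negative travel time, so shorter
# commutes yield higher reward. Route 0 averages 30 minutes, route 1
# averages 25 minutes (invented numbers for the sketch).
def commute(route):
    mean = 30 if route == 0 else 25
    return -random.gauss(mean, 3)

random.seed(7)
counts, totals = epsilon_greedy(commute)
print(counts)  # the faster route (index 1) should be chosen far more often
```

After enough rounds the strategy settles on the faster route while still occasionally re-checking the slower one, which is exactly the explore/exploit trade-off the rest of this chapter examines more carefully.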

7.1. A/B testing

7.2. Multi-armed bandits

7.3. Bayesian bandits in the wild

7.4. A/B vs. the Bayesian bandit

7.5. Extensions to multi-armed bandits

7.6. Summary