3 Multi-armed bandits: Maximizing business metrics while experimenting
This chapter covers

  • Defining the multi-armed bandit (MAB) problem
  • Modifying A/B testing’s randomization procedure
  • Extending epsilon-greedy to simultaneously evaluate multiple system changes
  • Evaluating system changes even more quickly with Thompson sampling

In the previous chapter, we learned how to use A/B testing to evaluate changes to the system your engineering team is building. Once the tooling is in place to run A/B tests, the team should see a steady increase in the quality of the system as new changes follow the engineering workflow: implement a change candidate, evaluate it offline, and evaluate it online with an A/B test.

As the use of A/B testing increases, you’ll spend more time evaluating “B” versions that underperform the current system, “A.” (Recall from chapter 1 that most Bs underperform As.) Every time you measure an underperforming B, you pay a cost equal to the value of the business metric you would have achieved with A minus the value you actually achieved with B. If the business metric is revenue, for example, then the cost of measuring B is “revenue of A - revenue of B.” It makes sense, then, to look for ways to reduce this cost. But you can’t simply stop an A/B test early when B looks good: although a shorter test would cost less, stopping early produces false positives (see chapter 8, section 8.2).
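To make this cost concrete, here is a minimal simulation sketch. It assumes hypothetical per-user revenues (A earns $1.00 per user, the underperforming B earns $0.90) and a standard 50/50 traffic split; the function name and dollar values are illustrative, not from this book.

```python
import random

random.seed(0)

# Hypothetical average revenue per user (assumed values for illustration):
# the current system A earns $1.00; the underperforming candidate B earns $0.90.
REVENUE = {"A": 1.00, "B": 0.90}

def ab_test_revenue(n_users, split=0.5):
    """Total revenue from an A/B test that routes `split` of traffic to B."""
    total = 0.0
    for _ in range(n_users):
        arm = "B" if random.random() < split else "A"
        total += REVENUE[arm]
    return total

n = 10_000
test_revenue = ab_test_revenue(n)
best_possible = n * REVENUE["A"]      # revenue if every user had seen A
cost = best_possible - test_revenue   # "revenue of A - revenue of B"
print(f"cost of measuring B: ${cost:,.2f}")
```

With these assumed numbers, roughly half of the 10,000 users see B and each of those users costs about $0.10, so the test costs on the order of $500 in forgone revenue. The bandit algorithms in this chapter reduce this cost by shifting traffic away from underperforming arms as evidence accumulates.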

3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics

3.1.1 A/B testing as a baseline

3.1.2 The epsilon-greedy algorithm

3.1.3 Deciding when to stop

3.2 Evaluating multiple system changes simultaneously

3.3 Thompson sampling: A more efficient MAB algorithm

3.3.1 Estimate the probability that an arm is the best

3.3.2 Randomized probability matching

3.3.3 The complete algorithm

Summary