Chapter 3

3 Multi-armed bandits: Evaluate multiple system changes while maximizing business metrics


This chapter covers

  • Defining the multi-armed bandit (MAB) problem in terms of A/B testing and system tuning
  • Modifying A/B testing’s randomization procedure to produce a solution to the MAB problem called epsilon-greedy
  • Extending epsilon-greedy to evaluate multiple system changes simultaneously
  • Evaluating system changes even more quickly with Thompson Sampling, a more efficient MAB algorithm

In the previous chapter, we learned how to use A/B testing to evaluate changes to the system your engineering team is building. Once the tooling to run A/B tests is in place, the team should see a steady increase in the system's quality as each new change follows the engineering workflow: implement a change candidate, evaluate it offline, then evaluate it online with an A/B test.
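As a preview of the epsilon-greedy approach this chapter develops, here is a minimal sketch of how its selection rule modifies A/B testing's uniform randomization: with probability epsilon it explores an arm at random, and otherwise it exploits the arm with the best reward estimate. The arm names, epsilon value, and conversion rates below are illustrative, not taken from the chapter.

```python
import random

def epsilon_greedy_choice(estimates, epsilon=0.1):
    """Pick an arm: explore uniformly with probability epsilon,
    otherwise exploit the arm with the highest estimated reward."""
    if random.random() < epsilon:
        return random.choice(list(estimates))
    return max(estimates, key=estimates.get)

# Illustrative simulation of two system variants ("arms") with
# hypothetical conversion rates; rewards are 0/1 outcomes.
estimates = {"control": 0.0, "candidate": 0.0}
counts = {"control": 0, "candidate": 0}
true_rates = {"control": 0.05, "candidate": 0.07}

random.seed(0)
for _ in range(10_000):
    arm = epsilon_greedy_choice(estimates, epsilon=0.1)
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    # Incremental running mean of observed rewards for this arm.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(counts)  # the better arm should receive most of the traffic over time
```

Unlike a 50/50 A/B test, this rule shifts traffic toward the better-performing variant while the evaluation is still running, which is exactly the business-metric trade-off the chapter examines.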

3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics

3.1.1 A/B testing as a baseline

3.1.2 The epsilon-greedy algorithm

3.1.3 Deciding when to stop

3.2 Evaluate multiple system changes simultaneously

3.3 Thompson Sampling: A more efficient MAB algorithm

3.3.1 Estimating the probability that an arm is the best

3.3.2 Randomized probability matching

3.3.3 The complete algorithm

3.4 Summary