Chapter 3

3 Multi-armed bandits: Evaluate multiple system changes while maximizing business metrics


This chapter covers

  • Defining the multi-armed bandit (MAB) problem in terms of A/B testing and system tuning
  • Modifying A/B testing’s randomization procedure to produce a solution to the MAB problem called epsilon-greedy
  • Extending epsilon-greedy to evaluate multiple system changes simultaneously
  • Evaluating system changes even more quickly with Thompson Sampling, a more efficient MAB algorithm

In the previous chapter, we learned how to use A/B testing to evaluate changes to the system your engineering team is building. Once the tooling to run A/B tests is in place, the team should see a steady increase in the system's quality as each new change follows the engineering workflow: implement a change candidate, evaluate it offline, then evaluate it online with an A/B test.
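As a preview of the epsilon-greedy approach this chapter develops, here is a minimal sketch of how its selection rule modifies A/B testing's uniform randomization: with probability epsilon it explores an arm at random, and otherwise it exploits the arm with the best reward estimate. The arm names, epsilon value, and conversion rates below are illustrative, not taken from the chapter.

```python
import random

def epsilon_greedy_choice(estimates, epsilon=0.1):
    """Pick an arm: explore uniformly with probability epsilon,
    otherwise exploit the arm with the highest estimated reward."""
    if random.random() < epsilon:
        return random.choice(list(estimates))
    return max(estimates, key=estimates.get)

# Illustrative simulation of two system variants ("arms") with
# hypothetical conversion rates; rewards are 0/1 outcomes.
estimates = {"control": 0.0, "candidate": 0.0}
counts = {"control": 0, "candidate": 0}
true_rates = {"control": 0.05, "candidate": 0.07}

random.seed(0)
for _ in range(10_000):
    arm = epsilon_greedy_choice(estimates, epsilon=0.1)
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    # Incremental running mean of observed rewards for this arm.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(counts)  # the better arm should receive most of the traffic over time
```

Unlike a 50/50 A/B test, this rule shifts traffic toward the better-performing variant while the evaluation is still running, which is exactly the business-metric trade-off the chapter examines.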

3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics

3.1.1 A/B testing as a baseline

3.1.2 The epsilon-greedy algorithm

3.1.3 Deciding when to stop

3.2 Evaluate multiple system changes simultaneously

3.3 Thompson Sampling: A more efficient MAB algorithm

3.3.1 Estimating the probability that an arm is the best

3.3.2 Randomized probability matching

3.3.3 The complete algorithm

3.4 Summary