3 Multi-armed bandits: Evaluate multiple system changes while maximizing business metrics
This chapter covers
- Defining the multi-armed bandit (MAB) problem in terms of A/B testing and system tuning
- Modifying A/B testing’s randomization procedure to produce a solution to the MAB problem called epsilon-greedy
- Extending epsilon-greedy to evaluate multiple system changes simultaneously
- Evaluating system changes even more quickly with Thompson sampling, a more efficient MAB algorithm
In the previous chapter, you learned how to use A/B testing to evaluate changes to the system your engineering team is building. Once the tooling to run A/B tests is in place, the team should see the quality of the system increase steadily as each new change follows the engineering workflow: implement a change candidate, evaluate it offline, and then evaluate it online with an A/B test.