5 Contextual bandits: Make targeted decisions
This chapter covers
- Predicting the business metric outcome of a decision
- Exploring decisions to reduce model bias
- Exploring parameters to reduce model bias
- Validating with an A/B test
Thus far we’ve conducted experiments that compared two or more different versions of a system: A/B testing and multi-armed bandits evaluated arbitrary changes, and RSM tuned a small number of continuous parameters. Contextual bandits, on the other hand, use experimentation to tune multiple system parameters (potentially millions), but they can do so only for a narrowly defined type of system. Specifically, the system should consist of (i) a model that predicts the short-term, business-metric outcome of a decision and (ii) a component that makes decisions based on the model’s predictions. A contextual bandit is at the heart of any personalized service you might regularly use: news, social media, advertisements, music, movies, podcasts, etc. Tuning these systems’ parameters without experimentation can lead to suboptimal results and so-called “feedback loops” (see 5.2.1).
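To make the two-part structure concrete, here is a minimal sketch of such a system. All names (`predict_outcome`, `choose_action`, the linear model, and the epsilon-greedy rule) are illustrative assumptions, not the specific method developed in this chapter: part (i) is a model scoring each candidate decision, and part (ii) is a decision component that usually picks the best-scoring decision but occasionally explores.

```python
import random

def predict_outcome(context, action, weights):
    # Part (i): a hypothetical linear model that predicts the
    # short-term, business-metric outcome (e.g., expected clicks)
    # of taking `action` given the `context` features.
    return sum(w * x for w, x in zip(weights[action], context))

def choose_action(context, weights, actions, epsilon=0.1, rng=random):
    # Part (ii): the decision component. Mostly pick the action with
    # the highest predicted outcome; with probability `epsilon`,
    # explore a random action instead (exploration reduces the bias
    # that builds up when the model only sees its own past choices).
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: predict_outcome(context, a, weights))

# Example: two candidate actions with per-action weight vectors.
weights = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(choose_action([0.2, 0.9], weights, ["a", "b"], epsilon=0.0))
```

With `epsilon=0.0` the component is purely greedy; a nonzero `epsilon` is one simple way to realize the exploration that the chapter bullets above refer to.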