Thus far we’ve conducted experiments that compared two or more versions of a system: A/B testing and multi-armed bandits evaluated arbitrary changes, and RSM optimized a small number of continuous parameters. Contextual bandits, in contrast, use experimentation to optimize many (potentially millions of) system parameters, but they can do so only for a narrowly defined type of system. Specifically, the system must consist of (1) a model that predicts the short-term, business-metric outcome of a decision and (2) a component that makes decisions based on the model’s predictions. A contextual bandit is at the heart of most personalized services you use regularly: news, social media, advertisements, music, movies, podcasts, and so on. Tuning these systems’ parameters without experimentation can lead to suboptimal results and “feedback loops” (see section 5.2.1).
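To make the two-part structure concrete, here is a minimal sketch, assuming a per-action linear model for component (1) and an epsilon-greedy rule for component (2); all names and the simulated environment are hypothetical illustrations, not the only possible choices:

```python
import numpy as np

rng = np.random.default_rng(0)

class OutcomePredictor:
    """Component (1): predicts the short-term, business-metric outcome
    of each possible action (decision) in a given context."""

    def __init__(self, num_actions, num_features):
        # One weight vector per action. These weights are the "system
        # parameters" a contextual bandit tunes; in a production
        # recommender there may be millions of them.
        self.weights = np.zeros((num_actions, num_features))

    def predict(self, context):
        # Predicted outcome (e.g., engagement) for every action.
        return self.weights @ context

    def update(self, context, action, outcome, lr=0.1):
        # One gradient-descent step on the squared prediction error
        # for the action that was actually taken.
        error = self.predict(context)[action] - outcome
        self.weights[action] -= lr * error * context

def choose_action(predictions, epsilon=0.1):
    """Component (2): makes a decision based on the model's predictions.

    Epsilon-greedy: usually take the action with the highest predicted
    outcome, but occasionally try a random one so the model keeps
    collecting data about all actions."""
    if rng.random() < epsilon:
        return int(rng.integers(len(predictions)))
    return int(np.argmax(predictions))

# Simulated interaction loop (a stand-in for real user traffic).
num_actions, num_features = 3, 5
true_weights = rng.normal(size=(num_actions, num_features))  # unknown to the system
model = OutcomePredictor(num_actions, num_features)

for _ in range(1000):
    context = rng.normal(size=num_features)            # e.g., user features
    action = choose_action(model.predict(context))     # e.g., which item to show
    outcome = true_weights[action] @ context + rng.normal(scale=0.1)
    model.update(context, action, outcome)             # learn from the result
```

The exploration step in `choose_action` is what makes this an experiment rather than pure exploitation: without it, the model would only ever see outcomes for the actions it already favors, which is one way the feedback loops of section 5.2.1 arise.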