5 Contextual bandits: Make targeted decisions

 

This chapter covers

  • Predicting the business metric outcome of a decision
  • Exploring decisions to reduce model bias
  • Exploring parameters to reduce model bias
  • Validating with an A/B test

Thus far we’ve conducted experiments that compared two or more different versions of a system: A/B testing and multi-armed bandits evaluated arbitrary changes, and RSM tuned a small number of continuous parameters. Contextual bandits, by contrast, use experimentation to tune many system parameters (potentially millions), but they can do so only for a narrowly defined type of system. Specifically, the system should consist of (i) a model that predicts the short-term, business-metric outcome of a decision and (ii) a component that makes decisions based on the model’s predictions. A contextual bandit is at the heart of any personalized service you might use regularly: news, social media, advertisements, music, movies, podcasts, and so on. Tuning these systems’ parameters without experimentation can lead to suboptimal results and so-called “feedback loops” (see section 5.2.1).
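The two-part structure described above can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: the linear "model," its weights, and the feature vectors are all stand-ins chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a fitted prediction model: a linear scorer over
# hypothetical action features (e.g., attributes of an article).
weights = rng.normal(size=4)

def predict_metric(action_features):
    """Part (i): predict the short-term business-metric outcome
    (e.g., probability of a click) of taking this action."""
    return action_features @ weights

def greedy_decision(candidate_features):
    """Part (ii): the decision-making component. Here it simply
    takes the action with the highest predicted outcome."""
    scores = [predict_metric(f) for f in candidate_features]
    return int(np.argmax(scores))

candidates = rng.normal(size=(5, 4))  # five candidate actions
best = greedy_decision(candidates)    # index of the chosen action
```

A purely greedy decision rule like this is the starting point; the rest of the chapter adds exploration so the model's training data doesn't collapse onto its own past choices.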

5.1 Model a business metric offline to make decisions online

 
 
 

5.1.1 Model the business-metric outcome of a decision

 
 

5.1.2 Add the decision-making component

 
 
 

Run and evaluate the greedy recommender

 
 

5.2 Explore actions with epsilon-greedy

 
 
 
 

5.2.1 Missing counterfactuals degrade predictions

 
 
 

5.2.2 Explore with epsilon-greedy to collect counterfactuals
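As a rough sketch of the idea in this section's title: with probability epsilon, the decision component ignores the model and picks an action uniformly at random, which generates the counterfactual observations a greedy policy would never collect. The function and score representation here are illustrative assumptions, not the book's code.

```python
import numpy as np

def epsilon_greedy_decision(scores, epsilon=0.1, rng=None):
    """With probability epsilon, explore: choose a uniformly random
    action, collecting data on actions the model would not pick.
    Otherwise exploit: choose the action with the best predicted
    business-metric outcome."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))
```

Setting epsilon=0 recovers the purely greedy recommender; a small positive epsilon trades a little short-term metric for unbiased training data.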

 
 
 

5.3 Explore parameters with Thompson sampling

 
 

5.3.1 Create an ensemble of prediction models

 

5.3.2 Randomized probability matching
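A minimal sketch of the two pieces named in 5.3.1 and 5.3.2: fit an ensemble of prediction models on bootstrap resamples of the logged data, then make each decision by drawing one ensemble member at random and acting greedily with respect to it (randomized probability matching). The logged data, linear models, and ensemble size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical logged data: features of past actions and the
# business-metric outcomes observed for them.
X = rng.normal(size=(200, 4))
true_w = np.array([1.0, -0.5, 0.25, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

def fit_ensemble(X, y, n_models=10, rng=rng):
    """5.3.1: fit each prediction model on a bootstrap resample,
    so the ensemble's spread reflects uncertainty in the data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(len(X), size=len(X))
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        models.append(w)
    return models

def thompson_decision(candidate_features, models, rng=rng):
    """5.3.2: randomized probability matching. Sample one model
    at random and take its best action; actions are thus chosen
    in proportion to the probability they are best."""
    w = models[rng.integers(len(models))]
    return int(np.argmax(candidate_features @ w))
```

Uncertain actions get explored automatically: where the ensemble members disagree, different draws pick different actions, and the disagreement shrinks as data accumulates.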

 
 
 
 

5.4 Validate the contextual bandit

 
 
 

5.5 Summary

 
 