12 Overcoming bias in learned relevance models


This chapter covers

  • Using live users to get feedback on our LTR model
  • A/B testing search relevance solutions with live users
  • Exploring potentially relevant results beyond the top results we always show users
  • Balancing exploitation of what we’ve learned from historical data with exploration of what else might be relevant

So far, our Learning to Rank work has happened in the lab. In previous chapters, we built models using training data constructed automatically from user clicks. In this chapter, we’ll take our model into the real world for a test drive with (simulated) live users!

Recall that we compared the full Automated Learning to Rank system to a self-driving car. Internally, the car has an engine: the end-to-end model retraining on historical judgements discussed in Chapter 10. In Chapter 11, we compared our model’s training data to a self-driving car’s directions: what should we optimize for in order to automatically learn judgements from previous interactions with search results? We built that training data and overcame key biases inherent in click data.

12.1 Our Automated LTR engine in a few lines of code

12.1.1 Turning clicks into training data (Chapter 11 in one line of code)
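
As a rough preview of what that “one line of code” hides, the sketch below grades each query/document pair by its raw click-through rate. The click log, queries, and document IDs are made up for illustration, and a real pipeline would apply the bias corrections from Chapter 11 rather than raw CTR:

```python
import pandas as pd

# Hypothetical click log: one row per (query, document) impression,
# with a flag recording whether the user clicked.
clicks = pd.DataFrame({
    "query":   ["ipad", "ipad", "ipad", "ipad", "dryer", "dryer"],
    "doc_id":  ["doc1", "doc2", "doc1", "doc3", "doc9", "doc9"],
    "clicked": [1,      0,      1,      0,      1,      0],
})

# Grade each (query, doc) pair by its click-through rate -- a naive
# stand-in for the bias-corrected judgement building of Chapter 11.
judgements = (clicks.groupby(["query", "doc_id"])["clicked"]
                    .mean()
                    .rename("grade")
                    .reset_index())
print(judgements)
```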

12.1.2 Model training & evaluation in a few function calls
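
The sketch below shows what “a few function calls” can look like, using scikit-learn as an illustrative stand-in for the book’s own training helpers: fit a simple model on ranking features, then score it with NDCG. The feature values and grades are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import ndcg_score

# Illustrative feature matrix (e.g. BM25 title score, BM25 description
# score) and judgement grades for one query -- not the book's real data.
X = np.array([[2.1, 0.3], [1.4, 1.2], [0.2, 0.1], [3.3, 2.0]])
y = np.array([1.0, 0.5, 0.0, 1.0])

model = LinearRegression().fit(X, y)   # "train" in one call
scores = model.predict(X)              # score the candidate documents
print(ndcg_score([y], [scores]))       # "evaluate" in one call
```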

12.2 A/B testing a new model

12.2.1 Taking a better model out for a test drive

12.2.2 Defining an A/B test in the context of automated LTR
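
One concrete piece of defining an A/B test is deterministically splitting users between the control model (A) and the candidate model (B). A minimal sketch, assuming hashed user IDs as the bucketing mechanism (the function and experiment names here are hypothetical):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "ltr_model_v2") -> str:
    """Deterministically bucket a user into control (A) or treatment (B)."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

print(assign_variant("user_42"))  # same user always lands in the same bucket
```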

12.2.3 Graduating the better model into an A/B test

12.2.4 When 'good' models go bad: what can we learn from a failed A/B test?
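
Deciding whether a test “failed” is ultimately a statistics question. Below is a minimal sketch of a two-sided, two-proportion z-test on click-through rates, with invented counts in which variant B’s small lift turns out not to be statistically significant:

```python
from statistics import NormalDist

# Hypothetical outcome counts: clicks / sessions per variant.
clicks_a, n_a = 420, 10_000
clicks_b, n_b = 435, 10_000

p_a, p_b = clicks_a / n_a, clicks_b / n_b
p_pool = (clicks_a + clicks_b) / (n_a + n_b)
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
print(f"z={z:.2f}, p={p_value:.3f}")          # large p: no real evidence B won
```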

12.3 Overcoming presentation bias: knowing when to explore vs. exploit

12.3.1 Presentation bias in RetroTech training data
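
To make the bias concrete before the section does, here is an illustrative (made-up) RetroTech-style click log aggregated by display position. CTR falls off sharply by position, so documents that never reach the top of the results rarely get a chance to earn clicks, no matter how relevant they are:

```python
import pandas as pd

# Invented numbers: impressions and clicks per display position.
log = pd.DataFrame({
    "position":    [1, 2, 3, 4, 5],
    "impressions": [1000, 1000, 1000, 1000, 1000],
    "clicks":      [320, 180, 90, 40, 15],
})
log["ctr"] = log["clicks"] / log["impressions"]
print(log)  # CTR drops with position: presentation bias in the raw data
```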

12.3.2 Beyond the ad hoc: thoughtfully exploring with a Gaussian process

12.3.3 Training and analyzing a Gaussian process
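
As a hedged sketch of the technique this section names, the snippet below fits scikit-learn’s GaussianProcessRegressor to a handful of observed relevance grades and scores unexplored candidates by an upper confidence bound (predicted mean plus uncertainty), so a candidate we know little about can win a slot. The 1-D feature, grades, and kernel settings are all illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical 1-D document feature vs. observed relevance grades.
X_seen = np.array([[0.1], [0.4], [0.5], [0.9]])
y_seen = np.array([0.2, 0.8, 0.7, 0.1])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                              alpha=0.05)  # alpha models click noise
gp.fit(X_seen, y_seen)

# Score unseen candidates by an upper confidence bound: mean + uncertainty.
X_cand = np.linspace(0, 1, 11).reshape(-1, 1)
mean, std = gp.predict(X_cand, return_std=True)
ucb = mean + 1.0 * std
print(X_cand[np.argmax(ucb)])  # the candidate most worth exploring next
```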

12.3.4 Examining the outcome of our explorations

12.4 Explore, exploit, gather, rinse, repeat: the full Automated LTR loop
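
Tying the chapter together, here is a toy, self-contained rendering of that loop: gather clicks on what we served, retrain a model from them, then serve a ranking that mostly exploits the model but occasionally explores. Every function here is a deliberately naive stand-in for the real components from Chapters 10 through 12:

```python
import random

random.seed(0)

def collect_clicks(served_results):
    """Stub for gathering feedback: pretend users click ~30% of the time."""
    return [(doc, random.random() < 0.3) for doc in served_results]

def train_model(judgements):
    """Stub for Chapter 10's retraining step: score docs by observed CTR."""
    stats = {}
    for doc, clicked in judgements:
        seen, hits = stats.get(doc, (0, 0))
        stats[doc] = (seen + 1, hits + int(clicked))
    return lambda doc: stats.get(doc, (1, 0))[1] / stats.get(doc, (1, 0))[0]

def rank(docs, model, epsilon=0.1):
    """Exploit the model's scores, but explore a shuffled order sometimes."""
    ranked = sorted(docs, key=model, reverse=True)
    if random.random() < epsilon:
        random.shuffle(ranked)  # crude stand-in for thoughtful exploration
    return ranked

docs = ["a", "b", "c", "d", "e"]
results = list(docs)
for _ in range(3):  # gather -> train -> serve, then repeat
    judgements = collect_clicks(results)
    results = rank(docs, train_model(judgements))
print(results)
```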
