chapter eleven

11 Building Learning to Rank training data from user clicks

 

This chapter covers:

  • Automating Learning to Rank (LTR) retraining from user behavioral signals (clicks, etc.)
  • Transforming user signals into implicit LTR training data using click models
  • Why raw clicks alone don’t work well to build LTR training data
  • Compensating for users’ tendency to click results higher on the search results page, regardless of relevance
  • Handling documents with fewer clicks in the training data

In Chapter 10, we went step by step through training a Learning to Rank (LTR) model. Like walking through the mechanics of building a car, we saw the underlying nuts and bolts of LTR model training. In this chapter, we treat the LTR training process as a black box. In other words, we step away from LTR internals and instead treat LTR more like a self-driving car, fine-tuning its trip toward a final destination.

Recall that LTR relies on accurate training data to be effective. LTR training data describes how users expect search results to be optimally ranked, and it provides the directions we input into our LTR self-driving car. As you’ll see, inferring what’s relevant from user interactions comes with many challenges. If we can overcome these challenges and gain high confidence in our training data, however, we can build Automated Learning to Rank: a system that regularly retrains LTR models to capture users’ latest relevance expectations.

11.1 (Re)creating judgment lists from signals

11.1.1 Generating implicit, probabilistic judgments from signals

11.1.2 Training an LTR model using probabilistic judgments

11.1.3 Click-through Rate: Your First Click Model
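A minimal sketch of a click-through rate (CTR) click model, assuming each search session is logged as a hypothetical (query, ranked_doc_ids, clicked_doc_ids) tuple. A document's judgment for a query is simply the fraction of the times it was shown that it was clicked. The function name and session data are illustrative only, not the chapter's RetroTech code.

```python
from collections import defaultdict


def ctr_judgments(sessions):
    """Compute a click-through rate for every (query, doc) pair that was shown.

    Each session is a hypothetical (query, ranked_doc_ids, clicked_doc_ids) tuple.
    """
    impressions = defaultdict(int)  # times the doc appeared in results for the query
    clicks = defaultdict(int)       # times the doc was clicked for the query
    for query, ranked_docs, clicked_docs in sessions:
        for doc in ranked_docs:
            impressions[(query, doc)] += 1
            if doc in clicked_docs:
                clicks[(query, doc)] += 1
    # CTR = clicks / impressions, used directly as a probabilistic judgment in [0, 1]
    return {pair: clicks[pair] / impressions[pair] for pair in impressions}


# Hypothetical click log: three searches for "dryer" returning the same three docs
sessions = [
    ("dryer", ["d1", "d2", "d3"], {"d3"}),
    ("dryer", ["d1", "d2", "d3"], {"d1"}),
    ("dryer", ["d1", "d2", "d3"], {"d3"}),
]
judgments = ctr_judgments(sessions)
```

Here d3 gets a judgment of 2/3, d1 gets 1/3, and d2 gets 0. Note that CTR divides by every impression, including impressions at positions the user may never have looked at.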

11.1.4 Common biases in judgments

11.2 Overcoming Position Bias: The Search Engine Returned It Higher, So It Must Be Better!

11.2.1 Defining Position Bias

11.2.2 Position bias in RetroTech data

11.2.3 A Click Model that Overcomes Position Bias: Simplified Dynamic Bayesian Network
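A minimal sketch of the examination assumption behind a simplified DBN (SDBN), again assuming hypothetical (query, ranked_doc_ids, clicked_doc_ids) session tuples. A document counts as examined only if it appears at or above the last clicked position in its session, and its attractiveness is clicks divided by examinations rather than by impressions. We assume that sessions without any clicks contribute no examinations. The full SDBN formulation also estimates a separate satisfaction probability, which this sketch omits, so treat it as an illustration rather than the chapter's exact model.

```python
from collections import defaultdict


def sdbn_attractiveness(sessions):
    """Estimate per-(query, doc) attractiveness under an SDBN-style examination assumption.

    Each session is a hypothetical (query, ranked_doc_ids, clicked_doc_ids) tuple.
    A doc counts as examined only if it sits at or above the session's last click.
    """
    examinations = defaultdict(int)
    clicks = defaultdict(int)
    for query, ranked_docs, clicked_docs in sessions:
        clicked_ranks = [rank for rank, doc in enumerate(ranked_docs) if doc in clicked_docs]
        if not clicked_ranks:
            # Assumption: a session with no clicks tells us nothing about examination
            continue
        last_click_rank = max(clicked_ranks)
        # Docs below the last click are treated as unseen, so they are not penalized
        for doc in ranked_docs[: last_click_rank + 1]:
            examinations[(query, doc)] += 1
            if doc in clicked_docs:
                clicks[(query, doc)] += 1
    # Attractiveness = clicks / examinations (not clicks / impressions as in plain CTR)
    return {pair: clicks[pair] / examinations[pair] for pair in examinations}


# Hypothetical click log: three searches for "dryer" returning the same three docs
sessions = [
    ("dryer", ["d1", "d2", "d3"], {"d3"}),
    ("dryer", ["d1", "d2", "d3"], {"d1"}),
    ("dryer", ["d1", "d2", "d3"], {"d3"}),
]
attractiveness = sdbn_attractiveness(sessions)
```

In this log, d3 is clicked both times it is examined from the bottom position, so it scores 1.0, whereas plain CTR over the same three sessions would give it only 2/3. The second session, whose only click is on d1 at the top, no longer counts against d2 and d3.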

11.3 Handling Confidence Bias: Not Upending Your Model from a Few Lucky Clicks

11.3.1 The Low Confidence Problem in RetroTech Click Data

11.3.2 Using a Beta Prior to Model Confidence Probabilistically
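A minimal sketch of Beta-prior smoothing, assuming clicks and examinations have already been counted per query-document pair by a click model such as CTR or SDBN. The grade is the posterior mean of a Beta(alpha, beta) prior updated with the observed clicks, (clicks + alpha) / (examinations + alpha + beta). The prior mean of 0.3 and prior strength of 10 are arbitrary placeholder values, not values taken from the chapter.

```python
def beta_smoothed_grade(clicks, examinations, prior_mean=0.3, prior_strength=10.0):
    """Posterior mean of a Beta(alpha, beta) prior updated with observed clicks.

    prior_mean and prior_strength are hypothetical tuning values: prior_strength acts
    like a number of pseudo-examinations, of which prior_mean * prior_strength were clicks.
    """
    alpha = prior_mean * prior_strength          # pseudo-clicks
    beta = (1.0 - prior_mean) * prior_strength   # pseudo-non-clicks
    return (clicks + alpha) / (examinations + alpha + beta)


lucky_doc = beta_smoothed_grade(clicks=1, examinations=1)      # raw rate 1.0
proven_doc = beta_smoothed_grade(clicks=90, examinations=100)  # raw rate 0.9
unseen_doc = beta_smoothed_grade(clicks=0, examinations=0)     # falls back to the prior mean
```

With these settings, the document with a single lucky click drops from a raw rate of 1.0 to 4/11 (about 0.36), while the document with 90 clicks in 100 examinations keeps a high grade of 93/110 (about 0.85). A larger prior_strength demands more evidence before a grade moves away from the prior mean.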

11.4 Exploring your training data in an LTR system

11.5 Summary