10 Learning from pairwise comparisons with preference optimization
This chapter covers
- The problem of learning about and optimizing preferences using only pairwise comparison data
- Training a Gaussian process on pairwise comparisons
- Optimization policies for pairwise comparisons
Have you ever found it difficult to rate something (food, a product, or an experience) on an exact scale? Asking for a customer’s numerical score for a product is a common task in A/B testing and product recommendation workflows.
A/B testing
The term "A/B testing" refers to the method of measuring users’ experiences in two environments (referred to as A and B) via randomized experiments and determining which environment is more desirable. A/B testing is commonly conducted by technology companies.
However, A/B testers and product recommendation engineers often have to deal with a high level of noise in the feedback collected from their customers. By "noise", we mean any type of corruption that the feedback collected from customers is subject to. Example sources of noise in product ratings are the number of advertisements served on an online streaming service, the quality of the delivery service for a package, or the general mood of the customer when they consume a product. These factors affect how the customer rates the product, potentially corrupting the signal that is the customer’s true evaluation of the product.
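To make the idea of noisy feedback concrete, here is a minimal sketch that simulates star ratings for a single product. Everything in it is an assumption made for illustration: the 1-to-5 star scale, the Gaussian form of the noise, and the hypothetical helper `simulate_ratings`. It is not a model of any real rating system; it only shows how a fixed "true" evaluation can be spread across many different observed scores once noise is added.

```python
import random
import statistics


def simulate_ratings(true_score, noise_std, n_customers, seed=0):
    """Return simulated 1-5 star ratings around a fixed true score.

    Assumption: noise is Gaussian with standard deviation `noise_std`,
    standing in for factors such as ads, delivery quality, or mood.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_customers):
        noisy_score = true_score + rng.gauss(0.0, noise_std)
        # Round to a whole number of stars and clip to the 1-5 scale.
        ratings.append(min(5, max(1, round(noisy_score))))
    return ratings


# Same true evaluation (4 stars), without and with noise.
clean = simulate_ratings(true_score=4.0, noise_std=0.0, n_customers=1000)
noisy = simulate_ratings(true_score=4.0, noise_std=1.5, n_customers=1000)

print("distinct ratings without noise:", sorted(set(clean)))
print("distinct ratings with noise:   ", sorted(set(noisy)))
print("spread with noise (std):       ", statistics.pstdev(noisy))
```

Without noise, every simulated customer reports the same score. With noise, the reported scores scatter across the scale even though the underlying evaluation never changed, which is the corruption of the signal described above.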