chapter ten

10 Learning from pairwise comparisons with preference optimization

This chapter covers

The problem of learning about and optimizing preferences using only pairwise comparison data
Training a Gaussian process on pairwise comparisons
Optimization policies for pairwise comparisons

Have you ever found it difficult to rate something (a food, a product, or an experience) on an exact scale? Asking customers for a numerical score for a product is a common task in A/B testing and product recommendation workflows.

A/B testing

The term "A/B testing" refers to the method of measuring users’ experiences in two environments (referred to as A and B) through randomized experiments and determining which environment is more desirable. A/B testing is commonly conducted by technology companies.
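The procedure in this sidebar can be illustrated with a small simulation. The code randomly assigns each simulated user to environment A or B with a fair coin flip. It then compares the mean ratings using Welch's t-statistic. Several details are illustrative assumptions, not a prescribed A/B testing method:

- the Gaussian rating model and its means;
- the choice of Welch's t-statistic;
- the 1.96 cutoff, which is a normal approximation to a two-sided 5% test.

```python
import math
import random
import statistics


def run_ab_test(true_mean_a, true_mean_b, n_users, noise_sd=1.0, seed=0):
    """Simulate a randomized A/B test and return summary statistics.

    The rating model (true mean + Gaussian noise) is a hypothetical
    assumption made for illustration only.
    """
    rng = random.Random(seed)
    ratings = {"A": [], "B": []}
    for _ in range(n_users):
        # Randomized experiment: a fair coin flip assigns each user to A or B.
        group = "A" if rng.random() < 0.5 else "B"
        true_mean = true_mean_a if group == "A" else true_mean_b
        ratings[group].append(rng.gauss(true_mean, noise_sd))

    mean_a = statistics.mean(ratings["A"])
    mean_b = statistics.mean(ratings["B"])
    var_a = statistics.variance(ratings["A"])
    var_b = statistics.variance(ratings["B"])
    # Welch's t-statistic for the difference in mean ratings (B minus A).
    se = math.sqrt(var_a / len(ratings["A"]) + var_b / len(ratings["B"]))
    return {
        "n_a": len(ratings["A"]),
        "n_b": len(ratings["B"]),
        "mean_a": mean_a,
        "mean_b": mean_b,
        "t_stat": (mean_b - mean_a) / se,
    }


def preferred_environment(summary, threshold=1.96):
    """Return 'A', 'B', or 'undecided' by comparing |t| with the threshold."""
    if summary["t_stat"] > threshold:
        return "B"
    if summary["t_stat"] < -threshold:
        return "A"
    return "undecided"


summary = run_ab_test(true_mean_a=3.5, true_mean_b=3.8, n_users=2000)
print("users in A and B:", summary["n_a"], summary["n_b"])
print("preferred environment:", preferred_environment(summary))
```

Randomized assignment is what makes the comparison fair. Any factor unrelated to the environment, such as a user's mood, is equally likely to land in either group. Such factors therefore average out rather than systematically favoring A or B.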

However, A/B testers and product recommendation engineers often have to deal with a high level of noise in the feedback collected from their customers. By "noise," we mean any type of corruption that the feedback collected from customers is subject to. Example sources of noise in product ratings include the number of advertisements served on an online streaming service, the quality of the delivery service for a package, and the general mood of the customer when they consume a product. These factors affect how customers rate a product, potentially corrupting the signal we actually care about: the customer’s true evaluation of the product.
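A toy simulation can make this corruption concrete. It rests on one assumption about how noise enters a score. A customer's mood adds the same offset to every score that customer gives during one sitting, and each score also carries some independent noise. The product names and the numbers are made up for illustration.

The simulation compares two settings. In the first, every customer scores only one product. In the second, every customer scores both products side by side. Under this assumed model, the shared mood offset cancels when we look only at which of the two products a customer prefers. This hints at why comparing items can be more robust than scoring them.

```python
import random
import statistics

# Hypothetical true evaluations of two products (unknown to the tester in practice).
TRUE_QUALITY = {"product_1": 3.0, "product_2": 3.4}


def noisy_rating(product, mood, rng, noise_sd=0.3):
    """Observed score = true evaluation + customer's mood offset + random noise.

    This additive noise model is an illustrative assumption, not a claim
    about how real customers rate products.
    """
    return TRUE_QUALITY[product] + mood + rng.gauss(0.0, noise_sd)


def simulate(n_customers=1000, mood_sd=1.0, seed=1):
    rng = random.Random(seed)
    # Each customer carries a mood offset that shifts every score they give.
    moods = [rng.gauss(0.0, mood_sd) for _ in range(n_customers)]

    # Setting 1: each customer scores one product, so mood enters the score directly.
    half = n_customers // 2
    scores_1 = [noisy_rating("product_1", m, rng) for m in moods[:half]]
    scores_2 = [noisy_rating("product_2", m, rng) for m in moods[half:]]

    # Setting 2: each customer scores both products in one sitting; the shared
    # mood offset cancels in the difference, leaving only per-rating noise.
    diffs = [
        noisy_rating("product_2", m, rng) - noisy_rating("product_1", m, rng)
        for m in moods
    ]
    return scores_1, scores_2, diffs


scores_1, scores_2, diffs = simulate()
print("spread of single scores:", round(statistics.stdev(scores_1), 2))
print("spread of within-customer differences:", round(statistics.stdev(diffs), 2))
print("share preferring product_2:", sum(d > 0 for d in diffs) / len(diffs))
```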

10.1 Black-box optimization with pairwise comparisons

10.2 Formulating a preference optimization problem and formatting pairwise comparison data

10.3 Training a preference-based Gaussian process

10.4 Preference optimization by playing king of the hill

10.5 Summary