10 Learning from pairwise comparisons with preference optimization


This chapter covers

  • The problem of learning about and optimizing preferences using only pairwise comparison data
  • Training a GP on pairwise comparisons
  • Optimization policies for pairwise comparisons

Have you ever found it difficult to rate something (food, a product, or an experience) on an exact numerical scale? Yet asking customers for exactly this kind of numerical score is a common task in A/B testing and product recommendation workflows.

Definition

The term A/B testing refers to the method of measuring a user’s experience in two environments (referred to as A and B) via randomized experiments and determining which environment is more desirable. A/B testing is commonly conducted by technology companies.
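
To see what this looks like in practice, here is a minimal Python sketch of a randomized A/B test. The conversion rates, user count, and random seed are hypothetical values chosen purely for illustration, not data from any real experiment.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true conversion rates for the two environments.
p_a, p_b = 0.10, 0.12
n_users = 10_000

# Randomly assign each user to environment A (0) or B (1).
assignment = rng.integers(0, 2, size=n_users)

# Simulate whether each user converts in their assigned environment.
converted = rng.random(n_users) < np.where(assignment == 0, p_a, p_b)

rate_a = converted[assignment == 0].mean()
rate_b = converted[assignment == 1].mean()
print(f"conversion rate in A: {rate_a:.3f}")
print(f"conversion rate in B: {rate_b:.3f}")
print("more desirable:", "B" if rate_b > rate_a else "A")

A real A/B test would also apply a statistical test before declaring a winner, but the core loop (randomize, observe, compare) is captured above.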

A/B testers and product recommendation engineers often have to deal with a high level of noise in the feedback they collect from their customers. By noise, we mean any kind of corruption that the collected feedback is subject to. Sources of noise in product ratings include the number of advertisements served on an online streaming service, the quality of the delivery service for a package, and the general mood of the customer when they consume the product. All of these factors affect how the customer rates the product, potentially corrupting the signal we actually care about: the customer's true evaluation of the product.
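To make this noise model concrete, here is a small sketch that assumes, purely for illustration, that each observed rating is the customer's true evaluation plus additive Gaussian noise. The true scores and noise level are made-up numbers.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true evaluations of two products on a 1-10 scale.
true_score_a, true_score_b = 7.0, 6.0

# Ads, delivery quality, mood, and so on, modeled as additive Gaussian noise.
noise_level = 1.5

# Each customer's observed rating is the true evaluation plus noise.
ratings_a = true_score_a + noise_level * rng.standard_normal(1_000)
ratings_b = true_score_b + noise_level * rng.standard_normal(1_000)

print(f"observed mean rating for A: {ratings_a.mean():.2f}")
print(f"observed mean rating for B: {ratings_b.mean():.2f}")

# A pairwise comparison only asks which of the two the customer prefers.
prefers_a = (ratings_a > ratings_b).mean()
print(f"fraction of customers preferring A: {prefers_a:.2f}")

Even though the individual scores are noisy, the majority of simulated customers still prefer product A. Pairwise comparisons of this kind are exactly the data this chapter teaches us to learn from.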

10.1 Black-box optimization with pairwise comparisons

10.2 Formulating a preference optimization problem and formatting pairwise comparison data

10.3 Training a preference-based GP

10.4 Preference optimization by playing king of the hill

Summary