8 Signals-boosting models

This chapter covers

Aggregating user signals to create a popularity-based ranking model
Normalizing signals for noisy query input
Fighting signal spam in crowdsourced signals
Applying time decays to prioritize recent signals
Blending multiple signal types together into one model
Choosing query-time versus index-time boosting

In chapter 4, we covered three different categories of reflected intelligence: signals boosting (popularized relevance), collaborative filtering (personalized relevance), and learning to rank (generalized relevance). In this chapter, we’ll dive deeper into the first of these, implementing signals boosting to enhance the relevance ranking of your most popular queries and documents.

In most search engines, a relatively small number of queries tend to make up a large portion of the total query volume. These popular queries, called head queries, also tend to lead to more signals (such as clicks and purchases in an e-commerce use case), which enable stronger inferences about the popularity of top search results.

Signals-boosting models directly harness these stronger inferences and are the key to ensuring your most important and highest-visibility queries are best tuned to return the most relevant documents.
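As a rough illustration of the idea, the sketch below counts how often each document was clicked for each query and uses those counts as boost weights. The signal records, the field layout, and the function names `aggregate_boosts` and `top_documents` are all hypothetical, chosen only for this example. It skips the normalization, spam filtering, and time decay covered later in the chapter.

```python
from collections import Counter

# Hypothetical signal log: (query, signal_type, doc_id) tuples.
# Real signals would come from your search application's event tracking.
signals = [
    ("ipad", "click", "doc_1"),
    ("ipad", "click", "doc_1"),
    ("ipad", "click", "doc_2"),
    ("ipad", "purchase", "doc_1"),
    ("laptop", "click", "doc_3"),
]


def aggregate_boosts(signals, signal_type="click"):
    """Count signals of one type per (query, document) pair.

    Returns a dict mapping each query to {doc_id: boost}, where the
    boost is simply the raw number of matching signals.
    """
    counts = Counter(
        (query, doc_id)
        for query, kind, doc_id in signals
        if kind == signal_type
    )
    boosts = {}
    for (query, doc_id), count in counts.items():
        boosts.setdefault(query, {})[doc_id] = count
    return boosts


def top_documents(boosts, query, limit=10):
    """Return up to `limit` (doc_id, boost) pairs for a query, highest boost first."""
    ranked = sorted(boosts.get(query, {}).items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


boosts = aggregate_boosts(signals)
print(top_documents(boosts, "ipad"))
```

Head queries accumulate many signals, so their counts separate popular documents clearly. A rarely issued query may have only one or two signals, or none, in which case this sketch returns an empty list and the engine's normal ranking applies.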

8.1 Basic signals boosting

8.2 Normalizing signals

8.3 Fighting signal spam

8.3.1 Using signal spam to manipulate search results

8.3.2 Combating signal spam through user-based filtering

8.4 Combining multiple signal types

8.5 Time decays and short-lived signals

8.5.1 Handling time-insensitive signals

8.5.2 Handling time-sensitive signals

8.6 Index-time vs. query-time boosting: Balancing scale vs. flexibility

8.6.1 Tradeoffs when using query-time boosting

8.6.2 Implementing index-time signals boosting

8.6.3 Tradeoffs when implementing index-time boosting

Summary