8 Signals boosting models

 

This chapter covers

  • Aggregating user signals to create a popularity-based ranking model
  • Normalizing signals to best enhance relevance for noisy query input
  • Fighting signal spam and user manipulation of crowdsourced signals
  • Applying time decays to prioritize recent signals as more relevant
  • Blending multiple signal types together into a unified signals boosting model
  • Balancing flexibility and performance using query-time vs. index-time signals boosting

In Chapter 4, we covered three categories of reflected intelligence: signals boosting (popularized relevance), collaborative filtering (personalized relevance), and learning to rank (generalized relevance). In this chapter, we’ll dive deeper into the first of these, implementing signals boosting to improve the relevance ranking of your most popular queries and documents.

In most search engines, a relatively small number of queries makes up a large portion of the total query volume. These popular queries, called head queries, also tend to generate more signals (such as clicks and purchases in an e-commerce use case), which enable stronger inferences about the popularity of top search results.

Signals boosting models directly harness these stronger inferences and are the key to ensuring that your most important and highest-visibility queries are tuned to return the most relevant documents.
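To make the idea concrete before the detailed sections, here is a minimal sketch of the aggregation behind a popularity-based boost: counting how often each document received a signal for each query. The function name `aggregate_signal_boosts`, the `(query, doc_id)` tuple layout, and the sample data are all assumptions made for illustration; they are not the chapter's implementation, and a real system would read signals from a log or index rather than an in-memory list.

```python
from collections import Counter, defaultdict


def aggregate_signal_boosts(signals):
    """Count signals per document for each query.

    `signals` is an iterable of (query, doc_id) pairs, one per
    user interaction (a hypothetical layout used for this sketch).
    Returns a mapping of query -> Counter of doc_id -> signal count.
    """
    boosts = defaultdict(Counter)
    for query, doc_id in signals:
        boosts[query][doc_id] += 1
    return boosts


# Hypothetical click signals: "ipad" is a head query with several clicks.
signals = [
    ("ipad", "doc_1"),
    ("ipad", "doc_1"),
    ("ipad", "doc_2"),
    ("ipad", "doc_1"),
    ("kindle", "doc_3"),
]

boosts = aggregate_signal_boosts(signals)

# The most-clicked documents for a query become its boost candidates.
top_ipad = boosts["ipad"].most_common(2)
```

In this sketch, the raw counts stand in for boost weights. Head queries accumulate many signals, so their counts separate popular documents from the rest more reliably than the counts for rarely issued queries.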

8.1 Basic signals boosting

8.2 Normalizing signals

8.3 Fighting signal spam

8.3.1 Using signal spam to manipulate search results

8.3.2 Combatting signal spam through user-based filtering

8.4 Combining multiple signal types

8.5 Time decays and short-lived signals

8.5.1 Handling time-sensitive documents

8.5.2 Handling time-sensitive signals

8.6 Index-time vs. query-time boosting: balancing scale vs. flexibility

8.6.1 Tradeoffs when using query-time boosting

8.6.2 Implementing index-time signals boosting

8.7 Summary
