Chapter 3. Debugging your first relevance problem

 

This chapter covers

  • The basics of extracting, indexing, and searching content in Elasticsearch
  • Troubleshooting searches that don’t return expected results
  • Debugging the construction of the inverted index
  • Troubleshooting relevance bugs
  • Solving your first relevance issue

The previous chapter laid out a rather ideal blueprint for Lucene-based search. In this chapter, the search engine has broken down! You’ll see what it takes to debug a real, live search engine. What tools are available to gain visibility into the behavior of search engine internals? Why do certain documents match the query, whereas other more relevant documents don’t? Why do seemingly irrelevant documents outrank relevant ones?

This chapter introduces you to a beginner’s problem. Although the solution is straightforward, reaching it requires you to master relevance debugging. You’ll use these same techniques to solve every relevance problem you face. Just as in math, showing your work can be the most important step.

You’ll begin to use our search engine, Elasticsearch, to search over a real data set. As you encounter this common beginner’s problem, your focus will be on debugging the two primary internal layers key to relevance: matching and ranking. Armed with the insights that the search engine’s debugging capabilities provide, you can begin to use the search engine to match and rank based on the features that you know best describe your content.

3.1. Applications to Solr and Elasticsearch: examples in Elasticsearch

3.2. Our most prominent data set: TMDB

3.3. Examples programmed in Python

3.4. Your first search application

3.5. Debugging query matching

3.6. Debugging ranking

3.7. Solved? Our work is never over!

3.8. Summary
