Chapter 2. Searching
This chapter covers:
- Searching with Lucene
- Calculating the PageRank vector
- Large-scale computing constraints
Let’s say that you have a list of documents and you’re interested in reading about those that are related to the phrase “Armageddon is near” (or perhaps something less macabre). How would you implement a solution to that problem? A brute-force, naïve solution would be to read each document and keep only those in which you can find the phrase “Armageddon is near.” You could even count how many times each of the words in your search phrase appears in each document and sort the documents by that count in descending order. That exercise is called information retrieval (IR) or simply searching. Searching isn’t new functionality; nearly every application has some implementation of search, but intelligent searching goes beyond plain old searching.
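The brute-force approach just described can be sketched in a few dozen lines of Java. This is a minimal illustration under stated assumptions, not the chapter’s implementation: the class `NaiveSearch`, its `search` and `tokenize` methods, and the `Hit` type are hypothetical names, and the sketch assumes case-insensitive matching with words split on any run of non-letter, non-digit characters. Each document is kept only if it contains the whole phrase; its score is the total number of occurrences of the query’s individual words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class illustrating a brute-force, linear-scan search.
class NaiveSearch {

    // One matching document and its score (total query-word occurrences).
    static final class Hit {
        final String id;
        final int score;

        Hit(String id, int score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "=" + score;
        }
    }

    // Assumption: lowercase the text and split it into words on runs of
    // characters that are neither letters nor digits.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Scan every document: keep those containing the exact phrase,
    // score each by how often the query's words occur in it, and
    // return the hits sorted by score, highest first.
    static List<Hit> search(List<String> ids, List<String> docs, String query) {
        String[] queryWords = tokenize(query);
        // Pad with spaces so the phrase matches only on whole-word boundaries.
        String phrase = " " + String.join(" ", queryWords) + " ";
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            String[] words = tokenize(docs.get(i));
            String normalized = " " + String.join(" ", words) + " ";
            if (!normalized.contains(phrase)) {
                continue; // phrase not present: discard this document
            }
            int score = 0;
            for (String word : words) {
                for (String q : queryWords) {
                    if (word.equals(q)) {
                        score++;
                    }
                }
            }
            hits.add(new Hit(ids.get(i), score));
        }
        // Descending by score; List.sort is stable, so ties keep input order.
        hits.sort((a, b) -> Integer.compare(b.score, a.score));
        return hits;
    }

    public static void main(String[] args) {
        List<Hit> hits = search(
                List.of("d1", "d2", "d3"),
                List.of("Armageddon is near. Armageddon!",
                        "The weather is near perfect.",
                        "Some say Armageddon is near, others disagree."),
                "Armageddon is near");
        System.out.println(hits); // prints [d1=4, d3=3]
    }
}
```

Note that every query rereads every document in full, so the cost grows with the total size of the collection. Avoiding that repeated scan is the reason search libraries such as Lucene build an index ahead of time.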