Chapter 2. Searching
Listing 2.1. Reading, indexing, and searching the default list of web pages
Figure 2.1. An example of retrieving, parsing, analyzing, indexing, and searching a set of web pages with a few lines of code
Listing 2.2. The LuceneIndexBuilder creates a Lucene index
Listing 2.3. MySearcher: retrieving search results based on Lucene indexing
Listing 2.4. Reading, indexing, and searching web pages that contain spam
Figure 2.4. A single deceptive web page significantly altered the ranking of the results for the query “Armstrong.”
Listing 2.5. Calculating the PageRank vector
Figure 2.6. The calculation of the PageRank vector for the small network of the business news web pages
Listing 2.6. Evaluating the matrix H based on the links between web pages
Listing 2.7. Applying the power method for the calculation of PageRank
Listing 2.8. Evaluation of the error between two consecutive PageRank vectors
Listing 2.9. Combining the Lucene and PageRank scores for ranking web pages
Figure 2.7. Combining the Lucene scores and the PageRank scores allows you to eliminate spam.
Listing 2.10. Combining the Lucene scores and the PageRank scores
Listing 2.11. Accounting for user clicks in the search results
Listing 2.12. Evaluating the relevance of a URL with the NaiveBayes classifier
Listing 2.13. Lucene indexing, PageRank values, and user click probabilities
Figure 2.8. Combining Lucene, PageRank, and user clicks to produce high-relevance search results for dmitry.
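The PageRank steps named in Listings 2.5 through 2.8 are building the link matrix H, applying the power method, and measuring the error between consecutive vectors. The following is a minimal sketch of those steps. It is not the chapter's code. It assumes the common Google-matrix formulation: a damping factor alpha (0.85 in the example), uniform teleportation, and dangling pages (pages with no outgoing links) whose rank is spread evenly over all pages. It also assumes an L1 norm for the error. The class and method names (PageRankSketch, buildH, error, pageRank) are hypothetical.

```java
import java.util.Arrays;

public class PageRankSketch {

    // Hypothetical helper: row-normalized link matrix H,
    // where H[i][j] = 1 / outDegree(i) if page i links to page j, else 0.
    static double[][] buildH(boolean[][] links) {
        int n = links.length;
        double[][] h = new double[n][n];
        for (int i = 0; i < n; i++) {
            int out = 0;
            for (int j = 0; j < n; j++) if (links[i][j]) out++;
            for (int j = 0; j < n; j++) if (links[i][j]) h[i][j] = 1.0 / out;
        }
        return h;
    }

    // Assumed error measure: L1 norm of the difference between two consecutive vectors.
    static double error(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    // Power method: next = alpha * (x H + danglingMass / n) + (1 - alpha) / n,
    // repeated until the error drops below tol or maxIter is reached.
    static double[] pageRank(double[][] h, double alpha, double tol, int maxIter) {
        int n = h.length;
        double[] x = new double[n];
        Arrays.fill(x, 1.0 / n); // start from the uniform distribution
        for (int iter = 0; iter < maxIter; iter++) {
            // Rank held by dangling pages (all-zero rows of H) is redistributed uniformly.
            double dangling = 0;
            for (int i = 0; i < n; i++) {
                double rowSum = 0;
                for (int j = 0; j < n; j++) rowSum += h[i][j];
                if (rowSum == 0) dangling += x[i];
            }
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += x[i] * h[i][j];
                next[j] = alpha * (s + dangling / n) + (1 - alpha) / n;
            }
            boolean converged = error(x, next) < tol;
            x = next;
            if (converged) break;
        }
        return x;
    }

    public static void main(String[] args) {
        // Tiny three-page network: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
        boolean[][] links = {
            {false, true, true},
            {false, false, true},
            {true, false, false}
        };
        double[] pr = pageRank(buildH(links), 0.85, 1e-10, 1000);
        System.out.println(Arrays.toString(pr));
    }
}
```

Each iteration preserves the total probability mass of 1, because the dangling mass and the teleportation term are both spread uniformly. That is why the L1 difference between consecutive vectors works as a convergence criterion here.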