Chapter 3. Searching

In this chapter

  • Understanding search theory and the basics of the vector space model
  • Setting up Apache Solr
  • Making content searchable
  • Creating queries for Solr
  • Understanding search performance

Search, as a feature of an application or as an end application in itself, needs little introduction. It’s part of the fabric of our lives, whether we’re searching for information on the internet or our desktop, finding friends on Facebook, or finding a keyword in a piece of text. For the developer, search is a key feature of many applications, and especially of data-driven applications in which the user needs to sift through large volumes of text. Moreover, search often comes as a prepackaged solution, such as Apple Spotlight on the desktop, or via an appliance such as the Google Search Appliance.

Given the ubiquity of search and the availability of prepackaged solutions, a natural question to ask is: why build your own search tool using an open source solution? There are at least a few good reasons:

  • Flexibility: You can control most, if not all, aspects of the process.
  • Cost of development: Even when buying a commercial solution, you need to integrate it, which requires a lot of work.
  • Who knows your content better than you? Most shrink-wrap solutions make assumptions about your content that may not be appropriate.
  • Price: No license fees. ’Nuff said.

3.1. Search and faceting example: Amazon.com

3.2. Introduction to search concepts

3.3. Introducing the Apache Solr search server

3.4. Indexing content with Apache Solr

3.5. Searching content with Apache Solr

3.6. Understanding search performance factors

3.7. Improving search performance

3.8. Search alternatives

3.9. Summary

3.10. Resources