Chapter 12. Searching and indexing

 

This chapter covers

  • Searching databases
  • Indexing content using Ferret and Solr
  • Searching with other technologies
  • The scraping technique

Throughout this book, and throughout your programming life, you’ve dealt with data, because it forms the input and output of every computer program. So far, we have mostly looked at transforming data from one state to another, but in this chapter we’re going to investigate how Ruby lets you search through data.

Unfortunately, as a language that has reached maturity only in the last few years, Ruby is not blessed with hundreds of search-related libraries. This is not necessarily a bad thing, as search technologies progress quickly, and most of the available Ruby search solutions are up to date and ready to use in production immediately.

In this chapter, we’ll look at Ruby-specific techniques for searching and indexing data, and we’ll examine some solutions to common search-related problems.

We’re going to look at standalone libraries and techniques available to Ruby developers, and we’ll walk through the process of indexing content using two tools with roots in Apache Lucene: Ferret, a Ruby library inspired by Lucene, and Solr, a search server built on Lucene. We’ll also cover a performance-driven Ruby-only library called FTSearch. Finally, we’ll look at integrating search features with other technologies: searching the web, searching databases, and adding indexing and search features to Ruby on Rails applications.

12.1. The principles of searching

12.2. Standalone and high-performance searching

12.3. Integrating search with other technologies

12.4. Summary