Chapter 6. Text analysis


This chapter covers

  • Testing with Solr’s Analysis form
  • Defining custom field types for advanced text analysis
  • Extending text analysis with Solr’s Plug-In framework

In chapter 5, we learned how the Solr indexing process works and how to define nontext fields in schema.xml. In this chapter, we go deeper into the indexing process by learning about text analysis.

Text analysis removes the linguistic variations between terms in the index and terms provided by users when searching, so that a user’s query for “buying a new house” can match a document titled “purchasing a new home.” In this chapter you’ll learn how to configure Solr to establish a match between queries containing “house” and documents containing “home.”

When done correctly, text analysis allows your users to query using natural language without having to think about all the possible forms of their search terms. You don’t want your users to have to construct queries like: buying house OR purchase home OR buying a home OR purchasing a house ...
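One way to spare users from writing such queries is to expand synonyms during query analysis. The schema.xml fragment below is only a minimal sketch of that idea, not necessarily the configuration this chapter develops. The field type name text_synonyms is hypothetical, the contents of synonyms.txt are assumed, and the fragment assumes a Solr version that ships solr.SynonymFilterFactory.

```xml
<!-- Hypothetical field type; the name "text_synonyms" is illustrative only. -->
<!-- Assumes synonyms.txt (in the core's conf directory) contains lines such as:
       buying,purchasing
       house,home
-->
<fieldType name="text_synonyms" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: split text into tokens and lowercase them -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: same steps, then expand each term to its listed synonyms -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With a configuration along these lines, a query term such as house would also be searched as home, so the user does not have to spell out the alternatives with OR.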

Letting users find the information they seek using natural language is fundamental to a good user experience. Given the broad adoption and sophistication of Google and similar search engines, users have come to expect search engines to be intelligent, and intelligence in search starts with good text analysis.

6.1. Analyzing microblog text

6.2. Basic text analysis

6.3. Defining a custom field type for microblog text

6.4. Advanced text analysis

6.5. Summary