Chapter 5. Analyzing your data
This chapter covers
- Analyzing your document’s text with Elasticsearch
- Using the analysis API
- Tokenization
- Character filters
- Token filters
- Stemming
- Analyzers included with Elasticsearch
So far we’ve covered indexing and searching your data, but what actually happens to the text of a document when you send it to Elasticsearch? How can Elasticsearch find specific words within sentences, even when the case differs? For example, when a user searches for “nosql,” you’d generally like a document containing the sentence “share your experience with NoSql & big data technologies” to match, because it contains the word NoSql. Using what you learned in the previous chapter, you can run a query_string query for “nosql” and find the document. In this chapter you’ll learn why that query returns the document. By the end of the chapter, you’ll have a better idea of how Elasticsearch’s analysis lets you search your documents in a more flexible manner.
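To make the “nosql” example concrete, here is a minimal Python sketch of the idea, not Elasticsearch’s actual implementation. It assumes a toy `analyze` function (a hypothetical name) that splits text on non-word characters and lowercases each piece, and it applies that same function to both the document and the query before looking terms up in a small dictionary standing in for the index.

```python
import re

def analyze(text):
    """Toy stand-in for analysis: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

document = "share your experience with NoSql & big data technologies"
query = "nosql"

# Build a tiny index mapping each term to the IDs of documents containing it.
# The single document here is given the illustrative ID 1.
index = {}
for term in analyze(document):
    index.setdefault(term, set()).add(1)

# Analyze the query the same way, then look up each resulting term.
matches = set()
for term in analyze(query):
    matches |= index.get(term, set())

print(analyze(document))
print(matches)
```

Because both sides pass through the same lowercasing step, “NoSql” in the document and “nosql” in the query end up as the same term, which is why the lookup succeeds.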
Analysis is the processing Elasticsearch performs on the text of a document before the document is added to the inverted index. Elasticsearch goes through a number of steps for every analyzed field before the document is added to the index: