Chapter 5. Indexing: where, how, what, and when
This chapter covers
- Choosing and configuring directory providers
- Choosing the appropriate analyzer
- Understanding Hibernate Search transparent indexing
- Using manual indexing
Indexing is the action of preparing data so that Lucene can answer your full-text queries efficiently. The index should reflect changes to your data as closely as possible rather than lag behind them. Why does Lucene need to prepare data? To answer full-text queries efficiently, Lucene must store an optimized representation of the data. Since most full-text queries revolve around words, the index is organized per word: for each word, the index structure stores the list of documents and fields matching that word, along with some statistical information. Section 1.3.1 gave us an idea of the index structure kept by Lucene.
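To make the per-word organization more concrete, here is a toy in-memory inverted index. It is a simplified sketch, not Lucene's actual data structure: the class `ToyInvertedIndex` and its methods are hypothetical, tokenization is a naive lowercase split on non-word characters (standing in for a real analyzer), and the only statistic kept is the total number of occurrences of each word.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy inverted index (hypothetical illustration, not Lucene's real structure):
 * for each word it keeps the documents and fields containing that word,
 * plus one statistic, the total number of occurrences.
 */
public class ToyInvertedIndex {
    // word -> (document id -> names of the fields containing the word)
    private final Map<String, Map<Integer, Set<String>>> postings = new TreeMap<>();
    // word -> total number of occurrences across all documents and fields
    private final Map<String, Integer> occurrences = new HashMap<>();

    /** Naively tokenizes a field value (lowercase, split on non-word chars) and records each word. */
    public void addField(int docId, String field, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            postings.computeIfAbsent(word, w -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new TreeSet<>())
                    .add(field);
            occurrences.merge(word, 1, Integer::sum);
        }
    }

    /** Ids of the documents containing the word, in ascending order. */
    public Set<Integer> documentsFor(String word) {
        Map<Integer, Set<String>> docs = postings.get(word.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.emptySet() : Collections.unmodifiableSet(docs.keySet());
    }

    /** Names of the fields of a given document that contain the word. */
    public Set<String> fieldsFor(String word, int docId) {
        Map<Integer, Set<String>> docs = postings.get(word.toLowerCase(Locale.ROOT));
        if (docs == null || !docs.containsKey(docId)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(docs.get(docId));
    }

    /** Statistic: number of documents containing the word. */
    public int documentFrequency(String word) {
        return documentsFor(word).size();
    }

    /** Statistic: total occurrences of the word in all documents and fields. */
    public int totalOccurrences(String word) {
        return occurrences.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addField(1, "title", "Hibernate Search in Action");
        index.addField(1, "description", "Full-text search for Hibernate applications");
        index.addField(2, "title", "Lucene in Action");
        System.out.println(index.documentsFor("action"));   // [1, 2]
        System.out.println(index.fieldsFor("search", 1));   // [description, title]
        System.out.println(index.totalOccurrences("hibernate")); // 2
    }
}
```

Answering a one-word query then becomes a single map lookup instead of a scan over every document, which is the whole point of preparing the data up front. Lucene's real index goes much further (compressed postings, term positions, norms, on-disk segments), but the per-word orientation is the same idea.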
Lucene’s job is to build this magic structure and enable its superpowers, right? True, but it needs a little help from you:
- You need to store the index structure.
- You need to decide which of the features you require and which data preparation Lucene will do for you.
- You need to ask Lucene to index your information.
The index structure in Lucene must be stored somewhere. The two main storage options are a file system directory and memory. We’ll cover how to ask Hibernate Search to use each of these strategies.
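As a preview of the two storage strategies, the following sketch builds the corresponding configuration properties with plain `java.util.Properties`. It assumes the Hibernate Search 3.x property names (`hibernate.search.default.directory_provider`, `hibernate.search.default.indexBase`) and provider classes (`FSDirectoryProvider`, `RAMDirectoryProvider`); the helper class `DirectoryProviderConfig` and the index path are hypothetical, and other Hibernate Search versions may use different names.

```java
import java.util.Properties;

/**
 * Hypothetical helper producing directory provider settings.
 * Property keys and provider class names assume Hibernate Search 3.x.
 */
public class DirectoryProviderConfig {

    static final String PROVIDER_KEY = "hibernate.search.default.directory_provider";
    static final String INDEX_BASE_KEY = "hibernate.search.default.indexBase";

    /** Index stored as files under the given base directory (survives restarts). */
    public static Properties filesystem(String indexBase) {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.FSDirectoryProvider");
        props.setProperty(INDEX_BASE_KEY, indexBase);
        return props;
    }

    /** Index kept in memory only; it is lost when the application stops. */
    public static Properties inMemory() {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.RAMDirectoryProvider");
        return props;
    }

    public static void main(String[] args) {
        // In a real application these entries would go into the Hibernate
        // configuration (for example hibernate.properties or persistence.xml);
        // here they are only printed.
        filesystem("/var/lucene/indexes").list(System.out); // illustrative path
        inMemory().list(System.out);
    }
}
```

The file system option is the usual choice for production, since the index persists between runs; the in-memory option avoids disk I/O and cleanup, which makes it convenient for tests.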