Chapter 5. Indexing: where, how, what, and when

This chapter covers

Choosing and configuring directory providers
Choosing the appropriate analyzer
Understanding Hibernate Search transparent indexing
Using manual indexing

Indexing is the act of preparing data so that Lucene can answer your full-text queries efficiently. The index should stay as close as possible to your real data and not lag behind changes. Why does Lucene need to prepare data? To answer full-text queries efficiently, it must store a representation of the data that is optimized for searching. Because most full-text queries revolve around words, the index is organized per word. For each word, the index structure stores the list of documents and fields that contain it, as well as some statistical information. Section 1.3.1 gave us an idea of the index structure kept by Lucene.
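To make the per-word organization concrete, here is a minimal sketch of a toy inverted index. It is not Lucene's actual implementation: the class name `ToyInvertedIndex`, the `Posting` type, and the naive tokenizer (lowercase, split on non-word characters) are all assumptions made for illustration. The only statistic it keeps is how often a word occurs in a given field of a given document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index for illustration only; Lucene's real structure is far richer.
class ToyInvertedIndex {

    // One posting: the word occurs `frequency` times in `field` of document `docId`.
    static final class Posting {
        final int docId;
        final String field;
        final int frequency;

        Posting(int docId, String field, int frequency) {
            this.docId = docId;
            this.field = field;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "doc " + docId + ", field " + field + ", freq " + frequency;
        }
    }

    // word -> postings; the map is keyed (and sorted) by word.
    private final Map<String, List<Posting>> postings = new TreeMap<>();

    // Naive analysis: lowercase the text, split on non-word characters,
    // then record one posting per distinct word of this document field.
    void add(int docId, String field, String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        counts.forEach((word, freq) ->
                postings.computeIfAbsent(word, w -> new ArrayList<>())
                        .add(new Posting(docId, field, freq)));
    }

    // Returns the postings for a word, or an empty list if it was never indexed.
    List<Posting> lookup(String word) {
        List<Posting> found = postings.get(word.toLowerCase(Locale.ROOT));
        return found == null ? Collections.<Posting>emptyList() : found;
    }

    // Number of document/field pairs that contain the word.
    int postingCount(String word) {
        return lookup(word).size();
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "title", "Hibernate Search in Action");
        index.add(1, "description", "Full-text search with Hibernate and Lucene");
        index.add(2, "title", "Lucene in Action");
        System.out.println(index.lookup("lucene"));
        System.out.println(index.postingCount("action"));
    }
}
```

Answering a query for a word is then a single map lookup instead of a scan over every document, which is the reason the data has to be prepared ahead of time.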

Lucene’s job is to build this magic structure and enable its superpowers, right? True, but it needs a little help from you:

You need to store the index structure.
You need to decide which features you require and which data preparation Lucene will perform for you.
You need to ask Lucene to index your information.

The index structure in Lucene must be stored somewhere. The two main storage options are a file system directory and memory. We’ll cover how to ask Hibernate Search to use each of these strategies.
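As a rough preview of the two storage strategies, the sketch below builds the configuration properties for each one. It assumes the property names and provider classes of Hibernate Search 3.x (`hibernate.search.default.directory_provider`, `hibernate.search.default.indexBase`, `FSDirectoryProvider`, `RAMDirectoryProvider`); other versions may use different names. The helper class `DirectoryProviderSettings` and the index path are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that assembles Hibernate Search 3.x style settings.
class DirectoryProviderSettings {

    // Assumed Hibernate Search 3.x property keys; check them against your version.
    static final String PROVIDER_KEY = "hibernate.search.default.directory_provider";
    static final String INDEX_BASE_KEY = "hibernate.search.default.indexBase";

    // Index stored in a file system directory under indexBase.
    static Properties fileSystem(String indexBase) {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.FSDirectoryProvider");
        props.setProperty(INDEX_BASE_KEY, indexBase);
        return props;
    }

    // Index kept in memory; it is lost when the application stops.
    static Properties inMemory() {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.RAMDirectoryProvider");
        return props;
    }

    public static void main(String[] args) {
        // "/var/lucene/indexes" is an example path, not a required location.
        System.out.println(fileSystem("/var/lucene/indexes"));
        System.out.println(inMemory());
    }
}
```

In a real application these entries would normally live in the Hibernate configuration file rather than in code; they are built programmatically here only to keep the example self-contained.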

5.1. DirectoryProvider: storing the index

5.2. Analyzers: doors to flexibility

5.3. Transparent indexing

5.4. Indexing: when transparency is not enough

5.5. Summary