Chapter 2. Building a search index

This chapter covers

Performing basic index operations
Boosting documents and fields during indexing
Indexing dates, numbers, and sortable fields
Advanced indexing topics

So you want to search files stored on your hard disk, or perhaps your email, web pages, or even data stored in a database. Lucene can help you do that. But before you can search something, you have to index it. Lucene helps with that too, as you’ll learn in this chapter.

In chapter 1, you saw a simple indexing example. This chapter goes further and teaches you about index updates, parameters you can use to tune the indexing process, and more advanced indexing techniques that will help you get the most out of Lucene. It also covers the structure of a Lucene index, important issues to keep in mind when multiple threads and processes access an index, the transactional semantics of Lucene’s indexing API, how to share an index over remote file systems, and the locking mechanism that Lucene employs to prevent concurrent index modification.

2.1. How Lucene models content

2.2. Understanding the indexing process

2.3. Basic index operations

2.4. Field options

2.5. Boosting documents and fields

2.6. Indexing numbers, dates, and times

2.7. Field truncation

2.8. Near-real-time search

2.9. Optimizing an index

2.10. Other directory implementations

2.11. Concurrency, thread safety, and locking issues

2.12. Debugging indexing

2.13. Advanced indexing concepts

2.14. Summary