Chapter 10. Tika and the Lucene search stack

 

This chapter covers

We’re going to take a break from our in-depth tour of the Tika framework. By now, the topics covered so far should be second nature to you. But you may not be so comfortable with names like Mahout, or Droids, or (eep!) Open Relevance.

Though these terms might sound foreign, they’re common terminology to those familiar with the Apache Lucene[1] family of search-related applications. Lucene is an Apache Top Level Project, or TLP, and was originally home to a number of search-related software products that have themselves grown to TLP status, including Tika.

1 The name Lucene is Doug Cutting’s wife’s middle name and her maternal grandmother’s first name, as detailed at http://mng.bz/XyTG.

It’s our job in this chapter to introduce you to these projects and to frame your understanding of Tika’s usefulness and its relationship to this family of software applications. We’ll keep it high-level, focusing more on the architecture and less on the actual implementations. Those are covered thoroughly in other fine Manning books.[2]

2 Specifically, we encourage you to check out Mahout in Action, Lucene in Action (1st and 2nd editions), and Solr in Action, because they cover Tika in some form and will help as a supplement to this book.

10.1. Load-bearing walls

 
 
 

10.2. The steel frame

 
 

10.3. The finishing touches

 
 
 
 

10.4. Summary

 
 