Chapter 13. Content management with Apache Jackrabbit

This chapter covers

The Apache Jackrabbit Content Repository
The use of Tika in Jackrabbit
File detection and parsing for Jackrabbit WebDAV

Apache Jackrabbit, http://jackrabbit.apache.org, is a content repository that provides a rich storage layer on which to build content and document management systems like the ones we discussed earlier in chapter 9. Full-text search and WebDAV integration are two key features of a content repository. In this case study we’ll learn how Jackrabbit uses Tika to help implement these features.

We’ll start by briefly describing the key features of Apache Jackrabbit and the Content Repository for Java Technology (JCR) API (http://www.jcp.org/en/jsr/detail?id=170) that it implements. Armed with this background, we’ll then look deeper into how Jackrabbit’s search feature uses a pool of Tika threads to create the illusion of indexing arbitrarily large documents in near real time. We’ll also look at how Tika’s type detection feature is used to add smarts to Jackrabbit’s WebDAV integration layer. We’ll end this case study with a brief summary.

13.1. Introducing Apache Jackrabbit

Apache Jackrabbit is an implementation of a new, special kind of database called a content repository. Defined in Java Specification Requests (JSRs) 170 and 283, a content repository is a hierarchically organized storage engine that combines features of advanced file systems and relational databases.
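
To make the "file system plus database" idea concrete, here is a minimal toy sketch in plain Java. It is not the JCR API and not how Jackrabbit is implemented: the ToyNode and ContentTreeSketch classes, the sample paths, and the mimeType property name are all hypothetical, chosen only for illustration. It shows a tree of named nodes that can be reached by path, as in a file system, and searched by property value, as in a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: hypothetical classes, NOT the JCR API.
class ContentTreeSketch {

    // Build a small sample hierarchy with a property on each leaf node.
    static ToyNode sampleTree() {
        ToyNode root = new ToyNode("", null);
        ToyNode docs = root.addNode("docs");
        docs.addNode("report").properties.put("mimeType", "application/pdf");
        docs.addNode("manual").properties.put("mimeType", "application/pdf");
        root.addNode("images").addNode("photo").properties.put("mimeType", "image/jpeg");
        return root;
    }

    public static void main(String[] args) {
        ToyNode root = sampleTree();

        // File-system-style access: navigate by path.
        System.out.println(root.getNode("docs/report").path()); // prints /docs/report

        // Database-style access: search by property value.
        System.out.println(root.findByProperty("mimeType", "application/pdf"));
    }
}

class ToyNode {
    final String name;
    final ToyNode parent;
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    final Map<String, String> properties = new LinkedHashMap<>();

    ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Create a child node, or return the existing one with that name.
    ToyNode addNode(String childName) {
        return children.computeIfAbsent(childName, n -> new ToyNode(n, this));
    }

    // Absolute path of this node, for example "/docs/report".
    String path() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.path();
        return prefix.equals("/") ? "/" + name : prefix + "/" + name;
    }

    // Resolve a relative path such as "docs/report"; returns null if absent.
    ToyNode getNode(String relativePath) {
        ToyNode current = this;
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Paths of all nodes in this subtree whose property equals the given value.
    List<String> findByProperty(String key, String value) {
        List<String> hits = new ArrayList<>();
        collect(key, value, hits);
        return hits;
    }

    private void collect(String key, String value, List<String> hits) {
        if (value.equals(properties.get(key))) {
            hits.add(path());
        }
        for (ToyNode child : children.values()) {
            child.collect(key, value, hits);
        }
    }
}
```

A real content repository layers much more on top of this basic shape. The toy only captures the two access styles that the definition above combines.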
