Chapter 6. Understanding metadata
This chapter covers
Conquering your fears of extracting text from files in a few lines of Java code has hopefully put Tika on your personal must-have list. The ease and simplicity with which Tika can turn an afternoon’s parsing work into a smorgasbord of content handler plugins and event-based text processing is likely fresh on your mind. If not, head back to chapter 5 and relive the memories.
Looking ahead, sometimes before you’ve even obtained the textual content within the files you’re interested in, you may be able to weed out which files you’re not interested in, based on a few simple criteria, and save yourself a bunch of time (and processing power).
Take, for example, the use case presented in figure 6.1.