To my lovely wife Lisa and my son Christian
CM
To my lovely wife Kirsi-Marja and our happy cats
JZ
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this Book
About the Authors
About the Cover Illustration
Part 1. Getting started
Chapter 1. The case for the digital Babel fish
Chapter 2. Getting started with Tika
Chapter 3. The information landscape
Part 2. Tika in detail
Chapter 4. Document type detection
Chapter 5. Content extraction
Chapter 6. Understanding metadata
Chapter 7. Language detection
Chapter 8. What’s in a file?
Part 3. Integration and advanced use
Chapter 9. The big picture
Chapter 10. Tika and the Lucene search stack
Chapter 11. Extending Tika
Part 4. Case studies
Chapter 12. Powering NASA science data systems
Chapter 13. Content management with Apache Jackrabbit
Chapter 14. Curating cancer research data with Tika
Chapter 15. The classic search engine example
Appendix A. Tika quick reference
Appendix B. Supported metadata keys
Index
List of Figures
List of Tables
List of Listings
1.1.1. A taxonomy of file formats