Dedication

To my lovely wife Lisa and my son Christian

CM

To my lovely wife Kirsi-Marja and our happy cats

JZ

Brief Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Authors

About the Cover Illustration

Part 1. Getting started

Chapter 1. The case for the digital Babel fish

Chapter 2. Getting started with Tika

Chapter 3. The information landscape

Part 2. Tika in detail

Chapter 4. Document type detection

Chapter 5. Content extraction

Chapter 6. Understanding metadata

Chapter 7. Language detection

Chapter 8. What’s in a file?

Part 3. Integration and advanced use

Chapter 9. The big picture

Chapter 10. Tika and the Lucene search stack

Chapter 11. Extending Tika

Part 4. Case studies

Chapter 12. Powering NASA science data systems

Chapter 13. Content management with Apache Jackrabbit

Chapter 14. Curating cancer research data with Tika

Chapter 15. The classic search engine example

Appendix A. Tika quick reference

Appendix B. Supported metadata keys

Index

List of Figures

List of Tables

List of Listings

Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Authors

About the Cover Illustration

Part 1. Getting started

Chapter 1. The case for the digital Babel fish

1.1.1. A taxonomy of file formats