Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this Book
About the Cover Illustration
Chapter 1. Getting started taming text
1.1. Why taming text is important
1.2. Preview: A fact-based question answering system
1.2.1. Hello, Dr. Frankenstein
1.3. Understanding text is hard
1.4. Text, tamed
1.5. Text and the intelligent app: search and beyond
1.5.1. Searching and matching
1.5.2. Extracting information
1.5.3. Grouping information
1.5.4. An intelligent application
1.6. Summary
1.7. Resources
Chapter 2. Foundations of taming text
2.1. Foundations of language
2.1.1. Words and their categories
2.1.2. Phrases and clauses
2.1.3. Morphology
2.2. Common tools for text processing
2.2.1. String manipulation tools
2.2.2. Tokens and tokenization
2.2.3. Part of speech assignment
2.2.4. Stemming
2.2.5. Sentence detection
2.2.6. Parsing and grammar
2.2.7. Sequence modeling
2.3. Preprocessing and extracting content from common file formats
2.3.1. The importance of preprocessing
2.3.2. Extracting content using Apache Tika
2.4. Summary
2.5. Resources
Chapter 3. Searching
3.1. Search and faceting example: Amazon.com
3.2. Introduction to search concepts
3.2.1. Indexing content
3.2.2. User input
3.2.3. Ranking documents with the vector space model
3.2.4. Results display
3.3. Introducing the Apache Solr search server
3.3.1. Running Solr for the first time