Table of Contents

Acknowledgments

About this Book

About the Cover Illustration

Chapter 1. Getting started taming text

1.1. Why taming text is important

1.2. Preview: A fact-based question answering system

1.2.1. Hello, Dr. Frankenstein

1.3. Understanding text is hard

1.4. Text, tamed

1.5. Text and the intelligent app: search and beyond

1.5.1. Searching and matching

1.5.2. Extracting information

1.5.3. Grouping information

1.5.4. An intelligent application

Chapter 2. Foundations of taming text

2.1. Foundations of language

2.1.1. Words and their categories

2.1.2. Phrases and clauses

2.1.3. Morphology

2.2. Common tools for text processing

2.2.1. String manipulation tools

2.2.2. Tokens and tokenization

2.2.3. Part of speech assignment

2.2.4. Stemming

2.2.5. Sentence detection

2.2.6. Parsing and grammar

2.2.7. Sequence modeling

2.3. Preprocessing and extracting content from common file formats

2.3.1. The importance of preprocessing

2.3.2. Extracting content using Apache Tika

Chapter 3. Searching

3.1. Search and faceting example: Amazon.com

3.2. Introduction to search concepts

3.2.1. Indexing content

3.2.2. User input

3.2.3. Ranking documents with the vector space model

3.2.4. Results display

3.3. Introducing the Apache Solr search server

3.3.1. Running Solr for the first time