chapter four

4 Information extraction

This chapter covers

Extracting information from raw text
Exploring useful NLP techniques, such as part-of-speech tagging, lemmatization, and parsing
Building a language-processing pipeline with spaCy

In the previous chapter, you looked into ways of finding texts that discuss particular concepts or facts. You built an information-retrieval system that can search for texts answering particular questions. For example, if you were wondering what information science is or what methods information-retrieval systems use, you provided the system with queries like “What is information science?” or “What methods do information-retrieval systems use?” and it returned relevant texts that address these questions.

4.1 Use cases

4.1.1 Case 1

4.1.2 Case 2

4.1.3 Case 3

4.2 Understanding the task

4.3 Detecting word types with part-of-speech tagging

4.3.1 Understanding word types

4.3.2 Part-of-speech tagging with spaCy

4.4 Understanding sentence structure with syntactic parsing

4.4.1 Why sentence structure is important

4.4.2 Dependency parsing with spaCy

4.5 Building your own information extraction algorithm

Summary

Solutions to miscellaneous exercises