4 Information Extraction

This chapter covers:

How to extract information from raw text
A number of useful NLP concepts, including part-of-speech tagging, lemmatization, and dependency parsing
How to build a language processing pipeline with spaCy, an industrial-strength Natural Language Processing library

In the previous chapter, you looked into ways of finding texts that discuss particular concepts or facts. You built an information retrieval system that can search for texts answering particular questions. For example, if you wanted to know what information science is or what methods information retrieval systems use, you gave the system queries like “What is information science?” or “What methods do information retrieval systems use?”, and it returned relevant texts on these topics.

4.1 Use cases

4.2 Understanding the task

4.3 Detecting word types with part-of-speech tagging

4.3.1 Understanding word types

4.3.2 Part-of-speech tagging with spaCy

4.4 Understanding sentence structure with syntactic parsing

4.4.1 Why sentence structure is important

4.4.2 Dependency parsing with spaCy

4.5 Building your own Information Extraction algorithm

4.6 Summary