chapter eleven

11 Information extraction and knowledge graphs

This chapter covers

Extracting named entities from text
Understanding the structure of sentences using dependency parsing
Converting a dependency tree into knowledge
Building a knowledge graph from text

In chapter 10, you learned how to use large transformers to generate words that sound smart. But language models on their own are just faking it by predicting the next word that will sound reasonable to you. Your AI can’t reason about the real world until you give it access to facts and knowledge about the world. In chapter 2, you learned how to do exactly this, though you didn’t know it at the time: you tagged tokens with their part of speech and their logical role in the meaning of a sentence (the dependency tree). This old-fashioned token tagging is all you need to give your generative language models knowledge about the real world. The goal of this chapter is to teach your bot to understand what it reads. You’ll put that understanding into a knowledge graph, a flexible data structure designed to store knowledge. Then, your bot can use that knowledge to make decisions and say smart stuff about the world.
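To make the idea of a knowledge graph concrete before diving in, here is a minimal sketch of one way such a structure could look: a set of (subject, relation, object) triples with a small lookup index. The class name `KnowledgeGraph`, its methods, the relation labels, and the hand-entered facts are all assumptions made for illustration; they are not the data structure or API this chapter builds later.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A toy knowledge graph (hypothetical): a set of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()
        # Index from subject to its (relation, object) pairs for quick lookups
        self.by_subject = defaultdict(set)

    def add(self, subject, relation, obj):
        """Store one fact as a triple and update the subject index."""
        self.triples.add((subject, relation, obj))
        self.by_subject[subject].add((relation, obj))

    def query(self, subject, relation):
        """Return a sorted list of objects linked to `subject` by `relation`."""
        facts = self.by_subject.get(subject, ())
        return sorted(obj for rel, obj in facts if rel == relation)


# Hand-entered example facts; in practice these would be extracted from text
kg = KnowledgeGraph()
kg.add("Marie Curie", "born_in", "Warsaw")
kg.add("Marie Curie", "won", "Nobel Prize in Physics")
kg.add("Marie Curie", "won", "Nobel Prize in Chemistry")
kg.add("Warsaw", "capital_of", "Poland")

print(kg.query("Marie Curie", "won"))
# → ['Nobel Prize in Chemistry', 'Nobel Prize in Physics']
```

Each triple is one edge in the graph: the subject and object are nodes, and the relation is the label on the edge between them. Storing facts this way lets a bot look up an answer instead of guessing the next plausible word.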

11.1 Grounding

11.1.1 Going old-fashioned: Information extraction with patterns

11.2 First things first: Segmenting your text into sentences

11.2.1 Why won’t split('.!?') work?

11.2.2 Sentence segmentation with regular expressions

11.2.3 Sentence semantics

11.3 A knowledge extraction pipeline

11.4 Entity recognition

11.4.1 Pattern-based entity recognition: Extracting GPS locations

11.4.2 Named entity recognition with spaCy

11.5 Coreference resolution

11.5.1 Coreference resolution with spaCy

11.5.2 Entity name normalization

11.6 Dependency parsing

11.6.1 Constituency parsing with benepar

11.7 From dependency parsing to relation extraction

11.7.1 Pattern-based relation extraction

11.7.2 Neural relation extraction

11.8 Building your knowledge base

11.8.1 A large knowledge graph

11.9 Finding answers in a knowledge graph