chapter twelve

12 Constructing a graph using natural language processing techniques

This chapter covers

The information extraction pipeline
Coreference resolution
Named entity recognition and linking
Relation extraction
Developing an information extraction pipeline

The amount of text-based information available on the internet is astounding. It is hard to imagine the number of social media posts, blogs, and news articles published daily. However, despite this wealth of information, much of it remains unstructured, which makes it difficult to extract valuable insights. This is where natural language processing (NLP) comes into play. NLP is a rapidly growing field that has received significantly more attention in recent years, especially since the introduction of transformer models (Vaswani et al., 2017) and, more recently, the GPT-3 (Brown et al., 2020) and GPT-4 (OpenAI, 2023) models. One particularly important area of NLP is information extraction, which focuses on extracting structured information from unstructured text.
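To make "structured information" concrete, here is a minimal sketch. It assumes a toy regular-expression extractor with two hand-picked relation types, WORKS_AT and FOUNDED, and made-up example sentences. None of these come from the chapter, and this is not the pipeline built later in it. The sketch turns free text into (head, relation, tail) triples and collects them into an adjacency list, the kind of graph structure that information extraction is meant to produce.

```python
import re

# Hypothetical toy patterns: "<Subject> works at <Object>." and
# "<Subject> founded <Object>." Real pipelines rely on NER and
# relation-extraction models rather than hand-written regular expressions.
RELATION_PATTERNS = {
    "WORKS_AT": re.compile(r"^(?P<head>[A-Z][\w ]*?) works at (?P<tail>[A-Z][\w ]*?)\.?$"),
    "FOUNDED": re.compile(r"^(?P<head>[A-Z][\w ]*?) founded (?P<tail>[A-Z][\w ]*?)\.?$"),
}


def extract_triples(text):
    """Split text into sentences and return (head, relation, tail) triples."""
    triples = []
    # Naive sentence split: break after a period followed by whitespace.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for relation, pattern in RELATION_PATTERNS.items():
            match = pattern.match(sentence)
            if match:
                triples.append((match["head"], relation, match["tail"]))
    return triples


def build_graph(triples):
    """Collect triples into an adjacency list: node -> list of (relation, node)."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, [])  # make sure tail nodes exist too
    return graph


text = "Alice works at Acme Corp. Bob founded Acme Corp. The weather was nice."
triples = extract_triples(text)
print(triples)
# → [('Alice', 'WORKS_AT', 'Acme Corp'), ('Bob', 'FOUNDED', 'Acme Corp')]
print(build_graph(triples))
```

The third sentence yields nothing because it matches no pattern. A rule-based approach like this also breaks as soon as the text uses a pronoun ("She works at...") or a paraphrase. Those weaknesses motivate the coreference resolution, named entity recognition and linking, and relation extraction steps listed at the start of this chapter.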

12.1 Coreference resolution

12.2 Named entity recognition

12.2.1 Entity linking

12.3 Relation extraction

12.4 Implementation of an information extraction pipeline

12.4.1 spaCy

12.4.2 Coreference resolution

12.4.3 End-to-end relation extraction

12.4.4 Entity linking

12.4.5 External data enrichment

12.5 Solutions to exercises

Summary