12 Construct a graph using NLP techniques

This chapter covers

Information extraction pipeline
Coreference resolution
Named entity recognition and linking
Relation extraction
Developing an information extraction pipeline

The amount of text-based information available on the internet is astounding. It is hard to imagine how many social media posts, blogs, and news articles are published daily. However, despite this wealth of information, much of it remains unstructured, which makes it difficult to extract valuable insights from. This is where natural language processing (NLP) comes into play. NLP is a rapidly growing field that has attracted significantly more attention in recent years, especially since the introduction of transformer models [Vaswani et al., 2017] and, more recently, the GPT-3 [Brown et al., 2020] and GPT-4 [OpenAI, 2023] models. One particularly important area of NLP is information extraction, which focuses on extracting structured information from unstructured text.

Figure 12.1. Extract structured information from text and use it to construct a graph.
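To make the idea in figure 12.1 concrete, here is a minimal sketch of the final step only: turning already-extracted (subject, relation, object) triples into a graph. The triples are written by hand for illustration, and the function name build_graph and the relation labels are hypothetical. The sketch does not perform any extraction itself.

```python
# Hypothetical triples, written by hand for illustration. A real
# information extraction pipeline would produce these from raw text.
triples = [
    ("Ada Lovelace", "WORKED_WITH", "Charles Babbage"),
    ("Charles Babbage", "DESIGNED", "Analytical Engine"),
    ("Ada Lovelace", "WROTE_ABOUT", "Analytical Engine"),
]


def build_graph(triples):
    """Build an adjacency list: node -> list of (relation, target node)."""
    graph = {}
    for subject, relation, obj in triples:
        # Register the outgoing relationship on the subject node.
        graph.setdefault(subject, []).append((relation, obj))
        # Make sure the object exists as a node even without outgoing edges.
        graph.setdefault(obj, [])
    return graph


graph = build_graph(triples)
for node, edges in graph.items():
    for relation, target in edges:
        print(f"({node})-[{relation}]->({target})")
```

Each entity becomes a node and each relation becomes a directed, labeled relationship, which is the shape of data a graph database expects.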

12.1 Coreference resolution

12.2 Named entity recognition

12.2.1 Entity linking

12.3 Relation extraction

12.4 Implementation of information extraction pipeline

12.4.1 spaCy

12.4.2 Coreference resolution

12.4.3 End-to-end relation extraction

12.4.4 Entity linking

12.4.5 External data enrichment

12.5 Summary

12.6 References

12.7 Solutions to exercises