This chapter covers
- Information extraction pipeline
- Coreference resolution
- Named entity recognition and linking
- Relation extraction
- Developing an information extraction pipeline
The amount of text-based information available on the internet is astounding. It is hard to imagine the number of social media posts, blog posts, and news articles published every day. Yet despite this wealth of information, much of it remains unstructured, which makes it difficult to extract valuable insights from. This is where natural language processing (NLP) comes into play. NLP is a rapidly growing field that has attracted significantly more attention in recent years. This is especially true since the introduction of transformer models [Vaswani et al., 2017] and, more recently, the GPT-3 [Brown et al., 2020] and GPT-4 [OpenAI, 2023] models. One particularly important area of NLP is information extraction, which focuses on extracting structured information from unstructured text.