chapter eleven

11 Named-entity recognition

This chapter covers

Introducing named-entity recognition (NER)
Reviewing sequence labeling approaches in NLP
Integrating NER into downstream tasks
Introducing further data preprocessing tools and techniques

Previous chapters covered a number of NLP tasks, from binary classification tasks, such as author identification and sentiment analysis, to multiclass classification tasks, such as topic analysis. These applications deployed machine-learning models and relied on a range of linguistic features, most often related to words or word characteristics. Individual words do carry information that is useful in many NLP applications, but the information-bearing unit is often larger than a single word. In chapter 4, you looked into the task of information extraction. Recall that this task allows you to extract facts and other relevant information from otherwise unstructured data, such as raw, unprocessed text. It is instrumental in a number of applications, from information management to database completion to question answering. For instance, suppose you have a collection of texts on various well-known people, including the Wikipedia article on Albert Einstein (https://en.wikipedia.org/wiki/Albert_Einstein). Figure 11.1 shows a sentence from this article.

Figure 11.1 A sentence from the Wikipedia article on Albert Einstein with the critical information chunks (entities) highlighted
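To make the point about multi-word units concrete, here is a minimal, dependency-free Python sketch. The sentence, its tokenization, and the entity spans are hand-made assumptions for illustration: they are not taken from figure 11.1 or produced by any NER tool, and the labels PERSON, LOCATION, and DATE are generic placeholders rather than a particular library's tag set. Each entity is stored as a half-open range of token indices, so grouping its tokens into one information chunk is a single slice.

```python
# Illustrative sentence (an assumption, not necessarily the one in figure 11.1),
# already split into tokens so the example needs no external libraries.
tokens = ["Albert", "Einstein", "was", "born", "in", "Ulm", ",", "in", "the",
          "Kingdom", "of", "Württemberg", ",", "on", "14", "March", "1879", "."]

# Hypothetical hand annotation: (start, end, label), with `end` exclusive.
entity_spans = [(0, 2, "PERSON"), (5, 6, "LOCATION"),
                (9, 12, "LOCATION"), (14, 17, "DATE")]

def extract_entities(tokens, spans):
    """Join the tokens covered by each span into one (text, label) entity."""
    return [(" ".join(tokens[start:end]), label) for start, end, label in spans]

entities = extract_entities(tokens, entity_spans)

for text, label in entities:
    # Report each entity together with how many tokens it covers.
    print(f"{label:<8} {text} ({len(text.split())} tokens)")
```

Looking at each token in isolation would split a name such as Albert Einstein into two unrelated words; only three of the four entities here fit in a single token's worth of meaning once grouped. An NER system's job is to find such spans and their types automatically instead of relying on hand annotation, as this sketch does.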

11.1 Named-entity recognition: Definitions and challenges

11.1.1 Named-entity types

11.1.2 Challenges in named-entity recognition

11.2 Named-entity recognition as a sequence labeling task

11.2.1 The basics: BIO scheme

11.2.2 What does it mean for a task to be sequential?

11.2.3 Sequential solution for NER

11.3 Practical applications of NER

11.3.1 Data loading and exploration

11.3.2 Exploring named-entity types with spaCy

11.3.3 Information extraction revisited

11.3.4 Visualizing named entities