chapter five

5 Extracting domain-specific knowledge from unstructured data

This chapter covers

Building knowledge graphs from unstructured data
Complexities of managing archives: the Rockefeller Archive Center example
Using large language models to extract entities and relationships

Until now, we have discussed knowledge graphs (KGs) built from structured data such as tables and knowledge bases. But what about unstructured data? Think about emails, chats, laws, research papers, news articles, social media posts, and more: the world is overflowing with information and knowledge in unstructured form. Using these data sources could provide valuable information for your business.

The task of transforming unstructured data into knowledge consists of data ingestion and processing, various natural language processing (NLP) techniques, data enrichment, machine learning (ML) processing, and data modeling to build downstream applications. Conceptually, this process has two main challenges:

5.1 The archives challenge

5.2 Key concepts of knowledge extraction

5.2.1 Recognizing named entities

5.2.2 Extracting relations

5.3 Building KGs with large language models

5.3.1 Using LLMs

5.3.2 Prompt engineering examples

5.3.3 Prompt engineering guidelines

5.3.4 KG building: Traditional NLP or LLMs?

Summary