5 Domain-specific knowledge extraction from unstructured data

 

This chapter covers

  • Building Knowledge Graphs from unstructured data
  • Complexities of managing archives: the Rockefeller Archive Center example
  • Large Language Models for fast and accurate entity and relation extraction

So far, we have discussed knowledge graphs (KGs) built from structured data such as tables and knowledge bases. But what about unstructured data? Think of documentation, emails, chats, laws, research papers, guidelines, news articles, social media posts, and so on. The world is overflowing with information and knowledge locked in unstructured form. Tapping into these sources can yield observations, facts, and insights that matter to your business.

Transforming unstructured data into knowledge is a complex, multistep process. It consists of data ingestion and processing, various natural language processing (NLP) techniques, data enrichment, machine learning (ML) processing, and data modeling, all in service of building downstream applications. Conceptually, this process poses two main challenges:

  • Knowledge representation
  • Knowledge learning and construction
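
To make these two challenges concrete, here is a minimal Python sketch. It assumes that knowledge is represented as subject-predicate-object triples and that construction is done with a single hand-written pattern. The `Triple` class, the `WORKS_AT` predicate, the regular expression, and the sample sentences are all hypothetical and chosen only for illustration; they are not the approach developed later in this chapter.

```python
import re
from dataclasses import dataclass
from typing import List


# Knowledge representation: one fact as a (subject, predicate, object) triple.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Knowledge construction: a toy, hand-written pattern (an assumption for this
# sketch) that matches sentences of the form "<Name> works at <Name>".
WORKS_AT = re.compile(r"(?P<subj>[A-Z]\w+) works at (?P<obj>[A-Z]\w+)")


def extract_triples(text: str) -> List[Triple]:
    """Turn every pattern match in the text into a WORKS_AT triple."""
    return [
        Triple(m.group("subj"), "WORKS_AT", m.group("obj"))
        for m in WORKS_AT.finditer(text)
    ]


# Made-up input text, used only to exercise the extractor.
text = "Alice works at Acme. Bob works at Initech."
triples = extract_triples(text)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

A single rigid pattern like this misses almost every real-world phrasing, which hints at why learning and construction is a challenge in its own right.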

5.1 The archives challenge

5.2 Key Concepts of Knowledge Extraction

5.2.1 Named Entity Recognition

5.2.2 Relation extraction

5.3 Building KGs with Large Language Models

5.3.1 How to use Large Language Models

5.3.2 Prompt engineering in examples

5.3.3 Prompt engineering guidelines

5.3.4 KG building: Traditional NLP or LLMs?

5.4 Summary

5.5 References