6 Building knowledge graphs with large language models

 

This chapter covers

  • Transforming an archive into a Knowledge Graph: the Rockefeller Archive Center example
  • Graph modeling
  • Normalization, cleansing, entity resolution
  • Analysis of the intellectual network

In the previous chapter, we discussed extracting complex relational knowledge from unstructured data using state-of-the-art Machine Learning technologies, including Large Language Models. Specifically, we looked at knowledge extraction from the historical typewritten documents of the Rockefeller Archive Center (RAC). These documents contain very detailed descriptions of conversations that representatives of the Rockefeller Foundation (RF), the so-called program officers, held with researchers from a wide range of universities and other institutions. Based on the information they collected during these meetings, the program officers decided whether to recommend a given research project for funding. In the final stage, the Board of Directors of the RF approved the grants. For a detailed description of the RAC use case and the goals of the project, please revisit the RAC discussion in the previous chapter.

Figure 6.1 The path from domain-specific unstructured textual data to Knowledge Graph (KG) insights. Each key step relies on state-of-the-art Machine Learning models: for example, Optical Character Recognition (OCR) for document digitization, Named Entity Recognition (NER) and Relation Extraction systems, Entity Resolution, and Graph ML.

6.1 Transforming an archive into a Knowledge Graph

6.1.1 Graph modeling

6.1.2 Data processing and meta-graph creation

6.1.3 Normalization and cleansing

6.1.4 Graph-based entity resolution

6.2 Intellectual network analysis: the value of graphs

6.3 Next steps in the Rockefeller Archive Center project

6.4 The value of Knowledge Graphs in the LLM era

6.5 Summary