Appendix C. Build knowledge graphs from structured sources

 

This appendix covers

  • Knowledge acquisition from structured and semi-structured data sources
  • Data reconciliation, entity merging, and data cleaning
  • Post-processing and knowledge graph analytics

Earlier in this book (Chapter 4), we imported and explored three existing knowledge graphs built by other people, organizations, or universities, including Hetionet, DisGeNET, and other biomedical knowledge graphs. It was rewarding to see immediately the value they bring and how they help answer concrete questions. We hope that experience convinced you, because the hardest part comes now: it is time to go through the entire process yourself. This task is critical because you will rarely find an existing knowledge graph that fulfills all your needs and is ready to query and to build applications on. In the real world, you must build it from scratch. You need to get your hands dirty and crunch different data sources, fight to reconcile different naming conventions and identifiers, and clean your graph after polluting it with millions of relationships, of which you care about only 10% if you are lucky. Welcome to the real world of working with knowledge graphs (the hard way).

C.1 MicroRNA-disease association – Warm-up

C.1.1 Key concepts

C.1.2 Business understanding

C.1.3 Data understanding

C.2 miRNA knowledge graph building

C.2.1 Importing known miRNA-disease connections

C.2.2 Importing disease ontology

C.2.3 Using Large Language Models for entity normalization

C.2.4 Importing miRNA information

C.3 Exploring and analyzing the miRNA Knowledge Graph

C.4 Summary

C.5 Reference