Chapter 4

4 Text2Cypher: Generating query language from natural language questions


This chapter covers

  • The basics of query language generation
  • Where query language generation fits in the RAG pipeline
  • Useful practices for query language generation
  • Implementing a text2cypher generator using a base model
  • Specialized (fine-tuned) LLMs for text2cypher

We’ve covered a lot of ground in the previous chapters: how to build a knowledge graph, how to extract information from text, and how to use that information to answer questions. We’ve also looked at how to extend and improve plain vector search retrieval by using hard-coded Cypher queries to provide more relevant context to the LLM. In this chapter we take a step further and learn how to generate Cypher queries from natural language questions. This allows us to build a more flexible and dynamic retrieval system that adapts to different types of questions and knowledge graphs.

Movies dataset

For the implementation in this chapter, we use what we call the "Movies dataset." See the appendix for more information about the dataset and the various ways to load it.

4.1 The basics of query language generation
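To make the task concrete, here is what query language generation means in practice: the input is a plain-English question, and the output is a Cypher query that answers it against the graph. The labels and relationship types below (`Person`, `Movie`, `ACTED_IN`) follow the Movies dataset schema; the exact movie title is just an illustrative example.

```python
# One text2cypher pair: the model's job is to map the question
# on the left to the Cypher query on the right.
question = "Who acted in the movie Toy Story?"

# Target query, assuming the Movies dataset schema with
# (:Person)-[:ACTED_IN]->(:Movie) and a `title` property.
cypher = """\
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = 'Toy Story'
RETURN p.name"""
```

Note that the query encodes graph structure (the `ACTED_IN` pattern) that the question only implies, which is exactly what makes the generation task nontrivial.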

4.2 Where query language generation fits in the RAG pipeline
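The data flow can be sketched as three steps: generate a Cypher query from the question, execute it against the graph, and hand the results to an LLM as context for the final answer. In the sketch below all three steps are stubbed with hardcoded values so only the flow is shown; the function names and return values are illustrative, and a real pipeline would call an LLM API and a Neo4j driver instead.

```python
def generate_cypher(question: str) -> str:
    """Step 1: an LLM turns the question into a Cypher query (stubbed)."""
    return ("MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Toy Story'}) "
            "RETURN p.name")

def run_query(cypher: str) -> list[dict]:
    """Step 2: the query is executed against the graph (stubbed)."""
    return [{"p.name": "Tom Hanks"}, {"p.name": "Tim Allen"}]

def generate_answer(question: str, context: list[dict]) -> str:
    """Step 3: an LLM answers the question from the retrieved context (stubbed)."""
    names = ", ".join(row["p.name"] for row in context)
    return f"The actors were: {names}."

def rag_pipeline(question: str) -> str:
    cypher = generate_cypher(question)
    context = run_query(cypher)
    return generate_answer(question, context)
```

The key difference from plain vector search is step 1: instead of embedding the question and retrieving similar text chunks, we generate a query tailored to the question.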

4.3 Useful practices for query language generation

4.3.1 Using few-shot examples for in-context learning
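The idea of few-shot prompting is to show the model several question-to-Cypher pairs before the actual question, so it can imitate the pattern. A minimal sketch, assuming illustrative example pairs against the Movies dataset schema:

```python
# Hand-picked (question, cypher) pairs; in practice these would be
# curated to cover the query patterns your users actually ask about.
FEW_SHOT_EXAMPLES = [
    ("Who directed the movie Casino?",
     "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'Casino'}) RETURN p.name"),
    ("How many movies did Tom Hanks act in?",
     "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)"),
]

def few_shot_prompt(question: str) -> str:
    """Assemble the examples and the new question into one prompt."""
    parts = ["Translate the question into a Cypher query.\n"]
    for q, c in FEW_SHOT_EXAMPLES:
        parts.append(f"Question: {q}\nCypher: {c}\n")
    # End with the user's question so the model completes the Cypher.
    parts.append(f"Question: {question}\nCypher:")
    return "\n".join(parts)
```

Ending the prompt with `Cypher:` nudges the model to respond with the query itself rather than with prose.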

4.3.2 Using database schema in the prompt to show the LLM the structure of the knowledge graph
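The model can only write valid queries if it knows which labels, relationship types, and properties exist. One common approach is to render the schema as plain text and prepend it to the prompt. The schema below is hardcoded for illustration; in practice you would introspect it from the database itself.

```python
# A hand-written description of the Movies dataset schema.
SCHEMA = {
    "nodes": {"Person": ["name", "born"], "Movie": ["title", "released"]},
    "relationships": [("Person", "ACTED_IN", "Movie"),
                      ("Person", "DIRECTED", "Movie")],
}

def schema_to_text(schema: dict) -> str:
    """Render the schema in a compact, LLM-friendly text form."""
    lines = ["Node properties:"]
    for label, props in schema["nodes"].items():
        lines.append(f"  {label} {{{', '.join(props)}}}")
    lines.append("Relationships:")
    for start, rel, end in schema["relationships"]:
        lines.append(f"  (:{start})-[:{rel}]->(:{end})")
    return "\n".join(lines)
```

Writing relationships in Cypher's own arrow notation has the nice side effect of also showing the model the direction of each relationship, which it needs to get the `MATCH` patterns right.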

4.3.3 Adding terminology mapping to semantically map the user question to the schema
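Users rarely speak in schema terms: they ask about "films" and "actors," while the graph stores `Movie` nodes and `ACTED_IN` relationships. A terminology mapping closes that gap by telling the model, inside the prompt, how everyday vocabulary maps onto the schema. The mapping entries below are illustrative:

```python
# Maps words users are likely to use to the schema elements they
# actually refer to. Extend this as you see real user questions.
TERMINOLOGY = {
    "films": "Movie nodes",
    "actors": "Person nodes connected to a Movie by ACTED_IN",
    "directors": "Person nodes connected to a Movie by DIRECTED",
}

def terminology_section(mapping: dict) -> str:
    """Render the mapping as a prompt section."""
    lines = ["When interpreting the question, use these mappings:"]
    for term, meaning in mapping.items():
        lines.append(f"  '{term}' refers to {meaning}")
    return "\n".join(lines)
```

This section would be concatenated with the schema text and few-shot examples into the final prompt.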

4.3.4 Format instructions
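Format instructions tell the model how to shape its output so the result can be executed directly. Even with explicit instructions, models sometimes wrap queries in markdown code fences, so a small post-processing step is a useful safety net. A minimal sketch:

```python
# Instruction appended to the prompt so the raw response is executable.
FORMAT_INSTRUCTIONS = (
    "Return only the Cypher query. Do not include explanations, "
    "comments, or markdown code fences in your answer."
)

def extract_cypher(text: str) -> str:
    """Strip ```cypher ... ``` fences if the model added them anyway."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        # Drop the opening fence line (which may read ```cypher).
        lines = lines[1:]
        # Drop the closing fence if present.
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    return text
```

Belt and suspenders: the instruction reduces how often cleanup is needed, and the cleanup handles the cases where the instruction is ignored.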

4.4 Implementing a text2cypher generator using a base model
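Putting the practices together, a minimal text2cypher generator combines the schema text and format instructions into one prompt and sends it to an LLM. In the sketch below the `llm` argument is any callable that takes a prompt string and returns text, so a fake model can stand in for testing; in a real implementation it would wrap an API call to a base model, and the schema text would come from the database rather than being hardcoded.

```python
# Illustrative schema text; see the schema-rendering sketch in 4.3.2.
SCHEMA_TEXT = """Node properties:
  Person {name, born}
  Movie {title, released}
Relationships:
  (:Person)-[:ACTED_IN]->(:Movie)"""

def text2cypher(question: str, llm) -> str:
    """Generate a Cypher query for `question` using the given LLM callable."""
    prompt = (
        "Given the graph schema below, write a Cypher query that answers "
        "the question. Return only the query, without explanations or "
        "markdown code fences.\n\n"
        f"Schema:\n{SCHEMA_TEXT}\n\n"
        f"Question: {question}\nCypher:"
    )
    return llm(prompt).strip()

# A fake LLM with a canned response, so the flow can be exercised
# without an API key.
def fake_llm(prompt: str) -> str:
    return "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name LIMIT 5"
```

Making the LLM a plain callable keeps the generator independent of any particular provider, which also makes it easy to swap in the fine-tuned models discussed in the next section.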

4.5 Specialized (fine-tuned) LLMs for text2cypher

4.6 What we’ve learned and what text2cypher enables

4.7 Summary