chapter four

4 Generating Cypher queries from natural language questions

This chapter covers

The basics of query language generation
Where query language generation fits in the RAG pipeline
Useful practices for query language generation
Implementing a text2cypher retriever using a base model
Specialized (finetuned) LLMs for text2cypher

We’ve covered a lot of ground in the previous chapters. We’ve learned how to build a knowledge graph, extract information from text, and use that information to answer questions. We’ve also seen how hardcoded Cypher queries can extend and improve plain vector search retrieval by supplying the LLM with more relevant context. In this chapter, we go a step further and learn how to generate Cypher queries from natural language questions. This lets us build a more flexible and dynamic retrieval system that can adapt to different types of questions and knowledge graphs.

Note In the implementation of this chapter, we use what we call the “Movies dataset.” See the appendix for more information on the dataset and various ways to load it.

4.1 The basics of query language generation

4.2 Where query language generation fits in the RAG pipeline

4.3 Useful practices for query language generation

4.3.1 Using few-shot examples for in-context learning

4.3.2 Using database schema in the prompt to show the LLM the structure of the knowledge graph

4.3.3 Adding terminology mapping to semantically map the user question to the schema

4.3.4 Format instructions

4.4 Implementing a text2cypher generator using a base model

4.5 Specialized (finetuned) LLMs for text2cypher

4.6 What we’ve learned and what text2cypher enables

Summary