Chapter 6

6 A scientific approach for validating LLM-based solutions

 

This chapter covers

  • Encapsulating the LLM API to avoid tight coupling to a specific provider
  • Detecting a silent drop in model accuracy with a scientific approach
  • The trade-off between context size and model accuracy
  • The trade-off between context size and the cost of running the model

Let’s say we are given the task of implementing text-to-SQL functionality. Users will input natural-language text, such as “generate a SQL query counting all the employees in our marketing department,” and the service will output the corresponding SQL query, ready to be submitted to any SQL database.

We aim to harness the power of generative AI for query generation. Along the way, however, there are several trade-offs to weigh and mistakes to avoid.
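Before diving into the sections below, the provider-agnostic design this chapter argues for can be sketched in a few lines. This is a minimal illustration only: the interface name `SqlGenerator`, the fake implementation, and the helper function are assumptions for this sketch, not the book's actual code.

```python
from abc import ABC, abstractmethod


class SqlGenerator(ABC):
    """Provider-agnostic interface: the service depends on this
    abstraction, not on OpenAI or Google Gemini directly."""

    @abstractmethod
    def generate_sql(self, question: str) -> str:
        ...


class FakeSqlGenerator(SqlGenerator):
    """Returns canned answers for tests; a real implementation
    would call an LLM provider's API behind the same interface."""

    def __init__(self, canned: dict[str, str]) -> None:
        self.canned = canned

    def generate_sql(self, question: str) -> str:
        return self.canned[question]


def count_employees_query(generator: SqlGenerator, department: str) -> str:
    # The caller only sees the interface, so swapping providers
    # later touches no service-level code.
    return generator.generate_sql(
        f"generate a SQL query counting all the employees "
        f"in our {department} department"
    )


fake = FakeSqlGenerator({
    "generate a SQL query counting all the employees in our marketing department":
        "SELECT COUNT(*) FROM employees WHERE department = 'marketing';"
})
print(count_employees_query(fake, "marketing"))
```

Because both the OpenAI integration (section 6.2) and the later migration to Google Gemini (section 6.4) sit behind the same interface, switching providers becomes a matter of supplying a different implementation.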

6.1 Creating a text-to-SQL service

6.1.1 Creating the service skeleton

6.1.2 Defining the API of the service

6.2 Integration with the OpenAI API

6.2.1 Testing the integration

6.3 Developing the accuracy verification framework

6.3.1 Delving into the BIRD-bench dataset

6.3.2 Methods for comparing the expected vs. the actual generated SQL, and their trade-offs

6.3.3 Submitting the queries to the SQL-generator-service

6.4 Trade-offs with input-context size

6.4.1 Accuracy vs context-size

6.4.2 Context-size vs cost of running the model

6.4.3 Migration to another AI provider: Google Gemini

6.4.4 Deploying the application to Google Cloud Run

6.4.5 Verifying the text-to-SQL running in the cloud

6.4.6 Switching the AI provider

6.5 Summary