
6 A scientific approach to validating LLM-based solutions

 

This chapter covers

  • Encapsulating the LLM API to avoid tight coupling with a specific provider
  • Catching silent drops in model accuracy with a scientific approach
  • Trading off context size against model accuracy
  • Trading off context size against the cost of using the model

Let’s say we are given the task of implementing text-to-SQL functionality. Users will input natural-language text, such as “generate a SQL query for counting all the employees in our marketing department,” and the service will output the corresponding SQL query, which can be submitted to any SQL database. We aim to harness the power of generative AI to enhance query generation. However, along the way there are a couple of trade-offs to weigh and mistakes to avoid. First, we want to avoid tight coupling to a specific LLM API provider. Such coupling can result in significant maintenance overhead if we later decide to switch providers, a decision that may be driven by several factors: lower costs, improved accuracy, or support for a larger input context. We will therefore implement a service that hides the model provider behind a clean API for end users, encapsulating the OpenAI API.
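To make that boundary concrete before we build the full service, here is a minimal sketch of what such an encapsulation might look like in Python. The interface name SqlGenerator, the implementation class OpenAiSqlGenerator, and the gpt-4o-mini model choice are illustrative assumptions, not the chapter’s final design; the point is that callers depend only on the abstract interface.

from abc import ABC, abstractmethod

from openai import OpenAI


class SqlGenerator(ABC):  # hypothetical interface name, not the chapter's final API
    @abstractmethod
    def generate_sql(self, question: str) -> str:
        """Translate a natural-language question into a SQL query."""


class OpenAiSqlGenerator(SqlGenerator):  # one provider-specific implementation
    def __init__(self, model: str = "gpt-4o-mini"):  # model choice is an assumption
        self._client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self._model = model

    def generate_sql(self, question: str) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[
                {"role": "system",
                 "content": "Translate the user's request into a single SQL query. "
                            "Return only the SQL, with no explanation."},
                {"role": "user", "content": question},
            ],
        )
        return (response.choices[0].message.content or "").strip()


generator: SqlGenerator = OpenAiSqlGenerator()  # callers see only the interface
print(generator.generate_sql(
    "generate a SQL query for counting all the employees in our marketing department"))

Because the rest of the service talks only to SqlGenerator, switching to a cheaper or more accurate provider later is contained in a single new subclass, which is exactly the kind of decision this chapter teaches us to evaluate.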

6.1 Creating a text-to-SQL service

6.1.1 Creating the service skeleton

6.1.2 Defining the API of the service

6.2 Integration with the OpenAI API

6.2.1 Testing the integration

6.3 Developing the accuracy verification framework

6.3.1 Delving into the BIRD-bench dataset

6.3.2 Methods for comparing the expected and actual generated SQL, and their trade-offs

6.3.3 Submitting the queries to the SQL-generator-service

6.4 Trade-offs with input context size

6.4.1 Accuracy vs. context size

6.4.2 Context size vs. the cost of running the model

6.5 Summary