2 Nothing happens until someone writes an eval

 

This chapter covers

  • Introducing evals and Eval-Driven Development
  • Understanding why evals are essential before coding
  • Creating effective evals for your RAG system
  • Implementing automated evals in end-to-end testing
  • Using evals to build a reliable and efficient RAG chatbot

In the previous chapter, we introduced the basics of Retrieval Augmented Generation (RAG) and explored how Enterprise RAG can transform the way businesses interact with data. Now, we're ready to roll up our sleeves and start building our own RAG system. This chapter focuses on the crucial role of evals: evaluation tests that guide the development process. An eval is essentially a test case, and Eval-Driven Development borrows its workflow from test-driven development (TDD). If you're not familiar with TDD, here is how it works:

  1. First, you write some tests for your code.
  2. You run those tests and make sure they fail.
  3. You write just enough code to make your tests pass.
  4. You run your tests again and make sure they pass this time.
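The four steps above can be sketched in a few lines of Python. This is only an illustration of the cycle: `slugify`, `slugify_stub`, and `check_slugify` are hypothetical names chosen for the example, not code from this book's project.

```python
from typing import Callable


def check_slugify(slugify_fn: Callable[[str], str]) -> bool:
    """Step 1: the test, written before any real implementation exists."""
    try:
        return slugify_fn("Hello World") == "hello-world"
    except NotImplementedError:
        return False


def slugify_stub(title: str) -> str:
    """Placeholder for step 2: the test must fail against it."""
    raise NotImplementedError


def slugify(title: str) -> str:
    """Step 3: just enough code to make the test pass."""
    return "-".join(title.lower().split())


if __name__ == "__main__":
    print("before:", check_slugify(slugify_stub))  # step 2: expect a failure
    print("after:", check_slugify(slugify))        # step 4: expect a pass
```

Seeing the test fail first matters: it shows that the test can actually detect missing behavior, so a later pass means something.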

Evals work the same way when you build a RAG chatbot. We are going to build some evals (tests) and then run them using GitHub Actions to make sure that they fail. Throughout the rest of the book, we will write the code that makes those tests pass. By the end of the book, you will have a fully functioning RAG system that passes all the evals we build together in this chapter. Here is a quick example of an eval:

Question: “What is Product XYZ?”
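As a rough sketch, an eval like this could be written as an ordinary test case in Python. Everything here is an assumption made for illustration: the `Eval` class, the `run_eval` helper, the `placeholder_chatbot` function, and the expected phrase are hypothetical and do not come from the system built later in the book. A plain phrase check is also the simplest possible grading rule, not the only one.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Eval:
    """One eval: a question plus phrases a good answer should mention."""
    question: str
    expected_phrases: Tuple[str, ...]


def run_eval(eval_case: Eval, chatbot: Callable[[str], str]) -> bool:
    """Return True if the answer mentions every expected phrase."""
    answer = chatbot(eval_case.question).lower()
    return all(p.lower() in answer for p in eval_case.expected_phrases)


def placeholder_chatbot(question: str) -> str:
    """Stands in for the RAG system, which has not been built yet."""
    return "I don't know."


# The expected phrase is a placeholder; a real eval would use facts
# taken from your own product documentation.
product_eval = Eval(
    question="What is Product XYZ?",
    expected_phrases=("Product XYZ",),
)

if __name__ == "__main__":
    # Fails for now, which is the outcome we want at this stage.
    print("passed:", run_eval(product_eval, placeholder_chatbot))
```

Because the chatbot is still a placeholder, the eval fails. That failure is the starting point of the cycle described above.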

2.1 Introducing evals and eval-driven development

2.2 Why you can’t write a single line of code until you have an eval

2.3 How to use evals

2.4 Automatic evals in end-to-end testing

2.4.1 How to use LLMs in the eval process

2.4.2 Testing all data sources

2.4.3 Implementing automated evals

2.5 Summary