2 Evaluating generated responses
This chapter covers
- Getting started with Spring AI evaluators
- Checking for relevancy
- Judging response correctness
- Applying evaluators at runtime
Writing tests against your code is an important practice. Not only can automated tests ensure that nothing is broken in your application, but they can also provide feedback that informs its design and implementation. Tests for the generative AI components of an application are no less important than tests for any other part of it.
There’s only one problem. If you send the same prompt to an LLM multiple times, you’re likely to get a different answer each time. The non-deterministic nature of generative AI means that there can be no "assert equals" approach to testing.
In chapter 1, you saw how to use WireMock to mock the API's responses and get a deterministic response in a test. That approach works well for testing the code surrounding a request to a generative AI API, but it doesn't exercise the prompt itself or how the model responds to it.
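As a reminder, that style of test looks roughly like the following sketch: stub the model provider's HTTP endpoint with WireMock, point your client code at the stub, and assert against the canned reply. The endpoint path, JSON body, and test class name here are illustrative assumptions rather than the chapter 1 code.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static org.assertj.core.api.Assertions.assertThat;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.github.tomakehurst.wiremock.junit5.WireMockRuntimeInfo;
import com.github.tomakehurst.wiremock.junit5.WireMockTest;
import org.junit.jupiter.api.Test;

@WireMockTest
class MockedChatApiTest {

  @Test
  void chatEndpointReturnsCannedAnswer(WireMockRuntimeInfo wm) throws Exception {
    // Stub a chat-completions-style endpoint so every call returns the same body.
    stubFor(post(urlPathEqualTo("/v1/chat/completions"))
        .willReturn(okJson("""
            {"choices":[{"message":{"role":"assistant","content":"Paris"}}]}
            """)));

    // In a real test, your application's chat client would be configured with
    // wm.getHttpBaseUrl() as its base URL; a bare HTTP call keeps the sketch short.
    HttpResponse<String> response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(wm.getHttpBaseUrl() + "/v1/chat/completions"))
            .POST(HttpRequest.BodyPublishers.ofString("{\"messages\":[]}"))
            .build(),
        HttpResponse.BodyHandlers.ofString());

    // A deterministic assertion -- possible only because the response is canned.
    assertThat(response.body()).contains("Paris");
  }
}
```

This proves the surrounding code builds the request and handles the response correctly, but the "model" never sees the prompt at all, which is exactly the gap described above.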
Fortunately, Spring AI provides another way to decide whether a generated response is acceptable: evaluators.
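As a preview of where this chapter is headed, the following sketch shows the kind of test an evaluator makes possible. It assumes Spring AI's RelevancyEvaluator and an auto-configured ChatClient.Builder; the test class name and the question are illustrative placeholders.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class EvaluatorPreviewTest {

  @Autowired
  ChatClient.Builder chatClientBuilder;   // auto-configured by Spring AI

  @Test
  void answerIsRelevantToTheQuestion() {
    String question = "Why is the sky blue?";

    // Ask the model the question...
    String answer = chatClientBuilder.build()
        .prompt()
        .user(question)
        .call()
        .content();

    // ...then ask an evaluator (itself backed by a model) to judge the answer.
    RelevancyEvaluator evaluator = new RelevancyEvaluator(chatClientBuilder);
    EvaluationResponse evaluation =
        evaluator.evaluate(new EvaluationRequest(question, answer));

    // The evaluator decides pass or fail, so no exact-match assertion is needed.
    assertThat(evaluation.isPass()).isTrue();
  }
}
```

Instead of comparing the response to a fixed expected string, the test delegates the judgment to an evaluator, which is the approach the rest of this chapter explores.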