2 Evaluating generated responses
This chapter covers
- Getting started with Spring AI evaluators
- Checking for relevancy
- Judging response correctness
- Applying evaluators at runtime
Writing tests against your code is an important practice. Not only can automated tests ensure that nothing is broken in your application, but they can also provide feedback that informs its design and implementation. Tests for the generative AI components of an application are no less important than tests for any other part of it.
There’s only one problem. If you send the same prompt to an LLM multiple times, you’re likely to get a different answer each time. The non-deterministic nature of generative AI means that there can be no "assert equals" approach to testing.
In chapter 1, you saw how to use WireMock to mock the API's responses and get a deterministic response in a test. That approach works well for testing the code surrounding a request to a generative AI API, but it doesn't exercise the prompt itself or how the model responds to it.
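As a reminder, that style of test looks roughly like the following sketch: stub the model provider's HTTP endpoint with WireMock, point your client code at the stub, and assert against the canned reply. The endpoint path, JSON body, and test class name here are illustrative assumptions rather than the chapter 1 code.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static org.assertj.core.api.Assertions.assertThat;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.github.tomakehurst.wiremock.junit5.WireMockRuntimeInfo;
import com.github.tomakehurst.wiremock.junit5.WireMockTest;
import org.junit.jupiter.api.Test;

@WireMockTest
class MockedChatApiTest {

  @Test
  void chatEndpointReturnsCannedAnswer(WireMockRuntimeInfo wm) throws Exception {
    // Stub a chat-completions-style endpoint so every call returns the same body.
    stubFor(post(urlPathEqualTo("/v1/chat/completions"))
        .willReturn(okJson("""
            {"choices":[{"message":{"role":"assistant","content":"Paris"}}]}
            """)));

    // In a real test, your application's chat client would be configured with
    // wm.getHttpBaseUrl() as its base URL; a bare HTTP call keeps the sketch short.
    HttpResponse<String> response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(wm.getHttpBaseUrl() + "/v1/chat/completions"))
            .POST(HttpRequest.BodyPublishers.ofString("{\"messages\":[]}"))
            .build(),
        HttpResponse.BodyHandlers.ofString());

    // A deterministic assertion -- possible only because the response is canned.
    assertThat(response.body()).contains("Paris");
  }
}
```

This proves the surrounding code builds the request and handles the response correctly, but the "model" never sees the prompt at all, which is exactly the gap described above.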
Fortunately, Spring AI provides another way to decide whether a generated response is acceptable: evaluators.
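As a preview of where this chapter is headed, the following sketch shows the kind of test an evaluator makes possible. It assumes Spring AI's RelevancyEvaluator and an auto-configured ChatClient.Builder; the test class name and the question are illustrative placeholders.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class EvaluatorPreviewTest {

  @Autowired
  ChatClient.Builder chatClientBuilder;   // auto-configured by Spring AI

  @Test
  void answerIsRelevantToTheQuestion() {
    String question = "Why is the sky blue?";

    // Ask the model the question...
    String answer = chatClientBuilder.build()
        .prompt()
        .user(question)
        .call()
        .content();

    // ...then ask an evaluator (itself backed by a model) to judge the answer.
    RelevancyEvaluator evaluator = new RelevancyEvaluator(chatClientBuilder);
    EvaluationResponse evaluation =
        evaluator.evaluate(new EvaluationRequest(question, answer));

    // The evaluator decides pass or fail, so no exact-match assertion is needed.
    assertThat(evaluation.isPass()).isTrue();
  }
}
```

Instead of comparing the response to a fixed expected string, the test delegates the judgment to an evaluator, which is the approach the rest of this chapter explores.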