chapter seven
7 Building robust agents with evaluation and feedback
This chapter covers
- Introducing agent evaluation and feedback
- Implementing test-driven agent development
- Employing grounding, critic, and evaluation agents
- Using Phoenix for evaluation and feedback
Building robust, reliable, safe, and debuggable agentic systems depends on implementing evaluation and feedback. Agent evaluation comes in many forms, including benchmark testing, red-team testing, grounding, and even agents that evaluate other agents. Likewise, feedback for agents may come from human experience, agent evaluators or critics, test output, and self-assessment.
While evaluation and feedback are generally required in any production agent system, production shouldn’t be the only time you look at hardening your agents. You will almost always want to add this final layer (Layer 5 - Evaluation and Feedback), which we explore in this chapter.