
13 Production LLM system design


This chapter covers

  • Implementing prompt engineering workflows with version control and testing
  • Testing strategies for non-deterministic generative systems
  • Deploying safety guardrails and governance frameworks for production
  • Adversarial testing and vulnerability assessment for LLM applications

Moving from prototype to production with LLM applications introduces challenges that traditional ML engineering doesn't adequately address. While the fundamentals of robust system design remain essential, generative AI systems demand new approaches to testing, monitoring, and safety that account for their non-deterministic nature.

In this chapter, we cover the operational discipline required to deploy LLM applications reliably at scale. You'll learn to treat prompts as critical infrastructure requiring version control and systematic testing, implement evaluation frameworks that assess semantic quality rather than exact output matches, and deploy comprehensive safety guardrails that prevent harmful content generation and prompt injection attacks.
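As a taste of what's ahead, here is a minimal sketch of treating a prompt as versioned infrastructure rather than a hard-coded string, using the Langfuse Python SDK's get_prompt/compile interface. The prompt name support-triage, the pinned version number, and the template variables are all illustrative assumptions, not fixtures from this chapter.

from langfuse import Langfuse

# The client reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and
# LANGFUSE_HOST from the environment.
langfuse = Langfuse()

# Fetch a specific, pinned version of the prompt from the registry
# instead of hard-coding the template in application code.
prompt = langfuse.get_prompt("support-triage", version=3)

# Fill in the template variables at request time. The variable names
# belong to our hypothetical prompt template.
compiled = prompt.compile(
    product="DakkaBot",
    user_question="How do I reset my API key?",
)

Because the prompt is fetched by name and version, you can roll a bad prompt back, diff versions, and test a candidate version against production traffic without redeploying application code.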

The shift from deterministic to probabilistic systems fundamentally changes how we approach quality assurance. Traditional assertion-based testing breaks down when the same input produces multiple valid outputs. Instead, you need evaluation frameworks that can assess whether responses are factually correct, appropriately scoped, and aligned with business policies, even when the exact wording varies between runs.
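To make that concrete, here is a minimal sketch of a semantic test: instead of asserting exact string equality, it embeds the model's answer and a reference answer and checks that their cosine similarity clears a threshold. The sentence-transformers model name and the 0.8 threshold are illustrative choices under these assumptions, not recommendations from this chapter.

from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model; the choice is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantically_matches(response: str, reference: str,
                         threshold: float = 0.8) -> bool:
    """Pass if the response means roughly the same thing as the
    reference, even when the exact wording differs between runs."""
    embeddings = model.encode([response, reference], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return similarity >= threshold

# Two differently worded but equivalent answers should both pass,
# where an exact-match assertion would fail.
assert semantically_matches(
    "You can reset your API key from the account settings page.",
    "API keys are reset in the settings section of your account.",
)

A check like this still fails on genuinely wrong or off-topic answers, but it tolerates the run-to-run wording variation that makes exact-match assertions useless for generative output.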

13.1 Prompt engineering: Code for the GenAI era

13.1.1 Treating prompts as critical infrastructure

13.1.2 LangFuse prompt management for DakkaBot

13.1.3 LangFuse prompt management for production

13.2 Testing LLM applications

13.2.1 Evaluation framework for LLM responses

13.2.2 Safety and adversarial testing

13.3 Governance and safety in production

13.3.1 Implementing safety guardrails

13.4 Cost optimization strategies

13.4.1 Understanding LLM economics

13.4.2 Model selection strategy

13.4.3 Caching strategies

13.4.4 Prompt optimization for efficiency

13.4.5 Production cost monitoring

13.4.6 From traditional ML to LLMOps

13.5 Summary