10 Deploying and Monitoring

 

This chapter covers

  • How LLMOps differs from traditional software operations
  • Choosing between hosted APIs and self-hosted models
  • Building hybrid deployment architectures that optimize for both cost and capability
  • Implementing LLM-native monitoring systems that track response quality, user satisfaction, and business impact
  • Designing automated quality assurance pipelines to maintain output standards at scale

At 3:04 AM, an alert arrives that no one wants to see:

“URGENT: AI chatbot billing alert – $47,000 this month. System failing.”

Just days before, the company’s new LLM-powered support assistant had been a success story in the making. It sailed through internal testing, impressed executives, and promised to reduce support costs dramatically. Now it’s producing unpredictable results, racking up massive expenses, and creating more confusion than value.

10.1 Introducing LLMOps

10.2 Serving LLMs: Hosted APIs vs. open-source models

10.2.1 Using hosted APIs

10.2.2 The open-source alternative

10.2.3 The hybrid solution: Best of both worlds

10.3 Building LLM-native monitoring systems

10.3.1 What really matters: The four questions

10.3.2 Logging what actually matters

10.3.3 Setting up alerts that actually help

10.3.4 Catching cost explosions before they hurt

10.3.5 Building dashboards that drive action

10.3.6 Output quality monitoring

10.4 User experience and feedback monitoring

10.4.1 Explicit feedback collection

10.4.2 Implicit feedback signals

10.4.3 Building actionable feedback loops

10.5 Ensuring high-quality outputs in production

10.5.1 The three-pillar quality framework

10.5.2 Prompt engineering for consistent quality

10.5.3 Continuous quality monitoring with automated testing

10.6 Observability in practice: Introducing Langfuse with a real-world case study

10.6.1 Case study: How Huntr uses Langfuse to power the AI Resume Builder

10.7 Summary