
12 Designing LLM-powered systems


This chapter covers

  • How LLMs extend traditional MLOps infrastructure and practices
  • Building a RAG system from document ingestion to response generation
  • Implementing prompt engineering workflows with version control and testing
  • Setting up observability for multi-step LLM reasoning chains

Throughout this book, we've built a comprehensive foundation for ML Engineering—from containerized deployments to monitoring pipelines. But the field continues to evolve rapidly, and Large Language Models (LLMs) represent the most significant shift in how we build AI applications since the rise of deep learning itself.

LLMs bring new opportunities and challenges that require extending our traditional MLOps practices. While the fundamentals you've learned remain crucial (reliable infrastructure, systematic deployment, continuous monitoring), LLMs introduce unique operational considerations that demand evolved approaches: non-deterministic outputs that break traditional testing assumptions, complex multi-step reasoning chains that call for new debugging strategies, prompt engineering as a critical discipline, and safety concerns that go beyond model accuracy.
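To see why non-determinism breaks traditional testing assumptions, consider how a unit test has to change when the model's wording can differ on every run. The sketch below is illustrative rather than code from this chapter's project: the fake_llm() stub, the reference answer, and the 0.6 similarity threshold are assumptions standing in for a real model call and a production-grade (for example, embedding-based) similarity check.

from difflib import SequenceMatcher

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; the exact wording can vary between runs.
    return "Customers may ask for a refund within 30 days of purchase."

def lexical_similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a production test might compare embeddings instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def test_refund_answer():
    response = fake_llm("What is the refund window?")
    reference = "Customers can request a refund within 30 days of purchase."
    # Assert on required facts and closeness to a reference answer,
    # not on an exact output string.
    assert "30 days" in response
    assert lexical_similarity(response, reference) > 0.6

test_refund_answer()
print("non-exact-match test passed")

The point is not the specific similarity function but the shift in mindset: tests assert properties of the output (required facts, tone, closeness to a reference) rather than byte-for-byte equality.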

12.1 LLMOps: New challenges, familiar principles

12.1.1 What makes LLM applications different

12.1.2 Extending our ML platform for LLMs

12.1.3 Essential tools for LLM applications

12.2 Building DataKrypt's DakkaBot: A simple RAG architecture

12.2.1 What you'll build

12.2.2 Beyond single API calls: Designing for composability

12.2.3 Google's Gemini LLM and embeddings

12.2.4 The retrieval component

12.2.5 The augmentation component

12.2.6 The generation component

12.3 Giving DakkaBot a UI

12.4 Observability for LLM applications

12.4.1 Setting up LangFuse via Docker

12.4.2 Integrating LangFuse with DakkaBot

12.4.3 Enhanced observability in DakkaBotCore

12.4.4 Beyond traditional metrics

12.5 Summary