12 Designing LLM-powered systems
This chapter covers
- How LLMs extend traditional MLOps infrastructure and practices
- Building a RAG system from document ingestion to response generation
- Implementing prompt engineering workflows with version control and testing
- Setting up observability for multi-step LLM reasoning chains
Throughout this book, we've built a comprehensive foundation for ML engineering, from containerized deployments to monitoring pipelines. But the field continues to evolve rapidly, and large language models (LLMs) represent the most significant shift in how we build AI applications since the rise of deep learning itself.
LLMs bring new opportunities and challenges that extend our traditional MLOps practices. The fundamentals you've learned remain crucial: reliable infrastructure, systematic deployment, continuous monitoring. But LLMs also introduce operational considerations that demand new approaches: non-deterministic outputs that break traditional testing assumptions, multi-step reasoning chains that require new debugging strategies, prompt engineering as a discipline in its own right, and safety concerns that go beyond model accuracy.
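To see why non-determinism breaks exact-match testing, consider a minimal sketch in Python. Everything here is a hypothetical stand-in: `generate_summary` simulates a model call that returns slightly different JSON on each invocation, and the schema and thresholds are illustrative. The point is that the test asserts invariants that should hold across runs (valid JSON, required keys, a length budget) rather than comparing against a single expected string.

```python
import json
import random

def generate_summary(text: str) -> str:
    """Hypothetical stand-in for an LLM call. A real version would hit
    your model API; this stub simulates non-deterministic output so the
    test pattern below is runnable."""
    sentiment = random.choice(["positive", "neutral", "negative"])  # varies per call
    return json.dumps({"summary": "Revenue grew 12% in Q3.", "sentiment": sentiment})

def test_summary_properties():
    document = "Q3 revenue grew 12% year over year to $4.2M."
    # The same prompt can produce a different output on every call,
    # so assert properties that should always hold, not exact strings.
    for _ in range(5):
        result = json.loads(generate_summary(document))   # output must parse as JSON
        assert {"summary", "sentiment"} <= set(result)    # required keys are present
        assert len(result["summary"].split()) <= 50       # respects the length budget
        assert result["sentiment"] in {"positive", "neutral", "negative"}
```

This style of invariant testing is one way teams adapt their test suites to stochastic outputs; we'll build on the idea as we work through the systems in this chapter.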