13 Creating end-to-end LLM applications
This chapter covers
So far, we’ve looked at strategies for optimizing small language models (SLMs) to run on available hardware while maintaining acceptable quality. Throughout, we have focused on a common pattern: an SLM receives a prompt, generates task-specific output, and returns a result to the client application or user. This is a frequent scenario, but it’s not the only way to use language models. They often deliver more business value as part of a larger system.
In this chapter, we’ll explore two popular LLM-based system paradigms: retrieval-augmented generation (RAG) and agentic AI. Unlike the usual approach to these systems, we’ll build exclusively on domain-specific small models, which offers benefits in privacy, safety, security, specificity, and cost.