chapter thirteen

13 Creating end-to-end LLM applications

This chapter covers

Why standalone SLMs/LLMs aren’t always enough
Building a RAG system that uses only SLMs
Building an agentic AI with SLMs

So far, we’ve looked at strategies for optimizing small language models (SLMs) to run on available hardware while maintaining acceptable quality. Throughout, we have focused on a common pattern: an SLM receives a prompt, generates task-specific output, and returns a result to the client application or user. This is a frequent scenario, but it’s not the only way to use language models. They often deliver more business value as part of a larger system.

In this chapter, we’ll explore two popular LLM-based system paradigms: retrieval-augmented generation (RAG) and agentic AI. The difference is that we’ll build both using only domain-specific small models, which brings benefits in privacy, safety, security, specificity, and cost.

13.1 Why LLMs alone aren’t enough

13.2 Combining a domain-specific SLM with RAG

13.3 Using a vector database

13.4 Building an agent

Summary