chapter thirteen

13 Creating end-to-end LLM applications

This chapter covers

  • Why standalone SLMs/LLMs aren’t always enough
  • Building a RAG system that uses only SLMs
  • Building an agentic AI with SLMs

So far, we’ve looked at strategies for optimizing small language models (SLMs) to run on available hardware while maintaining acceptable quality. Throughout, we have focused on a common pattern: an SLM receives a prompt, generates task-specific output, and returns a result to the client application or user. This is a frequent scenario, but it’s not the only way to use language models. They often deliver more business value as part of a larger system.

In this chapter, we’ll explore two popular LLM-based system paradigms: retrieval-augmented generation (RAG) and agentic AI. What sets our approach apart is that we’ll build both on domain-specific small models only, which offers benefits in privacy, safety, security, specificity, and cost.

13.1 Why LLMs alone aren’t enough

13.2 Combining a domain-specific SLM with RAG

13.3 Using a vector database

13.4 Building an agent

Summary