12 Creating End-to-end LLM applications
This chapter covers
- When a standalone SLM/LLM doesn’t solve a problem.
- Building a RAG system that uses only SLMs.
- Real Agentic AI with SLMs.
Across the previous eleven chapters of this book, we have focused on diverse strategies to optimize small models so that they run on the available hardware resources while maintaining acceptable quality. Throughout, we have assumed the paradigm in which an SLM receives a prompt, generates some specific content (according to the task it has been trained for), and finally returns the result to the client application or user. This is a common scenario, but not the only one possible with language models. They typically generate more business value when they are part of a larger system. This chapter explores two of the most popular LLM-based system paradigms, RAG and Agentic AI. The novelty here is that we will build on top of domain-specific small models only, with all the benefits this approach brings in terms of privacy, safety, security, specificity, and financial savings.
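Before moving to these richer architectures, it is worth recalling what the single-model paradigm looks like in code. The following is a minimal sketch, assuming a Hugging Face `transformers` text-generation pipeline; the model name and prompt are purely illustrative and any domain-specific SLM could take their place.

```python
# Minimal sketch of the standalone paradigm: prompt in, generated text out,
# result returned directly to the client application/user.
# Assumes the Hugging Face transformers library; the model name is illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # swap in your domain-specific SLM
)

def answer(prompt: str) -> str:
    # Generate a completion for the prompt and hand it straight back.
    outputs = generator(prompt, max_new_tokens=128, do_sample=False)
    return outputs[0]["generated_text"]

print(answer("Summarize the warranty terms for product X in two sentences."))
```

In the rest of this chapter, this simple request-response loop becomes just one component: in a RAG system the prompt is first enriched with retrieved context, and in an agentic system the model's output may trigger further tool calls before anything is returned to the user.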