about this book

This book focuses on techniques for improving inference performance and reducing the cost of running pretrained and customized small language models (SLMs): optimizing and quantizing them, serving them through diverse API ecosystems, deploying them on a range of hardware (including your own laptop), and integrating them with other paradigms such as RAG and Agentic AI. All these concepts are explained in depth and come with complete source code examples. You’ll learn to minimize the computational horsepower your models require while retaining fast response times and high-quality output.

Although a few examples in this book show how to preprocess data for training and testing, and parameter-efficient fine-tuning (PEFT) techniques are also introduced, this book doesn’t focus on training or data-preparation techniques.

Who should read this book

This book is primarily for ML engineers and data scientists interested in learning how to manage LLMs in the typical hardware-constrained environment that their company budget allows for. It is also for tech leaders who want to understand how applying custom language models to corporate data could generate extra business value.

The minimally qualified reader should have the following skills and knowledge:
