chapter fifteen
15 Test-time Compute and Small Language Models
This chapter covers
- The definition of test-time compute, without the hype.
- Applying test-time compute to large and small language models through the OptiLLM inference proxy.
- An overview of the latest open source SLMs that embed test-time compute.
- How to tune an SLM for reasoning on a specific domain with GRPO (a reinforcement learning technique).
This chapter introduces the concept of test-time compute, along with the state-of-the-art SLMs and libraries that support it. It also includes a complete example that applies GRPO, the technique used to train the DeepSeek-R1 models, to specialize an SLM on a given domain using commodity hardware.