chapter fifteen

15 Test-time Compute and Small Language Models

This chapter covers

The definition of test-time compute, without the hype.
Test-time compute for large and small language models through the OptiLLM inference proxy.
An overview of the latest open source SLMs that embed test-time compute.
How to tune an SLM for reasoning on a specific domain with GRPO (a reinforcement learning technique).

This chapter introduces the concept of test-time compute and the related state-of-the-art SLMs and libraries. It also includes a complete example showing how to apply GRPO, the technique used to train the DeepSeek-R1 models, to specialize an SLM for a given domain on commodity hardware.

15.1 Test-time compute

15.2 The OptiLLM inference proxy

15.3 SLMs with embedded test-time compute

15.4 Building a reasoning domain-specific SLM

15.5 Summary