Appendix D. Using larger LLMs
The main chapters use the 0.6-billion-parameter (0.6B) Qwen3 base model because it is the smallest model in the Qwen3 family and therefore the easiest to run on consumer hardware.
However, the Qwen3Model implementation from appendix C is not limited to the 0.6B checkpoint; the same from-scratch PyTorch code can load the larger Qwen3 checkpoints as well. In practice, once we understand how to work with the 0.6B model, moving to a larger model involves three main changes:
- selecting the matching configuration dictionary;
- downloading the larger checkpoint from Hugging Face;
- loading the appropriate tokenizer for the base or reasoning variant.
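The three steps above can be sketched in plain Python. Note that everything here is illustrative: the actual configuration dictionaries and tokenizer class live in `reasoning_from_scratch.appendix_c`, the helper names (`select_config`, `download_checkpoint`, `tokenizer_repo`) are hypothetical, and the Hugging Face repository IDs follow the official Qwen naming convention but should be double-checked against the repository's own download code.

```python
# Illustrative sketch only; real configs/tokenizer are in
# reasoning_from_scratch.appendix_c, and repo IDs are assumptions
# based on the official Qwen naming on the Hugging Face Hub.

# Step 1: select the configuration matching the model size.
# (Placeholder entries; see table D.1 for the real hyperparameters.)
QWEN3_CONFIGS = {
    "0.6B": {"repo_id": "Qwen/Qwen3-0.6B-Base"},
    "4B": {"repo_id": "Qwen/Qwen3-4B-Base"},
}


def select_config(size):
    """Look up the configuration for a model-size label, e.g. '4B'."""
    try:
        return QWEN3_CONFIGS[size]
    except KeyError:
        raise ValueError(
            f"unknown size {size!r}; available: {sorted(QWEN3_CONFIGS)}"
        )


def download_checkpoint(size, filename="model.safetensors"):
    """Step 2: fetch the checkpoint file from the Hugging Face Hub."""
    # Requires `pip install huggingface_hub`; imported lazily so the
    # pure-Python helpers above work without the dependency installed.
    from huggingface_hub import hf_hub_download

    cfg = select_config(size)
    return hf_hub_download(repo_id=cfg["repo_id"], filename=filename)


def tokenizer_repo(size, reasoning=False):
    """Step 3: choose the tokenizer source for the base or reasoning variant."""
    base_repo = select_config(size)["repo_id"]
    # In Qwen's Hub naming, the reasoning (post-trained) variant drops
    # the "-Base" suffix, e.g. Qwen/Qwen3-4B vs. Qwen/Qwen3-4B-Base.
    return base_repo.removesuffix("-Base") if reasoning else base_repo
```

The lazy import in `download_checkpoint` keeps the configuration-selection logic usable offline; only the actual download step needs the `huggingface_hub` dependency and network access.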
This appendix illustrates the process using the Qwen3 4B model as a concrete example: it is large enough to be meaningfully stronger than the 0.6B model while remaining easier to handle than the larger 8B, 14B, and 32B variants.
D.1 Larger dense Qwen3 configurations
The repository includes configuration dictionaries for several larger Qwen3 models in the reasoning_from_scratch.appendix_c Python module, listed in table D.1. (You can also view the source code in the supplementary materials at https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/appendix_c.py.)
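To get a feel for how the dictionary entries translate into model size, we can roughly estimate a dense transformer's parameter count from a few hyperparameters. The formula and the two example dictionaries below are illustrative assumptions for a Qwen3-style architecture (tied concerns like grouped-query attention, norms, and biases are ignored); consult table D.1 and appendix_c.py for the actual values.

```python
# Rough parameter estimate for a dense, SwiGLU-style transformer.
# Ignores normalization weights, biases, and grouped-query attention,
# so treat the result as a ballpark figure only.


def approx_param_count(cfg):
    """Estimate total parameters from a config dictionary."""
    d = cfg["emb_dim"]        # token embedding / hidden width
    n_layers = cfg["n_layers"]
    h = cfg["hidden_dim"]     # feed-forward (MLP) width
    vocab = cfg["vocab_size"]

    attn = 4 * d * d          # Q, K, V, and output projections
    mlp = 3 * d * h           # SwiGLU: gate, up, and down projections
    return vocab * d + n_layers * (attn + mlp)


# Illustrative hyperparameters (assumed, not official Qwen3 numbers):
CFG_SMALL = {"emb_dim": 1024, "n_layers": 28,
             "hidden_dim": 3072, "vocab_size": 151_936}
CFG_LARGE = {"emb_dim": 2560, "n_layers": 36,
             "hidden_dim": 9728, "vocab_size": 151_936}
```

Plugging in these example values lands in the expected ballpark (hundreds of millions of parameters for the small configuration, a few billion for the large one), which is a handy sanity check when adapting a configuration dictionary for a new model size.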