appendix D Using larger LLMs
This book uses the 0.6-billion-parameter (0.6B) Qwen3 base model because it is the smallest model in the Qwen3 family and therefore the easiest to run on consumer hardware. But the same Qwen3Model implementation from appendix C is not limited to the 0.6B checkpoint. We can also use it to load larger Qwen3 checkpoints with the same PyTorch code we built from scratch.
In practice, this means that once we understand how to work with the 0.6B model, moving to a larger model mainly involves three changes:
- Selecting the matching configuration dictionary
- Downloading the larger checkpoint from Hugging Face
- Loading the appropriate tokenizer for the base or reasoning variant
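The three changes above can be written down as a small checklist in code. The following is a minimal sketch, not the book's own loading code: `ModelChoice` and `download_checkpoint` are hypothetical names introduced only for illustration, and the repository ID `Qwen/Qwen3-4B-Base` is an assumption about where the checkpoint is fetched from (the book's download helper may use a different repository, and an official checkpoint may store weights under different names or formats than `Qwen3Model` expects). The only third-party call used is `snapshot_download` from the `huggingface_hub` package, which is imported lazily so that nothing is downloaded until the function is called.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelChoice:
    """Hypothetical record of the three choices made when switching models."""
    config_name: str  # name of the configuration dictionary, e.g. "QWEN3_CONFIG_4B"
    repo_id: str      # Hugging Face repository ID (assumption, supplied by the reader)
    variant: str      # "base" or "reasoning"; decides which tokenizer setup to use

    def __post_init__(self):
        # Guard against typos in the variant name
        if self.variant not in ("base", "reasoning"):
            raise ValueError(f"Unknown variant: {self.variant!r}")


def download_checkpoint(choice, local_dir):
    """Download all files of the chosen repository into local_dir."""
    # Imported here so the rest of the sketch runs without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=choice.repo_id, local_dir=str(local_dir)))


# Example choice for the 4B base model (repository ID is an assumption)
choice_4b = ModelChoice(
    config_name="QWEN3_CONFIG_4B",
    repo_id="Qwen/Qwen3-4B-Base",
    variant="base",
)
print(choice_4b)
```

Calling `download_checkpoint(choice_4b, "qwen3_4b")` would then fetch the files; expect a download of several gigabytes for a model of this size.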
This appendix walks through the process using the Qwen3 4B model as an example. It is meaningfully stronger than the 0.6B model yet still easier to handle than the larger 8B, 14B, and 32B variants.
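To get a rough feel for why the 4B model is easier to handle than its larger siblings, we can estimate the memory needed just to hold the weights. The sketch below assumes 16-bit weights (2 bytes per parameter, as with bfloat16) and uses the nominal parameter counts implied by the model names. Both are assumptions: the exact parameter counts differ somewhat from the rounded names, and the estimate ignores activations, the KV cache, and framework overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Nominal parameter counts read off the model names (rounded; assumption)
NOMINAL_PARAMS = {
    "0.6B": 0.6e9,
    "1.7B": 1.7e9,
    "4B": 4e9,
    "8B": 8e9,
    "14B": 14e9,
    "32B": 32e9,
}


def weight_memory_gb(size_label, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for 16-bit weights."""
    return NOMINAL_PARAMS[size_label] * bytes_per_param / 1e9


for label in NOMINAL_PARAMS:
    print(f"{label:>5}: ~{weight_memory_gb(label):.1f} GB")
```

Under these assumptions, the 4B model needs roughly 8 GB for its weights alone, while each step up in model size raises that requirement proportionally.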
D.1 Larger dense Qwen3 configurations
The book’s repository includes configuration dictionaries for several larger Qwen3 models in the reasoning_from_scratch.appendix_c Python module. Table D.1 lists them. You can also view the source code directly at https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/appendix_c.py.
Table D.1 Qwen3 configurations (larger than 0.6B)
| Model size | Configuration Python dictionary |
| --- | --- |
| 1.7B | QWEN3_CONFIG_1_7B |
| 4B | QWEN3_CONFIG_4B |
| 8B | QWEN3_CONFIG_8B |
| 14B | QWEN3_CONFIG_14B |
| 32B | QWEN3_CONFIG_32B |
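The names in table D.1 follow a regular pattern, so a size label can be mapped to its dictionary name mechanically. The following is a minimal sketch under that assumption; `config_name` and `load_config` are hypothetical helpers, not part of the book's library. `load_config` looks up the dictionary in the `reasoning_from_scratch.appendix_c` module and therefore only works if that package is installed.

```python
import importlib


def config_name(size_label):
    """Map a size label such as "1.7B" to the dictionary name used in table D.1."""
    return "QWEN3_CONFIG_" + size_label.upper().replace(".", "_")


def load_config(size_label, module_name="reasoning_from_scratch.appendix_c"):
    """Fetch the configuration dictionary by name (requires the book's package)."""
    module = importlib.import_module(module_name)
    return getattr(module, config_name(size_label))


# Print the label-to-name mapping for the sizes listed in table D.1
for label in ("1.7B", "4B", "8B", "14B", "32B"):
    print(label, "->", config_name(label))
```

With the package installed, `load_config("4B")` would return the same object as importing `QWEN3_CONFIG_4B` directly. The direct import is the simpler choice when the model size is fixed; the lookup is mainly useful when the size comes from a command-line argument or a loop.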