Appendix D. Using larger LLMs
The main chapters use the 0.6-billion-parameter (0.6B) Qwen3 base model because it is the smallest model in the Qwen3 family and therefore the easiest to run on consumer hardware.
However, the Qwen3Model implementation from appendix C is not limited to the 0.6B checkpoint; the same from-scratch PyTorch code can load the larger Qwen3 checkpoints as well. In practice, once we understand how to work with the 0.6B model, moving to a larger model involves three main changes:
- selecting the matching configuration dictionary;
- downloading the larger checkpoint from Hugging Face;
- loading the appropriate tokenizer for the base or reasoning variant.
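The three steps above can be sketched in plain Python. Note that everything here is illustrative: the actual configuration dictionaries and tokenizer class live in `reasoning_from_scratch.appendix_c`, the helper names (`select_config`, `download_checkpoint`, `tokenizer_repo`) are hypothetical, and the Hugging Face repository IDs follow the official Qwen naming convention but should be double-checked against the repository's own download code.

```python
# Illustrative sketch only; real configs/tokenizer are in
# reasoning_from_scratch.appendix_c, and repo IDs are assumptions
# based on the official Qwen naming on the Hugging Face Hub.

# Step 1: select the configuration matching the model size.
# (Placeholder entries; see table D.1 for the real hyperparameters.)
QWEN3_CONFIGS = {
    "0.6B": {"repo_id": "Qwen/Qwen3-0.6B-Base"},
    "4B": {"repo_id": "Qwen/Qwen3-4B-Base"},
}


def select_config(size):
    """Look up the configuration for a model-size label, e.g. '4B'."""
    try:
        return QWEN3_CONFIGS[size]
    except KeyError:
        raise ValueError(
            f"unknown size {size!r}; available: {sorted(QWEN3_CONFIGS)}"
        )


def download_checkpoint(size, filename="model.safetensors"):
    """Step 2: fetch the checkpoint file from the Hugging Face Hub."""
    # Requires `pip install huggingface_hub`; imported lazily so the
    # pure-Python helpers above work without the dependency installed.
    from huggingface_hub import hf_hub_download

    cfg = select_config(size)
    return hf_hub_download(repo_id=cfg["repo_id"], filename=filename)


def tokenizer_repo(size, reasoning=False):
    """Step 3: choose the tokenizer source for the base or reasoning variant."""
    base_repo = select_config(size)["repo_id"]
    # In Qwen's Hub naming, the reasoning (post-trained) variant drops
    # the "-Base" suffix, e.g. Qwen/Qwen3-4B vs. Qwen/Qwen3-4B-Base.
    return base_repo.removesuffix("-Base") if reasoning else base_repo
```

The lazy import in `download_checkpoint` keeps the configuration-selection logic usable offline; only the actual download step needs the `huggingface_hub` dependency and network access.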
This appendix illustrates the process using the Qwen3 4B model as a concrete example: it is large enough to be meaningfully stronger than the 0.6B model while remaining easier to handle than the larger 8B, 14B, and 32B variants.
D.1 Larger dense Qwen3 configurations
The repository includes configuration dictionaries for several larger Qwen3 models in the reasoning_from_scratch.appendix_c Python module, listed in table D.1. (You can also view the source code in the supplementary materials at https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/appendix_c.py.)
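To get a feel for how the dictionary entries translate into model size, we can roughly estimate a dense transformer's parameter count from a few hyperparameters. The formula and the two example dictionaries below are illustrative assumptions for a Qwen3-style architecture (tied concerns like grouped-query attention, norms, and biases are ignored); consult table D.1 and appendix_c.py for the actual values.

```python
# Rough parameter estimate for a dense, SwiGLU-style transformer.
# Ignores normalization weights, biases, and grouped-query attention,
# so treat the result as a ballpark figure only.


def approx_param_count(cfg):
    """Estimate total parameters from a config dictionary."""
    d = cfg["emb_dim"]        # token embedding / hidden width
    n_layers = cfg["n_layers"]
    h = cfg["hidden_dim"]     # feed-forward (MLP) width
    vocab = cfg["vocab_size"]

    attn = 4 * d * d          # Q, K, V, and output projections
    mlp = 3 * d * h           # SwiGLU: gate, up, and down projections
    return vocab * d + n_layers * (attn + mlp)


# Illustrative hyperparameters (assumed, not official Qwen3 numbers):
CFG_SMALL = {"emb_dim": 1024, "n_layers": 28,
             "hidden_dim": 3072, "vocab_size": 151_936}
CFG_LARGE = {"emb_dim": 2560, "n_layers": 36,
             "hidden_dim": 9728, "vocab_size": 151_936}
```

Plugging in these example values lands in the expected ballpark (hundreds of millions of parameters for the small configuration, a few billion for the large one), which is a handy sanity check when adapting a configuration dictionary for a new model size.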