Appendix A. References and further reading

A.1 Chapter 1: Understanding reasoning models

A.1.1 References

The announcement article for OpenAI's o1 model, widely regarded as the first LLM-based reasoning model:

The DeepSeek-R1 technical report; DeepSeek-R1 was the first open-source reasoning model accompanied by a comprehensive technical report, and that report was the first to show that reasoning emerges from reinforcement learning with verifiable rewards (a topic covered in more detail in chapter 5):

The OpenAI CEO's comment on the reasoning ("chain-of-thought") capabilities of future models:

A research paper by researchers at Apple finding that reasoning models are sophisticated (but very capable) pattern matchers:

An in-depth book and step-by-step guide to implementing and training large language models:

A.1.2 Further reading

A.2 Chapter 2: Generating text with a pre-trained LLM

A.2.1 References

A.2.2 Further reading

A.3 Chapter 3: Evaluating reasoning models

A.3.1 References

A.3.2 Further reading

A.4 Chapter 4: Improving reasoning with inference-time scaling

A.4.1 References

A.4.2 Further reading

A.5 Chapter 5: Inference-time scaling via self-refinement

A.5.1 References

A.5.2 Further reading

A.6 Chapter 6: Training reasoning models with reinforcement learning

A.6.1 References

A.6.2 Further reading

A.7 Chapter 7: Improving GRPO for reinforcement learning

A.7.1 References

A.7.2 Further reading

A.8 Chapter 8: Distilling reasoning models for efficient reasoning

A.8.1 References

A.8.2 Further reading

A.9 Appendix F: Common approaches to LLM evaluation

A.9.1 References