Appendix A. References and further reading
A.1 Chapter 1
A.1.1 References
The announcement article for OpenAI's o1 model, which is widely regarded as the first LLM-based reasoning model:
- Introducing OpenAI o1-preview, https://openai.com/index/introducing-openai-o1-preview/
DeepSeek-R1 was the first open-source reasoning model accompanied by a comprehensive technical report, and the first to show that reasoning emerges from reinforcement learning with verifiable rewards (a topic covered in more detail in chapter 5):
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948
OpenAI CEO Sam Altman's comment on the reasoning ("chain-of-thought") capabilities of future models:
- "[...] We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model. [...]", https://x.com/sama/status/1889755723078443244
A research paper by AI researchers at Apple arguing that reasoning models are sophisticated pattern matchers (though very capable ones):
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, https://machinelearning.apple.com/research/illusion-of-thinking
An in-depth book on implementing and training large language models step by step:
- Build a Large Language Model (From Scratch), http://mng.bz/orYv
A.1.2 Further reading
An introduction to how DeepSeek-R1 works, providing insights into the foundations of reasoning in LLMs: