Appendix A. References and further reading

A.1 Chapter 1: Understanding reasoning models

A.1.1 References

The announcement article for OpenAI's o1 model, widely regarded as the first LLM-based reasoning model:

The DeepSeek-R1 technical report; DeepSeek-R1 was the first open-source reasoning model accompanied by a comprehensive technical report, and that report was the first to show that reasoning emerges from reinforcement learning with verifiable rewards (a topic covered in more detail in chapter 5):

The OpenAI CEO's comment on the reasoning ("chain-of-thought") capabilities of future models:

A research paper by researchers at Apple finding that reasoning models are sophisticated (but very capable) pattern matchers:

An in-depth book and step-by-step guide to implementing and training large language models:

A.1.2 Further reading

A.2 Chapter 2: Generating text with a pre-trained LLM

A.2.1 References

A.2.2 Further reading

A.3 Chapter 3: Evaluating reasoning models

A.3.1 References

A.3.2 Further reading

A.4 Chapter 4: Improving reasoning with inference-time scaling

A.4.1 References

A.4.2 Further reading

A.5 Chapter 5: Inference-time scaling via self-refinement

A.5.1 References

A.5.2 Further reading

A.6 Chapter 6: Training reasoning models with reinforcement learning

A.6.1 References

A.6.2 Further reading

A.7 Chapter 7: Improving GRPO for reinforcement learning

A.7.1 References

A.7.2 Further reading

A.8 Chapter 8: Distilling reasoning models for efficient reasoning

A.8.1 References

A.8.2 Further reading

A.9 Appendix F: Common approaches to LLM evaluation

A.9.1 References