Welcome
Thank you for purchasing the MEAP for Build a DeepSeek Model (From Scratch).
The ideas for this book grew out of our YouTube series, "Vizuara's Build DeepSeek from Scratch," which launched in February 2025. The series showed a clear demand for hands-on, first-principles material, encouraging us to create this more structured and detailed written guide.
To get the most from this book, you will need a foundation in machine learning and deep learning. You should be comfortable with Python, familiar with the basic operations of a framework like PyTorch, and have some exposure to the transformer architecture, even if you haven't implemented one yourself.
This book is a hands-on guide to the technical innovations that make the DeepSeek model family work. We chose DeepSeek because it marked a significant moment in open-source AI, demonstrating that an open model could achieve performance comparable to leading proprietary systems.
Our approach is to build the model's key components from scratch. The book is structured around a four-stage roadmap that covers these innovations in a logical order:
- The foundational Key-Value (KV) Cache for efficient inference.
- The core architectural components: Multi-Head Latent Attention (MLA) and the DeepSeek Mixture-of-Experts (MoE).
- Advanced training techniques, including Multi-Token Prediction (MTP) and FP8 quantization.
- Post-training methods such as Reinforcement Learning (RL) and Knowledge Distillation.