Appendix A. Design Patterns for Engineers
Sutskever’s List suggests that the history of modern AI is, above all, a history of learnable structure made scalable. The winning systems were not the ones with the most beautiful theory but the ones that learned representations end-to-end, preserved signal through depth, matched architecture to data structure, and turned extra compute into reliable empirical gains. From AlexNet to ResNet to Transformers, the pattern is the same: simple, modular ideas, well engineered, scaled hard, and judged by measurable gains over the best existing baseline. The deeper claim beneath that engineering tradition is that intelligence emerges when a model compresses the world well enough to recover its hidden structure; the deeper warning is that once such systems scale, objective design and safety can no longer be treated as afterthoughts.
A.1 Design Patterns for Engineers
Learned Representations (Chapters 2-7): AlexNet mattered because it displaced the SIFT/HOG era. Deep Speech 2 repeated the same move in speech, replacing phonetic dictionaries and rigid alignments with direct audio-to-text feature learning. The preference for end-to-end learned features, which recurs throughout the book, follows from a simple observation: the world is messy, and handcrafted features are brittle. The sketch below illustrates the pattern.
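A minimal PyTorch sketch of what "end-to-end" means in practice, assuming a toy image-classification setup; the framework choice, layer sizes, and names here are illustrative, not taken from AlexNet or Deep Speech 2:

```python
import torch
import torch.nn as nn

class EndToEndClassifier(nn.Module):
    """Features and classifier learned jointly from raw pixels,
    in contrast to a fixed SIFT/HOG front end feeding a separate classifier."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # The "feature extractor" is just more trainable layers:
        # gradients from the task loss reach every filter.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)  # (batch, 64) learned descriptor
        return self.classifier(h)

model = EndToEndClassifier()
x = torch.randn(8, 3, 32, 32)                  # a batch of raw images
targets = torch.randint(0, 10, (8,))           # dummy labels
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()  # one loss updates the features AND the classifier
```

The design point is that the feature extractor is nothing special: it is simply more trainable layers, so the task loss shapes the features directly rather than a human shaping them in advance.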