3 ResNet Revolution
This chapter covers
- The challenges of training deep neural networks
- Residual connections and how they revolutionized deep learning
- Evolution to ResNet v2 and training networks exceeding 1,000 layers
- Replacing pooling and striding with dilated convolutions for dense prediction tasks
- How CS231n became the first deep learning course at Stanford
Before 2015, adding depth did not reliably translate into better performance. Beyond a certain point, deeper plain networks often exhibited higher training error than their shallower counterparts, which meant the problem was not overfitting but optimization. The heart of the problem was a double bind: depth was required for representational capacity, yet it impaired optimization, while reducing depth eased optimization but capped capacity. Just as despair seemed justified, a deceptively simple idea emerged.
Residual connections ease this tension. Instead of asking a stack of layers to learn a desired mapping H(x) directly, a residual block learns the residual F(x) = H(x) - x and outputs F(x) + x through an identity shortcut. The shortcut preserves the forward signal and gives gradients a direct path during backpropagation. Introduced in Residual Networks (ResNets), residual connections earn their place on Sutskever’s List because they marked a pivotal shift in design philosophy: “add a residual connection” has since become a standard tool in the machine learning toolkit.[1]
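To see why the identity shortcut helps gradients, note that for a block y = x + F(x) the local derivative is dy/dx = 1 + F'(x), whereas a plain layer y = F(x) contributes only F'(x). Backpropagation multiplies these per-layer factors together, so a deep plain stack can shrink the gradient toward zero while the residual stack keeps an additive "+1" in every factor. The sketch below is a scalar toy under stated assumptions, not ResNet itself: the branch F(x) = tanh(w * x) is a hypothetical stand-in for a real convolution, batch normalization, and ReLU stack, and the small weight w = 0.01 is an illustrative choice meant to mimic a branch whose contribution is small, as is typical early in training. The names `layer`, `layer_grad`, and `forward_and_grad` are illustrative helpers, not part of any library.

```python
import math


def layer(x: float, w: float) -> float:
    """Hypothetical toy branch F(x) = tanh(w * x), standing in for conv + BN + ReLU."""
    return math.tanh(w * x)


def layer_grad(x: float, w: float) -> float:
    """Derivative dF/dx of the toy branch: w * (1 - tanh(w * x)^2)."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def forward_and_grad(x: float, depth: int, w: float, residual: bool) -> tuple[float, float]:
    """Push a scalar x through `depth` layers and return (output, d output / d input).

    Plain:    x_{l+1} = F(x_l)        -> per-layer factor F'(x_l)
    Residual: x_{l+1} = x_l + F(x_l)  -> per-layer factor 1 + F'(x_l)
    The chain rule multiplies the per-layer factors together.
    """
    grad = 1.0
    for _ in range(depth):
        local = layer_grad(x, w)  # evaluate F' at the current input, before updating x
        if residual:
            grad *= 1.0 + local  # the identity shortcut contributes the "+1"
            x = x + layer(x, w)
        else:
            grad *= local
            x = layer(x, w)
    return x, grad


if __name__ == "__main__":
    # Same toy branch and depth; only the presence of the shortcut differs.
    _, plain_grad = forward_and_grad(1.0, depth=50, w=0.01, residual=False)
    _, res_grad = forward_and_grad(1.0, depth=50, w=0.01, residual=True)
    print(f"plain stack:    d out / d in = {plain_grad:.3e}")
    print(f"residual stack: d out / d in = {res_grad:.3f}")
```

With 50 layers, the plain stack's input gradient collapses to roughly 0.01^50, on the order of 1e-100, while the residual stack's stays near 1 (about 1.6 in this toy). The toy does not claim that shortcuts fix every scaling problem: with a large w the residual product grows instead of shrinking. It only isolates the effect of the additive identity path that the paragraph above describes.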