chapter eleven

11 Learning through representation: LeCun, Bengio, Hinton, and the mathematics of neural networks

This chapter covers

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton’s Deep learning (2015), which cemented neural networks as the dominant AI framework
The evolution from biologically inspired neural models to modern deep learning systems
How deep learning integrates foundational ideas into a unified learning framework
Why representation learning replaced handcrafted features and fixed kernels
How depth allows neural networks to learn hierarchical representations

Neural networks are not a recent invention. Their origins stretch back to the middle of the twentieth century, when researchers first asked whether simple mathematical units, wired together, might reproduce aspects of biological intelligence. Early models were elegant and ambitious, but their limitations quickly became apparent. What followed was a long period of skepticism, punctuated by brief resurgences, during which neural networks were often dismissed as unstable, opaque, or mathematically unserious. By the late 1990s, many of the most influential advances in machine learning came from elsewhere: margin-based classifiers, probabilistic models, and carefully engineered features grounded in statistical theory.

11.1 The technical evolution that made deep learning possible

11.1.1 Early origins and initial optimism (1940s–1960s)

11.1.2 The first collapse: limits of shallow learning (1969–1980s)

11.1.3 Backpropagation and the second wave (1980s–1990s)

11.1.4 The deep learning revival and consolidation (2006–2015)

11.1.5 Synopsis of Deep learning (2015)

11.2 Key terms and concepts

11.2.1 Architecture and representation

11.2.2 Learning and optimization

11.2.3 Probabilistic interpretation

11.2.4 Additional core concepts

11.2.5 Why these terms and concepts matter

11.3 A neural network, end to end

11.3.1 From input to prediction: the forward pass

11.3.2 Measuring error: the loss function

11.3.3 From error to responsibility: backpropagation

11.3.4 Updating parameters: optimization

11.3.5 The full learning loop

11.4 A worked example: pedestrian detection in autonomous driving

11.4.1 Framing the classification problem

11.4.2 The neural network architecture

11.5 Framing the problem

11.5.1 Inputs, outputs, and decisions

11.5.2 Why this is a classification problem

11.5.3 Why linear rules are insufficient

11.6 The algebra of a neural network

11.6.1 Linear combination and bias terms

11.6.2 Why nonlinearity is essential