chapter four

4 Deep Learning Accelerates

This chapter covers

  • The resurgence of recurrent neural networks (RNNs) after AlexNet
  • How Karpathy’s blog made RNNs accessible and inspired experimentation
  • How Chris Olah clarified LSTMs with vivid visuals and metaphors
  • How selective dropout enabled deeper recurrent networks
  • How Deep Speech 2 demonstrated the real-world potential of RNNs
  • The engineering shift in artificial intelligence

Recurrent neural networks (RNNs) are sequence models. Sequence models process ordered data such as text, speech, or time series, where the position of each element matters. N-gram language models are also sequence models: they capture short-range dependencies by conditioning each token on a fixed window of preceding tokens. RNNs extend this idea by carrying forward a hidden state, a compressed representation of the entire sequence history. Yet despite their theoretical promise, RNNs suffered from practical problems: vanishing or exploding gradients, which made long-range dependencies hard to learn, and inefficient training. Gated architectures such as the LSTM eased the gradient problems, but large recurrent networks still overfit easily, and standard dropout, the most effective regularizer for feedforward networks, worked poorly when applied naively to recurrent networks. That limitation persisted until a dropout method designed specifically for recurrent architectures was introduced.
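The contrast between a fixed n-gram window and a recurrent hidden state, and the gradient problem that follows from the recurrence, can be sketched in a few lines of Python. This is a minimal illustration, not code from any system discussed in this chapter. It uses a hypothetical one-unit (scalar) tanh RNN with made-up weights, and the helper names `ngram_context`, `rnn_final_state`, and `state_gradient` are our own. For the optional dropout we assume the scheme of Zaremba et al. (2014), which drops only non-recurrent connections. Here that means the input, never the hidden state passed from one step to the next.

```python
import math
import random


def ngram_context(tokens, n):
    """An n-gram model sees only the last n-1 tokens; anything earlier is ignored."""
    return tuple(tokens[-(n - 1):]) if n > 1 else ()


def rnn_final_state(xs, w_x, w_h, b=0.0, p_drop=0.0, rng=None):
    """Hypothetical scalar tanh RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    The hidden state h carries a compressed summary of the whole prefix.
    If p_drop > 0, inverted dropout is applied to the input connection only
    (assumed Zaremba-style scheme); h_{t-1} -> h_t is never dropped.
    """
    if rng is None:
        rng = random.Random(0)
    h = 0.0
    for x in xs:
        if p_drop > 0.0:
            keep = rng.random() >= p_drop
            x = x / (1.0 - p_drop) if keep else 0.0  # rescale kept inputs
        h = math.tanh(w_x * x + w_h * h + b)
    return h


def state_gradient(xs, w_x, w_h, b=0.0):
    """d h_T / d h_0 by the chain rule, one factor per time step (h_0 = 0)."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        grad *= w_h * (1.0 - h * h)  # dh_t/dh_{t-1} = w_h * tanh'(a_t)
    return grad


# An early token falls outside a trigram's window but still shifts the RNN state.
print(ngram_context(["rain", "fell", "all", "day"], n=3))  # ('all', 'day')
print(rnn_final_state([1.0, 0, 0, 0], w_x=1.0, w_h=0.9) > 0)  # True
print(rnn_final_state([-1.0, 0, 0, 0], w_x=1.0, w_h=0.9) < 0)  # True

# With zero inputs the state stays at 0, the tanh slope is 1,
# and the gradient is exactly w_h ** T.
zeros = [0.0] * 50
print(state_gradient(zeros, w_x=1.0, w_h=0.5))  # vanishes: 0.5**50 ~ 8.9e-16
print(state_gradient(zeros, w_x=1.0, w_h=1.5))  # explodes: ~6.4e8
```

Each step multiplies the gradient by the recurrent weight times the tanh slope, which is at most 1, so over T steps the product behaves roughly like w_h^T: it shrinks toward zero when the recurrent weight is below 1 and blows up when it is above 1. In a full RNN the recurrent weight is a matrix, and repeated multiplication by it has the same tendency to shrink or amplify gradients over long sequences.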

4.1 The Unreasonable Effectiveness of Recurrent Neural Networks

4.2 Understanding LSTM Networks

4.3 Recurrent Neural Network Regularization

4.4 Deep Speech 2

4.4.1 Core Architecture

4.4.2 Training Techniques

4.4.3 Language Models and Decoding

4.4.4 Significance and Broader Impact

4.4.5 An Engineering Shift