chapter eight

8 Simplicity Hidden in Complexity

This chapter covers

Compression reveals secret structure
Complexity peaks between order and noise
Simplicity beats overfitting
Grokking: memorize, then simplify
Are language models just blurry JPEGs of the Internet?

Papers

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (Aaronson, Carroll, and Ouellette, 2014)
The First Law of Complexodynamics (Aaronson, 2011)
Kolmogorov Complexity and Algorithmic Randomness (Shen, Uspensky, and Vereshchagin, 2017)
A Tutorial Introduction to the Minimum Description Length Principle (Grünwald, 2004)
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (Hinton and van Camp, 1993)

Sutskever has argued that any good prediction model is implicitly a good compressor, and vice versa.[1] In one talk, he cites 2017 research on the “sentiment neuron”: OpenAI researchers, Sutskever among them, trained an LSTM to predict the next character in Amazon product reviews and discovered that a single neuron tracked the review’s sentiment.[2] The explanation offered is that sentiment helps predict the next character, so a model pushed hard enough on prediction ends up encoding sentiment in a latent variable.

8.1 Coffee Automaton

8.1.1 Methodology

8.1.2 Results

8.1.3 Critical Appraisal

8.2 Kolmogorov Complexity and Algorithmic Randomness (2017)

8.2.1 Algorithmic Statistics

8.2.2 Significance

8.3 A Tutorial Introduction to the Minimum Description Length Principle

8.4 Keeping Artificial Neural Networks Simple

8.4.1 Methodology

8.4.2 Results

8.4.3 Limitations

8.4.4 Cultural Impact

8.5 Grokking

8.5.1 Compression

8.5.2 Theory to Practice to Theory

8.5.3 Double Descent

8.6 Blurry JPEG