
8 Simplicity Hidden in Complexity

 

This chapter covers

  • Compression reveals hidden structure
  • Complexity peaks between order and noise
  • Simplicity beats overfitting
  • Grokking: memorize, then simplify
  • Are language models just blurry JPEGs of the Internet?

Sutskever has argued that any good prediction model is implicitly a good compressor, and vice versa.[1] In one talk he cited 2017 research on the “sentiment neuron,” in which OpenAI researchers, Sutskever among them, trained an LSTM to predict the next character of Amazon product reviews and found that a single neuron had come to track each review’s sentiment.[2] The sentiment neuron arises because predicting the next character well forces the model to encode sentiment as a latent variable: knowing whether a review is positive or negative is one of the most compact cues for what characters come next.
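To make the recipe concrete, here is a minimal sketch of the same idea in miniature. It is not the original OpenAI setup, which used a much larger model and corpus; the toy reviews, model size, hyperparameters, and correlation probe below are all illustrative assumptions. A character-level LSTM is trained purely on next-character prediction, and only afterwards is each hidden unit checked for correlation with the sentiment label.

# Illustrative sketch only: train a character-level LSTM on toy review text,
# then look for the single hidden unit whose activation correlates most
# strongly with sentiment. The data, sizes, and probe are assumptions.
import torch
import torch.nn as nn

# Hypothetical stand-ins for the Amazon reviews used in the 2017 work.
reviews = [
    ("i love this product, works great", 1),
    ("absolutely wonderful, five stars", 1),
    ("terrible quality, broke in a day", 0),
    ("waste of money, very disappointed", 0),
]
chars = sorted({c for text, _ in reviews for c in text})
stoi = {c: i for i, c in enumerate(chars)}

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)   # next-character logits

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h), h                 # keep hidden states for probing

torch.manual_seed(0)
model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Training sees only next-character prediction; sentiment labels are never used.
for step in range(200):
    for text, _ in reviews:
        ids = torch.tensor([[stoi[c] for c in text]])
        logits, _ = model(ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Probe: which single hidden unit best separates positive from negative reviews?
with torch.no_grad():
    feats, labels = [], []
    for text, label in reviews:
        ids = torch.tensor([[stoi[c] for c in text]])
        _, h = model(ids)
        feats.append(h[0, -1])                 # final hidden state of the review
        labels.append(float(label))
    feats = torch.stack(feats)                 # shape: (num_reviews, hidden)
    labels = torch.tensor(labels)
    fc = feats - feats.mean(0)                 # center each unit's activations
    lc = labels - labels.mean()
    # Pearson correlation of every hidden unit with the sentiment label.
    corr = (fc * lc[:, None]).sum(0) / (fc.norm(dim=0) * lc.norm() + 1e-8)
    best = corr.abs().argmax().item()
    print(f"most sentiment-aligned unit: {best} (correlation {corr[best].item():.2f})")

On a real review corpus, an analogous probe is what surfaced the single unit that tracked sentiment: the label never appears in training, yet it emerges because it is useful for compressing the text.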

8.1 Coffee Automaton

8.1.1 Methodology

8.1.2 Results

8.1.3 Critical Appraisal

8.2 Kolmogorov Complexity and Algorithmic Randomness (2017)

8.2.1 Algorithmic Statistics

8.2.2 Significance

8.3 A Tutorial Introduction to the Minimum Description Length Principle

8.4 Keeping Artificial Neural Networks Simple

8.4.1 Methodology

8.4.2 Results

8.4.3 Limitations

8.4.4 Cultural Impact

8.5 Grokking

8.5.1 Compression

8.5.2 Theory to Practice to Theory

8.5.3 Double Descent

8.6 Blurry JPEG