8 Simplicity Hidden in Complexity

This chapter covers

  • Compression reveals secret structure
  • Complexity peaks between order and noise
  • Simplicity beats overfitting
  • Grokking: memorize, then simplify
  • Are language models just blurry JPEGs of the Internet?

Sutskever has argued that any good predictive model is implicitly a good compressor, and vice versa.[1] In one talk, he cites 2017 research on the “sentiment neuron,” in which OpenAI researchers, including Sutskever, trained an LSTM to predict the next character in Amazon product reviews and discovered that a single neuron had learned to track each review’s sentiment.[2] A plausible explanation is that sentiment helps predict the next character, so a model trained only on prediction still has reason to encode sentiment as a latent variable.
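
The link between prediction and compression can be made concrete with a minimal sketch. It assumes a toy adaptive character model (add-one smoothed counts over the preceding `order` characters), not the LSTM from the sentiment-neuron work; the function name `code_length_bits` and the sample review text are hypothetical. The sketch sums -log2 p(next character) over a string, which is the number of bits an ideal arithmetic coder driven by those predictions would need, to within a couple of bits.

```python
import math
from collections import Counter, defaultdict


def code_length_bits(text, order, alphabet_size):
    """Ideal code length of `text`, in bits, under a toy adaptive model.

    The model predicts each character from the `order` characters before it,
    using add-one smoothed counts. Encoder and decoder are assumed to agree
    on `alphabet_size` in advance.
    """
    counts = defaultdict(Counter)  # context -> counts of the next character
    total_bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        seen = counts[context]
        p = (seen[ch] + 1) / (sum(seen.values()) + alphabet_size)
        total_bits += -math.log2(p)  # cost of coding `ch` given the prediction
        seen[ch] += 1  # update only after coding, so a decoder can mirror it
    return total_bits


# Hypothetical, highly repetitive "review" text.
review = "great product, great price, great service. " * 20
alphabet_size = len(set(review))

uniform_bits = len(review) * math.log2(alphabet_size)    # no prediction at all
order0_bits = code_length_bits(review, 0, alphabet_size)  # character frequencies
order2_bits = code_length_bits(review, 2, alphabet_size)  # two characters of context

for name, bits in [("uniform", uniform_bits),
                   ("order-0", order0_bits),
                   ("order-2", order2_bits)]:
    print(f"{name}: {bits:.0f} bits, {bits / len(review):.2f} bits/char")
```

Under these assumptions, the order-2 model predicts the repetitive text best and also assigns it the shortest code, while the uniform baseline, which predicts nothing, assigns the longest.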

8.1 Coffee Automaton

8.1.1 Methodology

8.1.2 Results

8.2 Kolmogorov Complexity and Algorithmic Randomness

8.2.1 Algorithmic Statistics

8.2.2 Significance

8.3 A Tutorial Introduction to the Minimum Description Length Principle

8.4 Keeping Artificial Neural Networks Simple

8.4.1 Methodology

8.4.2 Results

8.4.3 Limitations

8.4.4 Cultural Influence

8.5 Grokking

8.5.1 Compression

8.5.2 Theory to Practice to Theory

8.5.3 Double Descent

8.6 Blurry JPEG