Chapter 8. Learning signal and ignoring noise: introduction to regularization and batching
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
John von Neumann, mathematician, physicist, computer scientist, and polymath
In the last several chapters, you’ve learned that neural networks model correlation. The hidden layers (the middle layer in the three-layer network) can even create intermediate correlation to help solve a task (seemingly out of thin air). How do you know the network is creating good correlation?
When we discussed stochastic gradient descent with multiple inputs, we ran an experiment where we froze one weight and then asked the network to continue training. As it trained, the dots found the bottoms of their bowls, as it were. You saw the weights adjust to minimize the error.
When we froze the weight, it nonetheless ended up at the bottom of its bowl. For some reason, the bowl moved so that the frozen weight’s value became optimal. Furthermore, if we then unfroze the weight and trained some more, it wouldn’t learn. Why? Well, the error had already fallen to 0. As far as the network was concerned, there was nothing more to learn.
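If you want to replay that experiment, here’s a minimal sketch (with made-up input values and a made-up learning rate, not the book’s exact dataset) of gradient descent over three inputs in which the first weight is frozen by zeroing out its update. The other weights absorb all the learning, the error still drops to (nearly) 0, and the frozen weight ends up sitting at the bottom of a bowl that moved to meet it.

import numpy as np

# Hypothetical single training example with three inputs and one target.
inputs = np.array([8.5, 0.65, 1.2])
goal = 1.0
weights = np.array([0.1, 0.2, -0.1])  # weights[0] will stay frozen
alpha = 0.1                           # assumed learning rate

for iteration in range(100):
    pred = inputs.dot(weights)        # forward pass: weighted sum
    delta = pred - goal               # raw error signal
    error = delta ** 2                # squared error

    weight_deltas = delta * inputs    # gradient for each weight
    weight_deltas[0] = 0              # freeze the first weight: no update

    weights -= alpha * weight_deltas  # adjust only the unfrozen weights

# The remaining weights compensated, so error is ~0 even though
# weights[0] never changed.
print("final error:", error)
print("final weights:", weights)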