Chapter 9. Modeling probabilities and nonlinearities: activation functions
What is an activation function?
Standard hidden activation functions
- Sigmoid
- Tanh
Standard output activation functions
- Softmax
Activation function installation instructions
“I know that 2 and 2 make 4—& should be glad to prove it too if I could—though I must say if by any sort of process I could convert 2 & 2 into five it would give me much greater pleasure.”
George Gordon Byron, letter to Annabella Milbanke, November 10, 1813
An activation function is a function applied to the neurons in a layer during prediction. This should seem very familiar, because you’ve already been using an activation function called relu in the three-layer neural network. The relu function has the effect of turning all negative numbers into 0.
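As a refresher, here is a minimal NumPy sketch of relu applied to a layer. The specific layer values are illustrative assumptions, not numbers from an earlier chapter:

```python
import numpy as np

def relu(x):
    # Elementwise: keep positive values, turn negatives into 0.
    # (x > 0) is a boolean mask; multiplying by x zeroes out negatives.
    return (x > 0) * x

# A hypothetical hidden-layer output for one input example
layer_1 = np.array([[-0.5, 0.8, -1.2, 0.3]])
print(relu(layer_1))  # [[0.  0.8 0.  0.3]]
```

Note that relu takes each number in the layer individually: one number in, one number out, applied across the whole layer at once.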
Oversimplified, an activation function is any function that can take one number and return another number. But there are an infinite number of functions in the universe, and not all of them are useful as activation functions.
There are several constraints that make a function suitable as an activation function. Using functions that violate these constraints is usually a bad idea, as you’ll see.