Chapter 9. Modeling probabilities and nonlinearities: activation functions

 

In this chapter

What is an activation function?

Standard hidden activation functions

  • Sigmoid
  • Tanh

Standard output activation functions

  • Softmax

Activation function installation instructions

“I know that 2 and 2 make 4—& should be glad to prove it too if I could—though I must say if by any sort of process I could convert 2 & 2 into five it would give me much greater pleasure.”

George Gordon Byron, letter to Annabella Milbanke, November 10, 1813

What is an activation function?

It’s a function applied to the neurons in a layer during prediction

An activation function is a function applied to the neurons in a layer during prediction. This should seem very familiar, because you’ve been using an activation function called relu in the three-layer neural network. The relu function had the effect of turning all negative numbers to 0.
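
For reference, here is a minimal sketch of relu and of where it sits in a three-layer network’s prediction step. The weight shapes, seed, and variable names are illustrative assumptions, not this chapter’s exact code:

    import numpy as np

    def relu(x):
        # Turn all negative numbers to 0; leave positive numbers unchanged.
        return (x > 0) * x

    np.random.seed(1)
    weights_0_1 = 2 * np.random.random((3, 4)) - 1   # illustrative weights
    weights_1_2 = 2 * np.random.random((4, 1)) - 1

    layer_0 = np.array([1.0, 0.5, -1.5])
    layer_1 = relu(layer_0.dot(weights_0_1))   # activation applied during prediction
    layer_2 = layer_1.dot(weights_1_2)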

Oversimplified, an activation function is any function that takes one number and returns another number. But there are an infinite number of functions in the universe, and not all of them are useful as activation functions.

There are several constraints on what makes a function an activation function. Using functions outside of these constraints is usually a bad idea, as you’ll see.

Constraint 1: The function must be continuous and infinite in domain
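
As a quick illustration of this constraint (my own example, not the chapter’s): y = x * x qualifies, because it produces exactly one output for every possible input, whereas a function defined only for some inputs (say, only for the integers) does not.

    def squared(x):
        # Continuous and infinite in domain: one output for any real input.
        return x * x

    print(squared(-3.5))     # 12.25
    print(squared(100000))   # 10000000000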

Standard hidden-layer activation functions

Standard output-layer activation functions

The core issue: Inputs have similarity

Softmax computation
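
The body of this section isn’t reproduced above, so the following is only a generic sketch of the softmax computation with NumPy: raise e to each raw output, then divide by the sum so that the outputs are positive and sum to 1, like probabilities:

    import numpy as np

    def softmax(x):
        # Exponentiate each raw output, then normalize so the
        # resulting values are positive and sum to 1.
        temp = np.exp(x)
        return temp / np.sum(temp)

    raw = np.array([1.0, 2.0, 0.1])
    probs = softmax(raw)
    print(probs, probs.sum())   # probabilities summing to 1.0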

Activation installation instructions

Multiplying delta by the slope

Converting output to slope (derivative)
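
The two headings above describe a standard backpropagation step: each activation’s slope (derivative) is computed directly from the output the activation produced on the forward pass, and the layer’s delta is then multiplied by that slope. A sketch under that assumption; the chapter’s exact code may differ:

    def relu2deriv(output):
        return output > 0                # slope 1 where output was positive, else 0

    def sigmoid2deriv(output):
        return output * (1 - output)     # sigmoid's derivative from its output

    def tanh2deriv(output):
        return 1 - output ** 2           # tanh's derivative from its output

    # Multiplying delta by the slope (illustrative names and shapes):
    # layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)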

Upgrading the MNIST network