Concept: dropout layer (category: deep learning)

This is an excerpt from Manning's book Deep Learning for Vision Systems MEAP V08 livebook.
Add dropout layers to avoid overfitting
Let’s see how we use Keras to add a dropout layer to our previous model:
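The code listing appears only as an image in the original excerpt, so it is not reproduced here. The sketch below is a minimal, hypothetical reconstruction of how a Keras Dropout layer might be added to a small CNN; the layer sizes and placement are illustrative assumptions, not the book's exact model, though the rate of 0.3 matches the discussion that follows.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Illustrative architecture only -- layer sizes are assumptions, not the book's exact model
model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=2))

# dropout layer: randomly drops 30% of this layer's activations during training
model.add(Dropout(0.3))

model.add(Flatten())
model.add(Dense(64, activation='relu'))

# a dropout layer can also be placed before the output layer
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))
```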
As you can see, the dropout layer takes the rate as an argument. It represents the fraction of the input units to drop. For example, if we set the rate to 0.3, 30% of the neurons in this layer will be randomly dropped in each epoch. So if we have 10 nodes in a layer, 3 of these neurons will be turned off and 7 will be trained. The 3 neurons are randomly selected, and in the next epoch another randomly selected set of neurons is turned off, and so on. Because the selection is random, some neurons may get turned off more often than others, and some may never get turned off. This is okay because we repeat the process so many times that, on average, each neuron gets roughly the same treatment. Note that this rate is another hyperparameter that we will tune when building our CNN.
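To make the 10-node example concrete, here is a small NumPy sketch (not from the book) of how a random dropout mask with rate 0.3 might be sampled and applied to a layer's activations. For simplicity it omits the rescaling of the kept activations by 1/(1 - rate) that frameworks such as Keras apply during training.

```python
import numpy as np

rate = 0.3                          # fraction of units to drop
activations = np.random.rand(10)    # pretend output of a 10-node layer

# sample a fresh random mask: each unit is kept with probability 1 - rate
mask = np.random.rand(10) >= rate

dropped = activations * mask        # on average, 3 of the 10 activations become 0
print(dropped)
```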
```python
from keras.models import Sequential, Model
from keras.layers import (Input, Conv2D, Dense, Flatten, Dropout,
                          LeakyReLU, BatchNormalization, ZeroPadding2D)

def discriminator_model():
    # instantiate a sequential model and name it discriminator
    discriminator = Sequential()

    # add a convolutional layer to the discriminator model
    discriminator.add(Conv2D(32, kernel_size=3, strides=2,
                             input_shape=(28, 28, 1), padding="same"))
    # add a LeakyReLU activation function
    discriminator.add(LeakyReLU(alpha=0.2))
    # add a dropout layer with a 25% dropout probability
    discriminator.add(Dropout(0.25))

    # add a second convolutional layer with zero padding
    discriminator.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    discriminator.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
    # add a BatchNormalization layer for faster learning and higher accuracy
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # add a third convolutional layer with batch norm, LeakyReLU, and a dropout
    discriminator.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # add the fourth convolutional layer with batch norm, LeakyReLU, and a dropout
    discriminator.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # flatten the network and add the output Dense layer with a sigmoid activation function
    discriminator.add(Flatten())
    discriminator.add(Dense(1, activation='sigmoid'))

    # print the model summary
    discriminator.summary()

    # set the input image shape (img_shape is defined elsewhere in the book)
    img = Input(shape=img_shape)

    # run the discriminator model to get the output probability
    probability = discriminator(img)

    # return a Model that takes the image as an input and produces the probability output
    return Model(img, probability)
```

This is an excerpt from Manning's book Deep Learning with JavaScript: Neural networks in TensorFlow.js.
The first three differences in this list give the node-based model a higher capacity than the browser-based model. They are also what make the node-based model too memory- and computation-intensive to be trained with acceptable speed in the browser. As we learned in chapter 3, with greater model capacity comes a greater risk of overfitting. The increased risk of overfitting is ameliorated by the fourth difference, namely, the inclusion of dropout layers.
During the training phase (during Model.fit() calls), the dropout layer randomly sets a fraction of the elements in the input tensor to zero (they are "dropped"), and the result is the output tensor of the dropout layer. For the purposes of this example, a dropout layer has only one configuration parameter: the dropout rate (for example, the two rate fields shown in listing 4.5). Suppose a dropout layer is configured with a dropout rate of 0.25, and the input is a 1D tensor with the values [0.7, -0.3, 0.8, -0.4]; the output tensor may be [0.7, -0.3, 0.0, -0.4], with 25% of the input tensor's elements selected at random and set to 0. During backpropagation, the gradient tensor on a dropout layer is affected similarly by this random zeroing-out. During the inference phase (during Model.predict() and Model.evaluate() calls), a dropout layer does not randomly zero out elements of the input tensor. Instead, the input is simply passed through unchanged as the output (that is, an identity mapping). Figure 4.11 shows an example of how a dropout layer with a 2D input tensor works at training time and testing time.
Figure 4.11. An example of how a dropout layer works. In this example, the input tensor is 2D and has a shape of [4, 2]. The dropout layer has its rate configured as 0.25, which leads to 25% (that is, two out of eight) elements of the input tensor being randomly selected and set to zero during the training phase. During the inference phase, the layer acts as a trivial passthrough.
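The excerpt above describes TensorFlow.js, but the same training-versus-inference behavior can be sketched with the Keras Dropout layer in Python. This is an illustrative analogue rather than the book's code, and note that Keras uses "inverted dropout": during training, the surviving elements are additionally scaled by 1/(1 - rate), which the book's simplified example omits.

```python
import tensorflow as tf

layer = tf.keras.layers.Dropout(rate=0.25)
x = tf.constant([[0.7, -0.3, 0.8, -0.4]])

# training phase: ~25% of the elements are randomly zeroed
# (the kept elements are also scaled by 1 / (1 - 0.25) in Keras)
print(layer(x, training=True).numpy())

# inference phase: the layer is an identity mapping -- the input passes through unchanged
print(layer(x, training=False).numpy())
```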