concept dropout layer in category deep learning

appears as: dropout layers, dropout layer, The dropout layer
Deep Learning for Vision Systems MEAP V08 livebook

This is an excerpt from Manning's book Deep Learning for Vision Systems MEAP V08 livebook.

  • Add dropout layers to avoid overfitting
  • Let’s see how we use Keras to add a dropout layer to our previous model:
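    A minimal sketch of what this can look like; the surrounding layers and sizes are illustrative assumptions, not the book's exact model, and only the Dropout line matters for this discussion:

    from keras.models import Sequential
    from keras.layers import Flatten, Dense, Dropout

    # illustrative classifier: the only detail taken from the text is the 0.3 dropout rate
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28)))
    model.add(Dense(64, activation='relu'))
    # randomly drop 30% of this layer's activations during training
    model.add(Dropout(rate=0.3))
    model.add(Dense(10, activation='softmax'))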

    As you can see, the dropout layer takes the rate as an argument, which represents the fraction of the input units to drop. For example, if we set the rate to 0.3, 30% of the neurons in this layer will be randomly dropped in each epoch. So if we have 10 nodes in a layer, 3 of these neurons will be turned off and 7 will be trained. The 3 neurons are randomly selected, and in the next epoch another randomly selected set of neurons is turned off, and so on. Because the selection is random, some neurons may be turned off more often than others, and some may never be turned off at all. This is fine because we repeat the process so many times that, on average, each neuron receives roughly the same treatment. Note that the rate is another hyperparameter to tune when building our CNN.

    from keras.models import Sequential, Model
    from keras.layers import (Input, Conv2D, LeakyReLU, Dropout, ZeroPadding2D,
                              BatchNormalization, Flatten, Dense)

    def discriminator_model():
        # shape of the input images (28 x 28 grayscale); used again for the Input layer below
        img_shape = (28, 28, 1)

        # instantiate a sequential model and name it discriminator
        discriminator = Sequential()

        # add a convolutional layer to the discriminator model
        discriminator.add(Conv2D(32, kernel_size=3, strides=2,
                                 input_shape=(28, 28, 1), padding="same"))
        # add a leakyRelu activation function
        discriminator.add(LeakyReLU(alpha=0.2))
        # add a dropout layer with a 25% dropout probability
        discriminator.add(Dropout(0.25))

        # add a second convolutional layer with zero padding
        discriminator.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        discriminator.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
        # add a BatchNormalization layer for faster learning and higher accuracy
        discriminator.add(BatchNormalization(momentum=0.8))
        discriminator.add(LeakyReLU(alpha=0.2))
        discriminator.add(Dropout(0.25))

        # add a third convolutional layer with batch norm, leakyRelu, and a dropout
        discriminator.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        discriminator.add(BatchNormalization(momentum=0.8))
        discriminator.add(LeakyReLU(alpha=0.2))
        discriminator.add(Dropout(0.25))

        # add the fourth convolutional layer with batch norm, leakyRelu, and a dropout
        discriminator.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        discriminator.add(BatchNormalization(momentum=0.8))
        discriminator.add(LeakyReLU(alpha=0.2))
        discriminator.add(Dropout(0.25))

        # flatten the network and add the output Dense layer with sigmoid activation function
        discriminator.add(Flatten())
        discriminator.add(Dense(1, activation='sigmoid'))

        # print the model summary
        discriminator.summary()

        # set the input image shape
        img = Input(shape=img_shape)
        # run the discriminator model to get the output probability
        probability = discriminator(img)

        # return a Model that takes the image as an input and produces the probability output
        return Model(img, probability)
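    To use the returned model, you would compile it for binary real-versus-fake classification. The loss and optimizer below are typical choices for such a discriminator, not settings taken from this excerpt:

    # build the discriminator and compile it as a binary classifier
    discriminator = discriminator_model()
    discriminator.compile(loss='binary_crossentropy',
                          optimizer='adam',
                          metrics=['accuracy'])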

Deep Learning with JavaScript: Neural networks in TensorFlow.js

    This is an excerpt from Manning's book Deep Learning with JavaScript: Neural networks in TensorFlow.js.

    The first three differences in this list give the node-based model a higher capacity than the browser-based model. They are also what make the node-based model too memory- and computation-intensive to be trained with acceptable speed in the browser. As we learned in chapter 3, with greater model capacity comes a greater risk of overfitting. The increased risk of overfitting is ameliorated by the fourth difference, namely, the inclusion of dropout layers.

  • During the training phase (during Model.fit() calls), the dropout layer randomly sets a fraction of the elements in the input tensor to zero (or “drops” them), and the result is the output tensor of the dropout layer. For the purpose of this example, a dropout layer has only one configuration parameter: the dropout rate (for example, the two rate fields shown in listing 4.5). Suppose a dropout layer is configured to have a dropout rate of 0.25 and the input tensor is a 1D tensor with the values [0.7, -0.3, 0.8, -0.4]; the output tensor may be [0.7, -0.3, 0.0, -0.4], with 25% of the input tensor’s elements selected at random and set to the value 0. During backpropagation, the gradient tensor on a dropout layer is affected similarly by this random zeroing-out.
  • During the inference phase (during Model.predict() and Model.evaluate() calls), a dropout layer does not randomly zero out elements in the input tensor. Instead, the input is simply passed through unchanged as the output (that is, an identity mapping). The short code sketch after figure 4.11 below illustrates both behaviors.
  • Figure 4.11 shows an example of how a dropout layer with a 2D input tensor works at training time and testing time.

    Figure 4.11. An example of how a dropout layer works. In this example, the input tensor is 2D and has a shape of [4, 2]. The dropout layer has its rate configured as 0.25, which leads to 25% (that is, two out of eight) elements of the input tensor being randomly selected and set to zero during the training phase. During the inference phase, the layer acts as a trivial passthrough.
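    To make the training-versus-inference difference concrete, here is a small runnable sketch. It uses the Python Keras API rather than TensorFlow.js; note that this implementation applies inverted dropout, so values kept during training are additionally scaled by 1 / (1 - rate), and the zeroed positions change from call to call:

    import tensorflow as tf

    dropout = tf.keras.layers.Dropout(rate=0.25)
    x = tf.constant([[0.7, -0.3, 0.8, -0.4]])

    # training phase: a random ~25% of the elements are set to zero
    # (the surviving values are scaled up by 1 / (1 - 0.25))
    print(dropout(x, training=True).numpy())

    # inference phase: identity mapping, the input passes through unchanged
    print(dropout(x, training=False).numpy())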