concept feature map in category deep learning

appears as: feature map, feature maps
Deep Learning Design Patterns MEAP V02

This is an excerpt from Manning's book Deep Learning Design Patterns MEAP V02.

from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

def stem(inputs):
    """ Create the stem entry into the neural network
        inputs : input tensor to the neural network
    """
    # Strided convolution - dimensionality reduction
    # Halving the height and width reduces the feature map size by 75%
    outputs = Conv2D(32, (3, 3), strides=(2, 2))(inputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)

    # Convolution - dimensionality expansion
    # Double the number of filters
    outputs = Conv2D(64, (3, 3), strides=(1, 1))(outputs)
    outputs = BatchNormalization()(outputs)
    outputs = ReLU()(outputs)
    return outputs

Next, we consider which output level we share with each of the two tasks. For the minor damage task, we are looking at tiny objects. While we do not cover object detection until a later chapter, the historic problem with classifying small objects was that the cropped feature maps, after being pooled, contained too little spatial information. The fix was to perform the object classification on feature maps from an earlier convolution, where the feature maps are of sufficient size that when a tiny object is cropped out, there is enough spatial information left for classification.
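Below is a minimal sketch of this idea (the layer shapes and task heads are assumptions for illustration, not the book's listing): the tiny-object task taps the earlier, larger feature maps, while the second task taps a deeper, smaller level. It reuses the stem() defined above.

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, Dense

inputs = Input((224, 224, 3))
early = stem(inputs)                                # earlier, larger feature maps
deep = Conv2D(128, (3, 3), strides=(2, 2))(early)   # deeper, smaller feature maps

# Tiny-object (minor damage) task: classify from the earlier feature maps
minor = Dense(2, activation='softmax')(GlobalAveragePooling2D()(early))
# Second task: classify from the deeper feature maps
major = Dense(2, activation='softmax')(GlobalAveragePooling2D()(deep))

model = Model(inputs, [minor, major])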

The second hyperparameter introduced was the resolution multiplier ρ (rho), which thins the input shape and consequently the feature map sizes at each layer.

Let’s take a quick look at the pros and cons of reducing the input resolution. When we reduce the input resolution without altering the stem component, the size of the feature maps entering the learner component is correspondingly reduced. For example, if the height and width of an input image are halved, the number of input pixels is reduced by 75%. If we maintain the same coarse-level filters and number of filters, the output feature maps are likewise reduced by 75%. Since the feature maps are smaller, this has the downstream effect of reducing the number of matmul operations (latency), though not the number of parameters per convolution, since a convolution's weights are independent of feature map size. Note that this is in contrast to width thinning, which reduces the number of feature maps, and with it the parameter count (model size), while maintaining the size of each map.
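As a quick check, here is a minimal sketch (the 224-pixel base resolution and the ρ values are assumptions) that builds the stem defined earlier at full resolution and at ρ = 0.5, and compares the resulting feature map shapes:

from tensorflow.keras import Input, Model

for rho in (1.0, 0.5):
    size = int(224 * rho)                  # thinned input resolution
    inputs = Input((size, size, 3))
    model = Model(inputs, stem(inputs))
    print(rho, model.output_shape)
# 1.0 -> (None, 109, 109, 64); 0.5 -> (None, 53, 53, 64):
# roughly 75% fewer pixels per feature map entering the learner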

The downside is that if we reduce too aggressively, the feature maps may be only 1x1 pixels by the time we reach the bottleneck, in essence losing the spatial relationships. We could offset this by reducing the number of intermediate layers so that the feature maps stay larger than 1x1, but then we are removing overcapacity that contributes to accuracy.

The learner component consists of seven inverted residual groups, followed by a 1x1 linear convolution. Each inverted residual group consists of two or more inverted residual blocks and progressively increases the number of filters, also known as output channels. A group may start with a strided convolution, reducing the height and width of the feature maps to offset the progressive increase in the number of feature maps from group to group.

Figure 5.8 is a depiction of a MobileNet v2 group, where the first inverted residual block is strided to reduce the size of the feature maps, offsetting the progressive increase in the number of feature maps per group. As noted in the diagram, only groups 2, 3, 4 and 6 of the 8 start with a strided inverted residual block; in other words, groups 1, 5, 7 and 8 start with a non-strided block. Additionally, each non-strided block has an identity link, while the strided blocks do not.

Fig. 5.8 MobileNet v2 Group Micro-Architecture

Below is an example implementation of a MobileNet v2 group. The group follows the convention where the first block does a dimensionality reduction to reduce the size of the feature maps. In this case, the first inverted residual block is strided (feature pooling), while the remaining blocks are not strided (no feature pooling).
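The book's listing is not included in this excerpt; the following is a minimal sketch of such a group (the expansion factor of 6 and the parameter names are assumptions, not the book's exact code):

from tensorflow.keras.layers import (Add, BatchNormalization, Conv2D,
                                     DepthwiseConv2D, ReLU)

def inverted_block(x, filters, strides, expansion=6):
    """ Create a single inverted residual block """
    n_channels = x.shape[-1]
    # 1x1 convolution - expand the number of feature maps
    y = Conv2D(expansion * n_channels, (1, 1), padding='same')(x)
    y = BatchNormalization()(y)
    y = ReLU(6.0)(y)
    # 3x3 depthwise convolution (strided only in a group's first block)
    y = DepthwiseConv2D((3, 3), strides=strides, padding='same')(y)
    y = BatchNormalization()(y)
    y = ReLU(6.0)(y)
    # 1x1 linear convolution - project back down (no activation)
    y = Conv2D(filters, (1, 1), padding='same')(y)
    y = BatchNormalization()(y)
    # Identity link only on non-strided blocks with matching channels
    if strides == (1, 1) and n_channels == filters:
        y = Add()([x, y])
    return y

def group(x, filters, n_blocks, strides=(2, 2)):
    """ Create a group: first block strided, remaining blocks non-strided """
    x = inverted_block(x, filters, strides)
    for _ in range(n_blocks - 1):
        x = inverted_block(x, filters, (1, 1))
    return x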

Deep Learning for Vision Systems MEAP V08 livebook

This is an excerpt from Manning's book Deep Learning for Vision Systems MEAP V08 livebook.

What is a feature map?

The basic idea of neural networks is that neurons learn features from the input. In CNNs, a feature map is the output of one filter applied to the previous layer. It is called a feature map because it is a mapping of where a certain kind of feature is found in the image. Convolutional neural networks look for "features" such as straight lines, edges, or even objects. Whenever they spot one of these features, they record it in the feature map. Each feature map looks for something different: one feature map could be looking for straight lines, another for curves.
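As a minimal sketch (the input shape and filter count are assumptions), applying a convolutional layer with 16 filters to an image produces 16 feature maps, one per filter:

import numpy as np
from tensorflow.keras.layers import Conv2D

image = np.random.rand(1, 28, 28, 1).astype('float32')   # one dummy grayscale image
feature_maps = Conv2D(16, (3, 3), padding='same')(image)
print(feature_maps.shape)   # (1, 28, 28, 16): one 28x28 feature map per filter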

As you might have noticed from Figure 8.7, the generator model looks like an inverted ConvNet. The generator takes a vector of random noise as input and reshapes it into a volume that has a width, height, and depth. This volume is treated as a feature map that is fed to several convolutional layers, which create the final image.
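A minimal sketch of this reshape-then-convolve pattern (the noise dimension, volume shape, and layer sizes are assumptions for illustration):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

z = Input((100,))                                  # random noise vector
x = Dense(7 * 7 * 128)(z)                          # project the noise
x = Reshape((7, 7, 128))(x)                        # volume of 128 7x7 feature maps
x = Conv2DTranspose(64, (3, 3), strides=(2, 2), padding='same')(x)   # 14x14
x = Conv2DTranspose(1, (3, 3), strides=(2, 2), padding='same',
                    activation='tanh')(x)          # 28x28 final image
generator = Model(z, x)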

Figure 9.3: Visualizing feature maps produced by block1_conv1 filters.