VGG Network

Overview

The VGG network, developed by the Visual Geometry Group at the University of Oxford, is a deep convolutional neural network architecture known for its simplicity and effectiveness in image classification. It uses small 3x3 convolution filters throughout and stacks them into deep networks, with 16 to 19 weight layers in its best-known variants. Unlike some earlier architectures such as AlexNet, VGG drops local response normalization layers and relies on depth and a uniform design to achieve high accuracy.
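One way to see why small filters work well: a stack of 3x3 convolutions covers the same receptive field as a single larger filter while using fewer weights. The sketch below is a back-of-the-envelope check in plain Python. It assumes stride-1 convolutions with the same number of channels in and out and ignores biases; the helper names and the channel count are illustrative only.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (per side) of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` in and out (no bias)."""
    return kernel_size * kernel_size * channels * channels


C = 256  # assumed channel count, chosen only for illustration

for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = n * conv_weights(3, C)   # n stacked 3x3 layers
    single = conv_weights(rf, C)       # one layer with the same receptive field
    print(f"{n} x (3x3): receptive field {rf}x{rf}, "
          f"{stacked} weights vs {single} for one {rf}x{rf} conv")
```

Under these assumptions, three 3x3 layers need 27·C² weights against 49·C² for a single 7x7 layer, and they apply three nonlinearities instead of one.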

Architecture

The VGG architecture stacks small convolutional filters into a deep network. VGG-11, one of its variants, is shown in the diagram below:

[Figure 11.5](https://livebook.manning.com/math-and-architectures-of-deep-learning/chapter-11/figure--11-5) VGG-11 architecture diagram. All shapes are of the form N × C × H × W, where N is the batch size, C is the number of channels, H is the height, and W is the width.

Implementation

Convolutional Backbone

The VGG network is built on a convolutional backbone, which can be configured to create different VGG architectures. The backbone is defined by a configuration list with one tuple per block: the number of input channels, the number of convolutional layers, and the number of output features. Each block's input channel count must match the output features of the previous block.

import torch.nn as nn


# ConvBlock is assumed to be defined elsewhere; it is not shown in this section.
class ConvBackbone(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        self.validate_config(cfg)
        modules = []
        for block_cfg in cfg:
            # Each entry is (in_channels, num_conv_layers, num_features).
            in_channels, num_conv_layers, num_features = block_cfg
            modules.append(ConvBlock(in_channels, num_conv_layers, num_features))
        self.features = nn.Sequential(*modules)

    def validate_config(self, cfg):
        assert len(cfg) == 5  # 5 conv blocks
        for i, block_cfg in enumerate(cfg):
            assert isinstance(block_cfg, tuple) and len(block_cfg) == 3
            if i == 0:
                # The first block consumes a 3-channel (RGB) image.
                assert block_cfg[0] == 3
            else:
                # Input channels must equal the previous block's output features.
                assert block_cfg[0] == cfg[i-1][-1]

    def forward(self, x):
        return self.features(x)
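
The backbone above relies on a ConvBlock module that this section does not define. The following is a minimal sketch of what such a block might look like. It assumes each block is a run of 3x3 convolutions with padding 1, each followed by ReLU, and closed by a 2x2 max pool; the source may implement it differently.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Hypothetical block: num_conv_layers x (3x3 conv + ReLU), then 2x2 max pool."""

    def __init__(self, in_channels, num_conv_layers, num_features):
        super().__init__()
        layers = []
        channels = in_channels
        for _ in range(num_conv_layers):
            # padding=1 keeps H and W unchanged for a 3x3 kernel with stride 1.
            layers.append(nn.Conv2d(channels, num_features, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            channels = num_features
        # The pooling layer halves the spatial resolution.
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


block = ConvBlock(3, 1, 64)
out = block(torch.zeros(2, 3, 32, 32))
print(out.shape)  # N x C x H x W, with C = 64 and H, W halved
```

With this design, only the pooling layer changes the spatial size, so five blocks reduce height and width by a factor of 32.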

VGG Network

The VGG network consists of the convolutional backbone followed by a classifier made up of three fully connected (FC) layers. The backbone output is flattened before it enters the classifier. Each of the first two FC layers is followed by a ReLU nonlinearity and dropout for regularization.

class VGG(nn.Module):
    def __init__(self, conv_backbone, num_classes):
        super().__init__()
        self.conv_backbone = conv_backbone
        self.classifier = nn.Sequential(
            # 512 channels x 7 x 7 spatial positions, flattened
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),  # default drop probability p=0.5
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        conv_features = self.conv_backbone(x)
        # Flatten N x C x H x W to N x (C*H*W) before the FC layers.
        logits = self.classifier(conv_features.flatten(start_dim=1))
        return logits
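
The 512 * 7 * 7 input size of the first FC layer follows from the input resolution. The plain-Python sketch below traces the shape through the backbone. It assumes a 224 × 224 input and that every block preserves height and width in its convolutions and halves them once through pooling; the function name is illustrative.

```python
def vgg_feature_shape(cfg, height=224, width=224):
    """Return (C, H, W) after the backbone, assuming each block halves H and W."""
    channels = cfg[0][0]
    for _in_channels, _num_conv_layers, num_features in cfg:
        channels = num_features  # convolutions set the channel count
        height //= 2             # one 2x2 pooling step per block
        width //= 2
    return channels, height, width


# VGG-11 block configuration: (in_channels, num_conv_layers, num_features)
cfg = [(3, 1, 64), (64, 1, 128), (128, 2, 256), (256, 2, 512), (512, 2, 512)]

c, h, w = vgg_feature_shape(cfg)
flattened = c * h * w
print(c, h, w, flattened)  # 512 7 7 25088
```

Because the classifier hard-codes this size, inputs of a different resolution would not match the first FC layer without further changes.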

Instantiating VGG-11

A VGG-11 network can be instantiated using a specific configuration for the convolutional backbone:

vgg11_cfg = [
    (3, 1, 64),
    (64, 1, 128),
    (128, 2, 256),
    (256, 2, 512),
    (512, 2, 512)
]

vgg11_backbone = ConvBackbone(vgg11_cfg)
num_classes = 1000
vgg11 = VGG(vgg11_backbone, num_classes)
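
The configuration also explains the name: it yields 8 convolutional layers, which together with the 3 FC layers make 11 weight layers. The sketch below checks this and estimates the parameter count in plain Python. It assumes 3x3 convolutions with biases and no batch normalization; the helper functions are illustrative, not part of the source code.

```python
vgg11_cfg = [
    (3, 1, 64),
    (64, 1, 128),
    (128, 2, 256),
    (256, 2, 512),
    (512, 2, 512),
]


def count_conv_params(cfg, kernel_size=3):
    """Weights plus biases of all conv layers described by cfg."""
    total = 0
    for in_channels, num_layers, num_features in cfg:
        channels = in_channels
        for _ in range(num_layers):
            total += kernel_size * kernel_size * channels * num_features + num_features
            channels = num_features
    return total


def count_fc_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


conv_layer_count = sum(n for _, n, _ in vgg11_cfg)
fc_sizes = [512 * 7 * 7, 4096, 4096, 1000]  # classifier for 1000 classes
fc_layer_count = len(fc_sizes) - 1

conv_params = count_conv_params(vgg11_cfg)
fc_params = count_fc_params(fc_sizes)

print(conv_layer_count + fc_layer_count)  # 11 weight layers
print(conv_params, fc_params, conv_params + fc_params)
```

Under these assumptions, the three FC layers hold the large majority of the parameters, and the first FC layer alone accounts for more than 100 million of them.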

Practical Considerations

While the above code demonstrates how to implement the VGG network in PyTorch, in practice it is usually better to use the torchvision package, which ships implementations of VGG and several other popular deep networks, along with pretrained weights. Relying on a maintained, widely used implementation avoids reimplementation bugs and simplifies setup.
