VGG Network
Overview
The VGG network, developed by the Visual Geometry Group at the University of Oxford, is a deep convolutional neural network architecture known for its simplicity and its effectiveness in image classification. It uses small 3x3 convolution filters throughout and stacks them into deep networks: the published configurations range from 11 to 19 weight layers, with the 16- and 19-layer variants being the most widely used. Unlike some earlier architectures, VGG omits local response normalization layers and relies on depth and a uniform design to achieve high accuracy.
Architecture
The VGG architecture is characterized by its use of small convolutional filters and a deep network structure. The VGG-11, one of the variants, is depicted in the architecture diagram below:
Figure 11.5 VGG-11 architecture diagram. All shapes are of the form N × C × H × W, where N is the batch size, C is the number of channels, H is the height, and W is the width.
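The shapes in the diagram can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 224 x 224 RGB input, 3x3 convolutions with padding 1 (so convolutions leave H and W unchanged), and a 2x2 max pool with stride 2 closing each of the five blocks. The helper name vgg11_shape_trace is hypothetical.

```python
def vgg11_shape_trace(n=1, h=224, w=224):
    """Return the N x C x H x W shape after each conv block.

    Assumes 3x3 convolutions with padding 1 (spatial size unchanged)
    and a 2x2 max pool with stride 2 at the end of each block.
    """
    block_channels = [64, 128, 256, 512, 512]  # output channels of the five VGG-11 blocks
    shapes = [(n, 3, h, w)]  # RGB input
    for c in block_channels:
        h, w = h // 2, w // 2  # only the pooling step changes H and W
        shapes.append((n, c, h, w))
    return shapes


for shape in vgg11_shape_trace():
    print(" x ".join(str(d) for d in shape))
```

Under these assumptions the final shape is 1 x 512 x 7 x 7, which is where the 512 * 7 * 7 input size of the first fully connected layer comes from.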
Implementation
Convolutional Backbone
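The VGG network is built on a convolutional backbone that can be configured to produce the different VGG variants. The backbone is defined by a configuration list with one tuple per convolutional block, of the form (number of input channels, number of convolutional layers, number of output features). The configuration is validated on construction: there must be exactly five blocks, the first block must take 3 input channels (RGB), and each subsequent block's input channels must equal the previous block's output features.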
    import torch.nn as nn

    # ConvBlock is not defined in this section; it must be available
    # before a ConvBackbone is instantiated.
    class ConvBackbone(nn.Module):
        def __init__(self, cfg):
            super(ConvBackbone, self).__init__()
            self.cfg = cfg
            self.validate_config(cfg)
            modules = []
            for block_cfg in cfg:
                in_channels, num_conv_layers, num_features = block_cfg
                modules.append(ConvBlock(in_channels, num_conv_layers, num_features))
            self.features = nn.Sequential(*modules)

        def validate_config(self, cfg):
            assert len(cfg) == 5  # 5 conv blocks
            for i, block_cfg in enumerate(cfg):
                assert isinstance(block_cfg, tuple) and len(block_cfg) == 3
                if i == 0:
                    # The first block consumes a 3-channel (RGB) image
                    assert block_cfg[0] == 3
                else:
                    # Input channels must match the previous block's output features
                    assert block_cfg[0] == cfg[i-1][-1]

        def forward(self, x):
            return self.features(x)
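ConvBackbone relies on a ConvBlock class that does not appear in this section. The following is a minimal sketch of what such a block might look like, assuming the usual VGG pattern of 3x3 convolutions with padding 1, each followed by ReLU, and a 2x2 max pool at the end of the block. The layer choices here are assumptions for illustration, not necessarily the definition used elsewhere in the text.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Hypothetical conv block: num_conv_layers x (3x3 conv + ReLU), then a 2x2 max pool."""

    def __init__(self, in_channels, num_conv_layers, num_features):
        super().__init__()
        layers = []
        for _ in range(num_conv_layers):
            # padding=1 keeps H and W unchanged for a 3x3 kernel
            layers.append(nn.Conv2d(in_channels, num_features, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = num_features  # later convs in the block map features -> features
        # Halve the spatial resolution at the end of the block
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


x = torch.zeros(2, 3, 32, 32)  # small dummy batch, N x C x H x W
y = ConvBlock(3, 1, 64)(x)
print(tuple(y.shape))  # (2, 64, 16, 16)
```

With this design each block changes the channel count to num_features and halves H and W, so five blocks reduce the spatial size by a factor of 32.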
VGG Network
The VGG network consists of the convolutional backbone followed by a classifier made up of three fully connected (FC) layers. The first two FC layers are followed by ReLU nonlinearity and dropout for regularization.
    class VGG(nn.Module):
        def __init__(self, conv_backbone, num_classes):
            super(VGG, self).__init__()
            self.conv_backbone = conv_backbone
            self.classifier = nn.Sequential(
                # 512 channels x 7 x 7 feature map; this size assumes 224 x 224 inputs
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096, num_classes)
            )

        def forward(self, x):
            conv_features = self.conv_backbone(x)
            # Flatten N x C x H x W to N x (C*H*W) before the FC layers
            logits = self.classifier(conv_features.view(conv_features.shape[0], -1))
            return logits
Instantiating VGG-11
A VGG-11 network can be instantiated using a specific configuration for the convolutional backbone:
    # One tuple per block: (in_channels, num_conv_layers, num_features)
    vgg11_cfg = [
        (3, 1, 64),
        (64, 1, 128),
        (128, 2, 256),
        (256, 2, 512),
        (512, 2, 512)
    ]
    vgg11_backbone = ConvBackbone(vgg11_cfg)
    num_classes = 1000
    vgg11 = VGG(vgg11_backbone, num_classes)
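To get a feel for the size of this configuration, the parameter count can be estimated from the configuration alone. The sketch below assumes every convolution is 3x3 with a bias term and that there are no batch normalization layers. The helper name count_vgg_params is hypothetical.

```python
def count_vgg_params(cfg, num_classes=1000, fc_in=512 * 7 * 7, fc_width=4096):
    """Count weights and biases, assuming 3x3 convs with bias and no batch norm."""
    conv = 0
    for in_channels, num_conv_layers, num_features in cfg:
        for _ in range(num_conv_layers):
            # 3*3 kernel weights per (input, output) channel pair, plus one bias per output
            conv += 3 * 3 * in_channels * num_features + num_features
            in_channels = num_features
    fc = 0
    for n_in, n_out in [(fc_in, fc_width), (fc_width, fc_width), (fc_width, num_classes)]:
        fc += n_in * n_out + n_out
    return conv, fc


vgg11_cfg = [(3, 1, 64), (64, 1, 128), (128, 2, 256), (256, 2, 512), (512, 2, 512)]
conv_params, fc_params = count_vgg_params(vgg11_cfg)
print(f"conv backbone: {conv_params:,}")
print(f"classifier:    {fc_params:,}")
print(f"total:         {conv_params + fc_params:,}")
```

Under these assumptions the classifier holds the large majority of the parameters, most of them in the first fully connected layer.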
Practical Considerations
While the above code demonstrates how to implement the VGG network in PyTorch, in practice it is generally preferable to use the torchvision package, which provides ready-made implementations of VGG and several other popular deep networks, optionally with pretrained weights. This avoids reimplementing and re-validating a standard architecture.