Backbone-VGG

1. Introduction

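VGG was proposed in 2014 by the Visual Geometry Group in the Department of Engineering Science at the University of Oxford. The abstract of "Very Deep Convolutional Networks for Large-Scale Image Recognition" highlights the use of very small (3x3) convolution filters. Compared with AlexNet (2012), which uses large kernels (11x11 and 5x5), VGG stacks several small 3x3 kernels instead.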

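For a given receptive field (the local region of the input image that influences one output value), stacking small kernels works better than using one large kernel. This is hard to justify rigorously. The intuition is that a small convolution followed by an activation function is a nonlinear transform, and several stacked nonlinear transforms yield more expressive features than a single one, at a lower cost (fewer parameters). In short, VGG uses three 3x3 kernels in place of one 7x7 kernel and two 3x3 kernels in place of one 5x5 kernel. The main goal is to make the network deeper while keeping the receptive field the same, which improves the network's performance to some extent.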

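For example, three stacked 3x3 convolutions with stride 1 have a receptive field of size 7 (that is, three consecutive 3x3 convolutions cover the same input region as one 7x7 convolution). Their total parameter count is 3x(9xC^2), whereas a single 7x7 convolution needs 49xC^2, where C is the number of input and output channels (biases ignored). Clearly 27xC^2 is less than 49xC^2, so parameters are saved. In addition, 3x3 kernels help preserve the properties of the image.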
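The parameter comparison above can be checked with a few lines of Python. This is only a sketch: the helper name conv_weights is made up for illustration, biases are ignored as in the text, and C = 64 is an arbitrary example value.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in num_layers stacked k x k convolutions with C input and C output channels (no bias)."""
    return num_layers * kernel_size * kernel_size * channels * channels

C = 64  # example channel count (an assumption; the ratio is the same for any C)
stacked_3x3 = conv_weights(3, C, num_layers=3)  # 3 * 9 * C^2 = 27 * C^2
single_7x7 = conv_weights(7, C)                 # 49 * C^2

print(stacked_3x3, single_7x7)
print(stacked_3x3 / single_7x7)  # fraction of the 7x7 cost, i.e. 27/49
```

For C = 64 this is 110592 weights against 200704, so the stacked version needs roughly 55% of the weights of the single large kernel.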

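Here is why, with stride 1, two 3x3 kernels can replace one 5x5 kernel. "Replace" means that the two have the same receptive field, and the figure below illustrates the process well. Convolving with the green 3x3 window produces three adjacent pixels, and all three of those pixels are covered by a single purple kernel.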

[Figure: two stacked 3x3 convolutions covering a 5x5 receptive field]
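The same argument can be made numerically. The sketch below applies the usual receptive-field recurrence for a chain of convolutions; the function name receptive_field and the (kernel_size, stride) encoding are assumptions made for this example.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    rf = 1    # before any layer, one output pixel sees exactly one input pixel
    jump = 1  # spacing, in input pixels, between neighbouring outputs of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

two_3x3 = receptive_field([(3, 1)] * 2)
three_3x3 = receptive_field([(3, 1)] * 3)
print(two_3x3, three_3x3)
```

With stride 1, every extra 3x3 layer widens the receptive field by 2 pixels, which is why two layers match a 5x5 kernel and three layers match a 7x7 kernel.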

2. Network architecture

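Page 3 of the paper gives the table of network configurations. Column D corresponds to VGG16 and column E to VGG19, which has three more convolutional layers than VGG16. conv3-64 denotes a convolutional layer with 3x3 kernels and 64 filters.

[Figure: network configuration table from the paper]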

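Below is a diagram of VGG16 (taken from the web). It maps directly onto column D above, so I will not go into detail. One thing puzzled me at first: the table above lists neither the max-pooling size nor the size of each layer, so how are the layer sizes in the diagram below known? Rereading the paper answers this: "The convolution stride is fixed to 1 pixel; ... the padding is 1 pixel for 3 × 3 conv. layers. ... Max-pooling is performed over a 2 × 2 pixel window, with stride 2." That settles it: with a 3x3 kernel, stride 1 and padding 1, the output size equals the input size, and each max-pooling halves the height and width. The paper also states that "All hidden layers are equipped with the rectification (ReLU (Krizhevsky et al., 2012)) non-linearity", i.e. every hidden layer is followed by ReLU.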
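The size rule follows from the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. A minimal sketch, where the helper name out_size is an assumption made for this example:

```python
def out_size(n, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling layer applied to an input of size n."""
    return (n + 2 * padding - kernel_size) // stride + 1

# 3x3 convolution, stride 1, padding 1: the size is preserved
same = out_size(224, kernel_size=3, stride=1, padding=1)
# 2x2 max pooling, stride 2, no padding: the size is halved
halved = out_size(224, kernel_size=2, stride=2, padding=0)
print(same, halved)
```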

[Figure: VGG16 architecture diagram]

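The input is a 224x224x3 image. Two convolutions with 64 3x3 kernels, each followed by ReLU, give a size of 224x224x64
Max pooling with a 2x2 window (which halves the image size) gives 112x112x64
Two convolutions with 128 3x3 kernels + ReLU give 112x112x128
2x2 max pooling gives 56x56x128
Three convolutions with 256 3x3 kernels + ReLU give 56x56x256
2x2 max pooling gives 28x28x256
Three convolutions with 512 3x3 kernels + ReLU give 28x28x512
2x2 max pooling gives 14x14x512
Three convolutions with 512 3x3 kernels + ReLU give 14x14x512
2x2 max pooling gives 7x7x512
Three fully connected layers follow: two of size 1x1x4096, each followed by ReLU, and one of size 1x1x1000 (no ReLU on the last one, since ReLU applies only to hidden layers)
Softmax outputs the 1000 prediction results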
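The walkthrough above can be reproduced by tracing the shape through the layers. In this sketch the list encoding (a number for a 3x3 convolution's output channels, 'M' for a 2x2 max pooling) and the names VGG16_CFG and trace_shapes are assumptions chosen for illustration.

```python
# Column D (VGG16): a number is the output channel count of a 3x3 conv, 'M' is a 2x2 max pooling.
# This encoding is an illustrative assumption, not something defined in the text above.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def trace_shapes(cfg, height=224, width=224, channels=3):
    """Return the (height, width, channels) shape after every layer in cfg."""
    shapes = []
    for item in cfg:
        if item == 'M':
            # 2x2 max pooling with stride 2 halves height and width
            height, width = height // 2, width // 2
        else:
            # 3x3 conv with stride 1 and padding 1 keeps the spatial size; channels become item
            channels = item
        shapes.append((height, width, channels))
    return shapes

shapes = trace_shapes(VGG16_CFG)
for layer, shape in zip(VGG16_CFG, shapes):
    print(layer, shape)

h, w, c = shapes[-1]
flattened = h * w * c  # number of inputs to the first fully connected layer
print(flattened)
```

The last shape is 7x7x512, so the first fully connected layer receives 7 * 7 * 512 = 25088 values.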

3. Code implementation

import torch.nn as nn
import torch

class VGG(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),

            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),

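            # Output layer: no ReLU here, ReLU is applied to hidden layers only
            nn.Linear(4096, num_classes),
            # Softmax as in the paper; drop it when training with nn.CrossEntropyLoss,
            # which expects raw logits
            nn.Softmax(dim=1),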
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.classifier(x)
        return x


if __name__ == '__main__':
    # Example
    net = VGG()
    x = torch.rand(1, 3, 224, 224)
    out = net(x)  # call the module itself rather than .forward(), so that hooks also run
    print(out.size())
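As a sanity check on the layer sizes used in the model above, the number of learnable parameters can be counted by hand without PyTorch. This is a sketch that assumes the default num_classes=1000 and counts one bias per output channel or unit; the variable names are made up for this example.

```python
# (in_channels, out_channels) of the 13 convolutional layers, all with 3x3 kernels
conv_channels = [(3, 64), (64, 64),
                 (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]
# (in_features, out_features) of the 3 fully connected layers
fc_sizes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

# weights plus biases for each layer
conv_total = sum(c_in * c_out * 3 * 3 + c_out for c_in, c_out in conv_channels)
fc_total = sum(n_in * n_out + n_out for n_in, n_out in fc_sizes)
total = conv_total + fc_total
print(conv_total, fc_total, total)
```

The count comes to about 138 million parameters, of which roughly 124 million sit in the three fully connected layers.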

4. Notes

def foo(a, b, c):
    return a+b+c

if __name__ == '__main__':
    a = [1, 2, 3]
    print(foo(*a))       # these two calls are equivalent
    print(foo(1, 2, 3))

*args and **kwargs in Python

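This usage appears in reference 2, where it is used when creating an instance of the class. **kwargs is probably there so that num_classes and init_weights can be overridden.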

model = VGG(make_layers(cfg['A']), **kwargs)

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        ......
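To see how **kwargs can override those defaults, here is a small self-contained sketch. Net and build are hypothetical stand-ins written for this example (no PyTorch involved); only the constructor signature mirrors the snippet above.

```python
class Net:
    """Hypothetical stand-in for the VGG class above; it only records its arguments."""
    def __init__(self, features, num_classes=1000, init_weights=True):
        self.features = features
        self.num_classes = num_classes
        self.init_weights = init_weights

def build(**kwargs):
    # kwargs collects whatever keyword arguments the caller passes
    # and forwards them unchanged to the constructor
    return Net(['conv', 'pool'], **kwargs)

default = build()                                   # defaults are kept
custom = build(num_classes=10, init_weights=False)  # defaults are overridden
print(default.num_classes, default.init_weights)
print(custom.num_classes, custom.init_weights)
```

A keyword that the constructor does not accept would raise a TypeError, so the forwarding function does not need to know the list of valid options itself.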

Added: 2021-08-01 14:30:37  Updated: 2021-08-01 14:33:36