1. YOLOv5s Architecture Diagram
First, we open the yolov5s model in Netron, which gives the architecture diagram below and makes the backbone structure easy to read at a glance.
- yolov5s architecture diagram
Note: we only need to parse the file named "yolov5s.yaml". Its contents are as follows:
```yaml
# parameters
nc: 6  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]
  - [30,61, 62,45, 59,119]
  - [116,90, 156,198, 373,326]

# backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],
   [-1, 1, Conv, [128, 3, 2]],
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],
  ]

# head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],
   [-1, 3, BottleneckCSP, [256, False]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],
   [-1, 3, BottleneckCSP, [1024, False]],
   [[17, 20, 23], 1, Detect, [nc, anchors]],
  ]
```
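depth_multiple and width_multiple are how the s/m/l/x variants share one layout: when the model is built, each stage's repeat count is scaled by depth_multiple and each stage's channel count by width_multiple, rounded up to a multiple of 8. A minimal sketch of that scaling (helper names chosen here for illustration, not taken from the repo):

```python
import math

depth_multiple, width_multiple = 0.33, 0.50  # yolov5s values from the yaml

def scaled_repeats(n):
    # A module listed with number n > 1 is repeated round(n * depth_multiple)
    # times, but always at least once.
    return max(round(n * depth_multiple), 1) if n > 1 else n

def scaled_channels(c, divisor=8):
    # Output channels are scaled by width_multiple and rounded up to a
    # multiple of 8.
    return math.ceil(c * width_multiple / divisor) * divisor

print(scaled_repeats(9))      # the 9x BottleneckCSP stages become 3 repeats
print(scaled_channels(64))    # Focus outputs 32 channels in yolov5s
print(scaled_channels(1024))  # the deepest stage becomes 512 channels
```

This is why the diagram shows 32, 64, 128, 256, and 512 channels even though the yaml lists 64 through 1024.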
2. Backbone Structure
- The backbone is the convolutional neural network that aggregates and forms image features at different granularities of the input image.
- The main modules the backbone needs are all defined in common.py.
We extract the backbone portion from the overall architecture and walk through it below:
- backbone structure diagram:
- corresponding backbone code:
```yaml
backbone:
  [[-1, 1, Focus, [64, 3]],
   [-1, 1, Conv, [128, 3, 2]],
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],
  ]
```
From the code above, the backbone is composed as follows:
BACKBONE = Focus(×1) + Conv(×1) + BottleneckCSP(×3) + Conv(×1) + BottleneckCSP(×9) + Conv(×1) + BottleneckCSP(×9) + Conv(×1) + SPP(×1) + BottleneckCSP(×3)
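Focus and each of the four stride-2 Conv layers halve the spatial resolution, so a 640 × 640 input arrives at the SPP stage as a 20 × 20 feature map. A quick check of that progression:

```python
size = 640
sizes = [size]
# Focus (slicing) plus the four stride-2 Conv layers each downsample by 2
for _ in range(5):
    size //= 2
    sizes.append(size)

print(sizes)  # [640, 320, 160, 80, 40, 20]
```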
2.1 The Focus Module
- Taking yolov5s as an example: the 640 × 640 × 3 input image enters the Focus module, where a slicing operation first turns it into a 320 × 320 × 12 feature map; a 3 × 3 convolution with 32 output channels then yields a 320 × 320 × 32 feature map.
Note: defined in common.py:
```python
class Focus(nn.Module):
    # Slice the input into 4 interleaved sub-images, concatenate them along
    # the channel dimension (c1 -> 4*c1), then apply one Conv (CBL) block.
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # (b, c1, w, h) -> (b, c2, w/2, h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
```
- Focus module structure diagram:
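The four slice expressions in forward take every second pixel starting from each of the four possible (row, column) offsets, so together they cover every pixel exactly once at half resolution. A small pure-Python illustration on a 4 × 4 single-channel "image":

```python
# 4x4 "image" with pixel values 0..15 laid out row by row
img = [[r * 4 + c for c in range(4)] for r in range(4)]

tl = [row[::2] for row in img[::2]]    # x[..., ::2, ::2]
bl = [row[::2] for row in img[1::2]]   # x[..., 1::2, ::2]
tr = [row[1::2] for row in img[::2]]   # x[..., ::2, 1::2]
br = [row[1::2] for row in img[1::2]]  # x[..., 1::2, 1::2]

print(tl)  # [[0, 2], [8, 10]]
print(br)  # [[5, 7], [13, 15]]
# Each slice is 2x2, and the four slices together contain all 16 pixels:
# channel count goes 3 -> 12 while width and height halve, losing nothing.
```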
2.2 The CBL Module
The Conv module is a standard convolution block; as the Conv class shows, this version uses Hardswish as the activation function. CBL was originally Conv + BN + LeakyReLU, but in this code the activation has been changed to Hardswish.
- Conv: torch.nn.Conv2d, the convolution operation
- BN: torch.nn.BatchNorm2d, which normalizes each batch's feature maps toward zero mean and unit variance
- Hardswish: the activation function
So here: CBL = Conv + BN + Hardswish
```python
class Conv(nn.Module):
    # Standard CBL block: Conv2d + BatchNorm2d + Hardswish
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):  # used after BN has been fused into the conv weights
        return self.act(self.conv(x))
```
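Conv relies on an autopad helper (also defined in common.py) that, when no padding is given, defaults to k // 2 so that stride-1 convolutions preserve spatial size ("same" padding for odd kernels):

```python
def autopad(k, p=None):  # kernel size, optional explicit padding
    # Default to 'same'-style padding: k // 2 for an int kernel,
    # elementwise k // 2 for a tuple/list of kernel sizes.
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

print(autopad(3))     # 1
print(autopad(1))     # 0
print(autopad(5, 2))  # explicit padding wins: 2
```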
2.3 The BottleneckCSP Module
This module is essentially a stack of several standard Bottleneck blocks plus a few plain convolution layers.
- BottleneckCSP structure diagram:
- sub-component (Res unit) structure diagram:
- BottleneckCSP code:
```python
class BottleneckCSP(nn.Module):
    # CSP bottleneck: split the input into two branches, run n Bottleneck
    # blocks on one branch, then concatenate, normalize, and fuse with a 1x1 conv.
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super(BottleneckCSP, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to the concatenated branches
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))  # main branch: cv1 -> n Bottlenecks -> cv3
        y2 = self.cv2(x)                    # shortcut branch
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))
```
Note: nn.Sequential chains the n Bottleneck blocks together. self.bn = nn.BatchNorm2d(2 * c_) normalizes the concatenation of the cv3 and cv2 branch outputs, corresponding to the Concat node in the diagram.
```python
class Bottleneck(nn.Module):
    # Standard bottleneck: 1x1 conv -> 3x3 conv, with an optional residual add
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2  # residual only when shapes match

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
```
Note: cv1 and cv2 correspond to the CBL blocks in the diagram; the residual add is applied only when shortcut=True and the input and output channel counts match.
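The channel bookkeeping inside BottleneckCSP can be followed without running the network. For the first backbone CSP stage of yolov5s (c1 = c2 = 64 after width scaling, e = 0.5, an illustrative case):

```python
c1, c2, e = 64, 64, 0.5  # first backbone CSP stage of yolov5s after width scaling
c_ = int(c2 * e)         # hidden channels carried by each branch

branch1 = c_  # cv1 -> n Bottleneck blocks -> cv3, all at c_ channels
branch2 = c_  # cv2: a single 1x1 conv straight from the input
cat_channels = branch1 + branch2  # what BatchNorm2d(2 * c_) normalizes

print(c_)            # 32
print(cat_channels)  # 64, fused back to c2 channels by the 1x1 cv4
# Inside self.m, Bottleneck is built with e=1.0, so its hidden width equals c_
# and the residual add (c1 == c2) is available whenever shortcut=True.
```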
2.4 The SPP Module
The SPP (spatial pyramid pooling) module applies three parallel max-pooling layers with kernel sizes 5, 9, and 13, then concatenates the results to enlarge the receptive field. For yolov5s, the SPP input is 512 × 20 × 20; a 1 × 1 convolution reduces it to 256 × 20 × 20; the three parallel max-pools (stride 1, with padding, so the spatial size is preserved) are concatenated together with the pre-pooling features to give 1024 × 20 × 20; a final 1 × 1 convolution with 512 output channels restores the tensor to 512 × 20 × 20.
- SPP module structure diagram:
- SPP module code:
```python
class SPP(nn.Module):
    # Spatial pyramid pooling: parallel max-pools at several kernel sizes,
    # concatenated with the un-pooled features, then fused by a 1x1 conv.
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
```
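Because each MaxPool2d uses stride 1 and padding k // 2, the pooled maps keep the 20 × 20 spatial size, and the concat stacks four 256-channel tensors into 1024 channels. The arithmetic, using the standard pooling output-size formula:

```python
c1, c2 = 512, 512          # SPP input/output channels in yolov5s
kernels = (5, 9, 13)

c_ = c1 // 2                             # cv1 reduces to 256 channels
cat_channels = c_ * (len(kernels) + 1)   # 3 pooled maps + the un-pooled one

def pooled_size(size, k, stride=1):
    # out = (in + 2*pad - k) // stride + 1, with pad = k // 2 as in SPP
    return (size + 2 * (k // 2) - k) // stride + 1

print(cat_channels)                           # 1024
print([pooled_size(20, k) for k in kernels])  # [20, 20, 20]
```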