The code comes from tianzhi0549/FCOS.
Below is the overall backbone structure printed from the source code; it is then described in two parts, ResNet and FPN. The config used is fcos_imprv_R_50_FPN_1x.yaml, with BACKBONE.CONV_BODY="R-50-FPN-RETINANET".
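The printout can be reproduced with roughly the following sketch; the import paths and the config path are assumptions that depend on the repo version (fcos_core in current FCOS, maskrcnn_benchmark in older checkouts):
from fcos_core.config import cfg                         # assumed; maskrcnn_benchmark.config in older versions
from fcos_core.modeling.backbone import build_backbone   # assumed path, same caveat

cfg.merge_from_file("configs/fcos/fcos_imprv_R_50_FPN_1x.yaml")  # assumed config location
backbone = build_backbone(cfg)  # nn.Sequential(OrderedDict([("body", ResNet), ("fpn", FPN)]))
print(backbone)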
Sequential(
(body): ResNet(
(stem): StemWithFixedBatchNorm(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d()
)
(layer1): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer2): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(3): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer3): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(3): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(4): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(5): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
(layer4): Sequential(
(0): BottleneckWithFixedBatchNorm(
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(1): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
(2): BottleneckWithFixedBatchNorm(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
)
)
)
(fpn): FPN(
(fpn_inner2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_inner3): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_inner4): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_layer4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_blocks): LastLevelP6P7(
(p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
)
1) The ResNet part
The ResNet backbone above uses the 50-layer configuration from Table 1. Referring to the ResNet architecture table (see below), the details are described next:
Table 1: the ResNet architecture table (from the ResNet paper; the 50-layer column is the configuration used here)
i. The StemWithFixedBatchNorm class
This corresponds to the conv1 module in Table 1 plus the 3x3 max pool below it. Because this module is shared by all ResNet variants and takes the image as input (the input channel count is always 3, while the output channel count depends on the variant), it is named the stem. Its definition is shown below, and it matches the conv1 part of Table 1 exactly (in forward, conv1 is followed by F.max_pool2d):
class BaseStem(nn.Module):
    def __init__(self, cfg, norm_func):
        super(BaseStem, self).__init__()
        out_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS
        self.conv1 = Conv2d(
            3, out_channels, kernel_size=7, stride=2, padding=3, bias=False
        )
        self.bn1 = norm_func(out_channels)
        for l in [self.conv1,]:
            nn.init.kaiming_uniform_(l.weight, a=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu_(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        return x


class StemWithFixedBatchNorm(BaseStem):
    def __init__(self, cfg):
        super(StemWithFixedBatchNorm, self).__init__(
            cfg, norm_func=FrozenBatchNorm2d
        )
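For intuition, here is a minimal sketch of my own (assuming an 800x1120 input, consistent with the shape comments later in this post) showing that the stem downsamples the input by a factor of 4; it only reproduces the two strided operations, not the full StemWithFixedBatchNorm module:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 800, 1120)
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
x = conv1(x)                                             # (1, 64, 400, 560)
x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)  # (1, 64, 200, 280)
print(x.shape)  # torch.Size([1, 64, 200, 280])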
ii. The FrozenBatchNorm2d class
It is defined below; the shapes of the relevant tensors are given in the trailing comments (n is the output channel count of the preceding convolution). It plays the same role as a BatchNorm layer in eval mode: it applies a fixed per-channel affine transform derived from the stored statistics. The difference from a normal BatchNorm2d is that everything is registered with self.register_buffer, so the parameters are fixed and never updated during training; see the PyTorch documentation of register_buffer().
class FrozenBatchNorm2d(nn.Module):
    """
    BatchNorm2d where the batch statistics and the affine parameters
    are fixed
    """
    def __init__(self, n):  # n=64
        super(FrozenBatchNorm2d, self).__init__()
        self.register_buffer("weight", torch.ones(n))
        self.register_buffer("bias", torch.zeros(n))
        self.register_buffer("running_mean", torch.zeros(n))
        self.register_buffer("running_var", torch.ones(n))

    def forward(self, x):  # x = {Tensor:(1,64,400,560)}
        scale = self.weight * self.running_var.rsqrt()  # Tensor:(64,)
        bias = self.bias - self.running_mean * scale    # Tensor:(64,)
        scale = scale.reshape(1, -1, 1, 1)              # Tensor:(1,64,1,1)
        bias = bias.reshape(1, -1, 1, 1)                # Tensor:(1,64,1,1)
        return x * scale + bias
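As a quick sanity check, a small sketch of my own (assuming the FrozenBatchNorm2d class above is in scope): after copying the same statistics and affine parameters, it produces the same output as nn.BatchNorm2d in eval mode (with eps set to 0, since the version quoted here does not add an eps):
import torch
import torch.nn as nn

n = 64
frozen = FrozenBatchNorm2d(n)
bn = nn.BatchNorm2d(n).eval()
bn.eps = 0.0  # the quoted FrozenBatchNorm2d omits eps

# give both layers identical statistics and affine parameters
frozen.running_mean.normal_()
frozen.running_var.uniform_(0.5, 2.0)
frozen.weight.normal_()
frozen.bias.normal_()
bn.running_mean.copy_(frozen.running_mean)
bn.running_var.copy_(frozen.running_var)
bn.weight.data.copy_(frozen.weight)
bn.bias.data.copy_(frozen.bias)

x = torch.randn(1, n, 400, 560)
print(torch.allclose(frozen(x), bn(x), atol=1e-5))  # True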
iii. The BottleneckWithFixedBatchNorm class
This is the basic building block that ResNet is composed of (see the code below). As the overall network structure printed at the beginning shows, the first block of every stage (layer1-layer4) contains a downsample branch. The reason is that the residual structure adds the block's input to its output; if the input channel count differs from the output channel count (i.e., the shapes do not match), the input must first be projected so that its channels match the output. See the article "CV脱坑指南(二):ResNet·downsample详解" for more background. A small usage sketch follows the code.
class Bottleneck(nn.Module):
    def __init__(
        self,
        in_channels,
        bottleneck_channels,
        out_channels,
        num_groups,
        stride_in_1x1,
        stride,
        dilation,
        norm_func,
        dcn_config
    ):
        super(Bottleneck, self).__init__()

        self.downsample = None  # used when the input and output channel counts differ
        if in_channels != out_channels:
            down_stride = stride if dilation == 1 else 1
            self.downsample = nn.Sequential(
                Conv2d(
                    in_channels, out_channels,
                    kernel_size=1, stride=down_stride, bias=False
                ),
                norm_func(out_channels),
            )
            for modules in [self.downsample,]:
                for l in modules.modules():
                    if isinstance(l, Conv2d):
                        nn.init.kaiming_uniform_(l.weight, a=1)

        if dilation > 1:
            stride = 1  # reset to be 1

        # The original MSRA ResNet models have stride in the first 1x1 conv
        # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have
        # stride in the 3x3 conv
        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
        )
        self.bn1 = norm_func(bottleneck_channels)
        # TODO: specify init for the above
        with_dcn = dcn_config.get("stage_with_dcn", False)
        if with_dcn:
            deformable_groups = dcn_config.get("deformable_groups", 1)
            with_modulated_dcn = dcn_config.get("with_modulated_dcn", False)
            self.conv2 = DFConv2d(
                bottleneck_channels,
                bottleneck_channels,
                with_modulated_dcn=with_modulated_dcn,
                kernel_size=3,
                stride=stride_3x3,
                groups=num_groups,
                dilation=dilation,
                deformable_groups=deformable_groups,
                bias=False
            )
        else:
            self.conv2 = Conv2d(
                bottleneck_channels,
                bottleneck_channels,
                kernel_size=3,
                stride=stride_3x3,
                padding=dilation,
                bias=False,
                groups=num_groups,
                dilation=dilation
            )
            nn.init.kaiming_uniform_(self.conv2.weight, a=1)

        self.bn2 = norm_func(bottleneck_channels)

        self.conv3 = Conv2d(
            bottleneck_channels, out_channels, kernel_size=1, bias=False
        )
        self.bn3 = norm_func(out_channels)

        for l in [self.conv1, self.conv3,]:
            nn.init.kaiming_uniform_(l.weight, a=1)

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu_(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu_(out)

        out0 = self.conv3(out)
        out = self.bn3(out0)

        if self.downsample is not None:  # project the input so its channels match the output
            identity = self.downsample(x)

        out += identity  # residual structure: add input and output
        out = F.relu_(out)

        return out


class BottleneckWithFixedBatchNorm(Bottleneck):
    def __init__(
        self,
        in_channels,
        bottleneck_channels,
        out_channels,
        num_groups=1,
        stride_in_1x1=True,
        stride=1,
        dilation=1,
        dcn_config=None
    ):
        super(BottleneckWithFixedBatchNorm, self).__init__(
            in_channels=in_channels,
            bottleneck_channels=bottleneck_channels,
            out_channels=out_channels,
            num_groups=num_groups,
            stride_in_1x1=stride_in_1x1,
            stride=stride,
            dilation=dilation,
            norm_func=FrozenBatchNorm2d,
            dcn_config=dcn_config
        )
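The usage sketch mentioned above (my own example, assuming the classes quoted above are in scope and that Conv2d here behaves like torch.nn.Conv2d): it reproduces the first bottleneck of layer2 from the printout, where both the channel count and the stride change, so the downsample branch is created:
import torch

block = BottleneckWithFixedBatchNorm(
    in_channels=256,
    bottleneck_channels=128,
    out_channels=512,
    stride=2,
    dcn_config={},  # no deformable conv in this sketch
)
x = torch.randn(1, 256, 200, 280)
print(block(x).shape)                # torch.Size([1, 512, 100, 140])
print(block.downsample is not None)  # True: a 1x1, stride-2 projection of the identity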
iv. The definition of ResNet
With the pieces above, ResNet is defined as follows. The output of every stage is stored in the outputs list (see the comments). There are four stages (see the printout at the beginning), so there are four outputs, and each successive output is half the size of the previous one in each spatial dimension (2x downsampling). These outputs are then combined with the FPN.
class ResNet(nn.Module):
    def __init__(self, cfg):
        super(ResNet, self).__init__()

        # If we want to use the cfg in forward(), then we should make a copy
        # of it and store it for later use:
        # self.cfg = cfg.clone()

        # Translate string names to implementations
        stem_module = _STEM_MODULES[cfg.MODEL.RESNETS.STEM_FUNC]
        stage_specs = _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY]
        transformation_module = _TRANSFORMATION_MODULES[cfg.MODEL.RESNETS.TRANS_FUNC]

        # Construct the stem module
        self.stem = stem_module(cfg)

        # Construct the specified ResNet stages
        num_groups = cfg.MODEL.RESNETS.NUM_GROUPS  # 1
        width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP  # 64
        in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS  # 64
        stage2_bottleneck_channels = num_groups * width_per_group  # 64
        stage2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS  # 256
        self.stages = []  # ['layer1','layer2','layer3','layer4']
        self.return_features = {}  # {'layer1':True,'layer2':True,'layer3':True,'layer4':True}
        for stage_spec in stage_specs:
            name = "layer" + str(stage_spec.index)
            stage2_relative_factor = 2 ** (stage_spec.index - 1)
            bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor
            out_channels = stage2_out_channels * stage2_relative_factor
            stage_with_dcn = cfg.MODEL.RESNETS.STAGE_WITH_DCN[stage_spec.index - 1]
            module = _make_stage(
                transformation_module,
                in_channels,
                bottleneck_channels,
                out_channels,
                stage_spec.block_count,
                num_groups,
                cfg.MODEL.RESNETS.STRIDE_IN_1X1,
                first_stride=int(stage_spec.index > 1) + 1,
                dcn_config={
                    "stage_with_dcn": stage_with_dcn,
                    "with_modulated_dcn": cfg.MODEL.RESNETS.WITH_MODULATED_DCN,
                    "deformable_groups": cfg.MODEL.RESNETS.DEFORMABLE_GROUPS,
                }
            )
            in_channels = out_channels
            self.add_module(name, module)
            self.stages.append(name)
            self.return_features[name] = stage_spec.return_features

        # Optionally freeze (requires_grad=False) parts of the backbone
        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_CONV_BODY_AT)

    def _freeze_backbone(self, freeze_at):
        if freeze_at < 0:
            return
        for stage_index in range(freeze_at):
            if stage_index == 0:
                m = self.stem  # stage 0 is the stem
            else:
                m = getattr(self, "layer" + str(stage_index))
            for p in m.parameters():
                p.requires_grad = False

    def forward(self, x):
        outputs = []  # [Tensor:(1,256,200,280), Tensor:(1,512,100,140), Tensor:(1,1024,50,70), Tensor:(1,2048,25,35)]
        x = self.stem(x)
        for stage_name in self.stages:
            x = getattr(self, stage_name)(x)
            if self.return_features[stage_name]:
                outputs.append(x)
        return outputs
Here, stage_specs is a tuple of StageSpec entries, where StageSpec is the namedtuple defined below; the concrete value used for "R-50-FPN-RETINANET" is sketched after the definition. For the usage of namedtuple, see "Python的namedtuple使用详解".
StageSpec = namedtuple(
    "StageSpec",
    [
        "index",            # Index of the stage, e.g. 1, 2, ..., 5
        "block_count",      # Number of residual blocks in the stage
        "return_features",  # True => return the last feature map from this stage
    ],
)
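For "R-50-FPN-RETINANET", the stage specs should look like the sketch below (the block counts 3/4/6/3 and the all-True return_features follow from the printout above; the exact constant name in the repo may differ):
from collections import namedtuple

StageSpec = namedtuple("StageSpec", ["index", "block_count", "return_features"])

ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)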
2) The FPN part
The FPN network is built as follows (an excerpt from the backbone builder); for background on FPN, see the FPN paper or the article "FPN详解".
in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS  # 256
out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS  # 256
in_channels_p6p7 = in_channels_stage2 * 8 if cfg.MODEL.RETINANET.USE_C5 \
    else out_channels  # 256
fpn = fpn_module.FPN(
    in_channels_list=[
        0,
        in_channels_stage2 * 2,
        in_channels_stage2 * 4,
        in_channels_stage2 * 8,
    ],
    out_channels=out_channels,
    conv_block=conv_with_kaiming_uniform(
        cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU
    ),
    top_blocks=fpn_module.LastLevelP6P7(in_channels_p6p7, out_channels),
)
model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
model.out_channels = out_channels
return model
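Substituting the values from the comments above (RES2_OUT_CHANNELS = 256, BACKBONE_OUT_CHANNELS = 256, and USE_C5 = False for this config, hence in_channels_p6p7 = 256), the call effectively reduces to the following sketch:
fpn = fpn_module.FPN(
    in_channels_list=[0, 512, 1024, 2048],               # C2 is skipped (0); C3/C4/C5 channel counts
    out_channels=256,
    conv_block=conv_with_kaiming_uniform(False, False),  # plain convs: no GroupNorm, no ReLU
    top_blocks=fpn_module.LastLevelP6P7(256, 256),       # P6 is built from P5 since USE_C5 is False
)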
The elements of this construction are analyzed in detail below.
i. conv_block --- the passed-in argument
As shown above, conv_block = conv_with_kaiming_uniform(cfg.MODEL.FPN.USE_GN, cfg.MODEL.FPN.USE_RELU), where cfg.MODEL.FPN.USE_GN=False and cfg.MODEL.FPN.USE_RELU=False. The conv_with_kaiming_uniform function is shown below; it returns the function make_conv, which produces the basic convolution blocks the FPN is built from (a short usage sketch follows the code). nn.init.kaiming_uniform_ is a weight initialization method named after Kaiming He; see the PyTorch documentation for init.kaiming_uniform_ and kaiming_normal_. The gn in use_gn refers to group normalization, which splits the channels into groups and normalizes within each group, computing statistics over (C//G)*H*W elements; it is therefore independent of the batch size. See the PyTorch normalization layers (BatchNorm, LayerNorm, InstanceNorm, GroupNorm) for details.
def conv_with_kaiming_uniform(use_gn=False, use_relu=False):
    def make_conv(
        in_channels, out_channels, kernel_size, stride=1, dilation=1
    ):
        conv = Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation * (kernel_size - 1) // 2,
            dilation=dilation,
            bias=False if use_gn else True
        )
        # Caffe2 implementation uses XavierFill, which in fact
        # corresponds to kaiming_uniform_ in PyTorch
        nn.init.kaiming_uniform_(conv.weight, a=1)
        if not use_gn:
            nn.init.constant_(conv.bias, 0)
        module = [conv,]
        if use_gn:
            module.append(group_norm(out_channels))
        if use_relu:
            module.append(nn.ReLU(inplace=True))
        if len(module) > 1:
            return nn.Sequential(*module)
        return conv
    return make_conv
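The usage sketch mentioned above (my own example, assuming the function above and the repo's Conv2d wrapper are in scope): with use_gn=False and use_relu=False, make_conv returns a bare convolution whose padding preserves the spatial size, which is exactly how the fpn_inner*/fpn_layer* layers in the printout are produced:
make_conv = conv_with_kaiming_uniform(use_gn=False, use_relu=False)

inner = make_conv(2048, 256, 1)    # padding = 1*(1-1)//2 = 0 -> Conv2d(2048, 256, kernel_size=1)
layer = make_conv(256, 256, 3, 1)  # padding = 1*(3-1)//2 = 1 -> Conv2d(256, 256, kernel_size=3, padding=1)
print(inner)
print(layer)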
ii. top_blocks --- the passed-in argument
There are two possible choices. One is fpn_module.LastLevelMaxPool(),
defined as follows. It simply performs a max_pool2d, using the functional form from torch.nn.functional; the positional arguments are input, kernel_size, stride, and padding (see the PyTorch documentation). Its effect is the same as torch.nn.MaxPool2d: 2D max pooling over each input channel. A small shape sketch follows the class.
class LastLevelMaxPool(nn.Module):
    def forward(self, x):
        return [F.max_pool2d(x, 1, 2, 0)]
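The shape sketch mentioned above (my own example): with kernel_size=1 and stride=2, the "max pool" simply subsamples every other pixel, halving each spatial dimension (rounding up):
import torch
import torch.nn.functional as F

p5 = torch.randn(1, 256, 25, 35)
p6 = F.max_pool2d(p5, 1, 2, 0)
print(p6.shape)  # torch.Size([1, 256, 13, 18])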
The other choice (and the one used here) is LastLevelP6P7, defined as follows. Instead of pooling the last level, it adds two extra levels, P6 and P7, on top of the FPN.
class LastLevelP6P7(nn.Module):
    """
    This module is used in RetinaNet to generate extra layers, P6 and P7.
    """
    def __init__(self, in_channels, out_channels):
        super(LastLevelP6P7, self).__init__()
        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)
        for module in [self.p6, self.p7]:
            nn.init.kaiming_uniform_(module.weight, a=1)
            nn.init.constant_(module.bias, 0)
        self.use_P5 = in_channels == out_channels  # True

    def forward(self, c5, p5):
        x = p5 if self.use_P5 else c5
        p6 = self.p6(x)
        p7 = self.p7(F.relu(p6))
        return [p6, p7]
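A quick shape check (my own sketch, assuming the class above is in scope; the p5 shape matches the comments in the FPN forward below):
import torch

top = LastLevelP6P7(256, 256)      # in_channels == out_channels, so use_P5 is True
c5 = torch.randn(1, 2048, 25, 35)  # last ResNet feature map (ignored because use_P5 is True)
p5 = torch.randn(1, 256, 25, 35)   # smallest FPN output before P6/P7 are added
p6, p7 = top(c5, p5)
print(p6.shape, p7.shape)  # torch.Size([1, 256, 13, 18]) torch.Size([1, 256, 7, 9])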
iii. The FPN class
Finally, the FPN class itself is defined as follows. Note that results (five tensors in the final output; see the comment) is ordered from the highest resolution to the lowest, each map half the size of the previous one. Also note the line last_inner = inner_lateral + inner_top_down, which is the FPN merge step: inner_lateral applies a 1x1 convolution to the corresponding backbone feature map (the bottom-up pathway) to reduce its channel count, inner_top_down upsamples the coarser merged map from the level above with F.interpolate (nearest-neighbor), and the two are added element-wise; a 3x3 convolution is then applied to produce that level's output (see the FPN paper). The loop runs top-down (the smallest feature map is produced first) and uses list.insert(0, ...) to prepend each new, higher-resolution result to the front of the list. The line last_results = self.top_blocks(x[-1], results[-1]) applies the extra P6/P7 levels: x[-1] and results[-1] are the c5 (last ResNet feature map) and p5 (the smallest FPN output before P6 and P7 are added) arguments, and since use_P5 is True, p5 is the one actually used. A usage sketch follows the class.
class FPN(nn.Module):
    """
    Module that adds FPN on top of a list of feature maps.
    The feature maps are currently supposed to be in increasing depth
    order, and must be consecutive
    """

    def __init__(
        self, in_channels_list, out_channels, conv_block, top_blocks=None
    ):
        """
        Arguments:
            in_channels_list (list[int]): number of channels for each feature map that
                will be fed
            out_channels (int): number of channels of the FPN representation
            top_blocks (nn.Module or None): if provided, an extra operation will
                be performed on the output of the last (smallest resolution)
                FPN output, and the result will extend the result list
        """
        super(FPN, self).__init__()
        self.inner_blocks = []  # ['fpn_inner2','fpn_inner3','fpn_inner4']
        self.layer_blocks = []  # ['fpn_layer2','fpn_layer3','fpn_layer4']
        for idx, in_channels in enumerate(in_channels_list, 1):
            inner_block = "fpn_inner{}".format(idx)
            layer_block = "fpn_layer{}".format(idx)
            if in_channels == 0:
                continue
            inner_block_module = conv_block(in_channels, out_channels, 1)
            layer_block_module = conv_block(out_channels, out_channels, 3, 1)
            self.add_module(inner_block, inner_block_module)
            self.add_module(layer_block, layer_block_module)
            self.inner_blocks.append(inner_block)
            self.layer_blocks.append(layer_block)
        self.top_blocks = top_blocks  # LastLevelP6P7

    def forward(self, x):
        """
        Arguments:
            x (list[Tensor]): feature maps for each feature level.
        Returns:
            results (tuple[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        last_inner = getattr(self, self.inner_blocks[-1])(x[-1])
        results = []  # [Tensor:(1,256,100,140), Tensor:(1,256,50,70), Tensor:(1,256,25,35), Tensor:(1,256,13,18), Tensor:(1,256,7,9)]
        results.append(getattr(self, self.layer_blocks[-1])(last_inner))
        for feature, inner_block, layer_block in zip(
            x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1]
        ):
            if not inner_block:
                continue
            # inner_top_down = F.interpolate(last_inner, scale_factor=2, mode="nearest")
            inner_lateral = getattr(self, inner_block)(feature)
            inner_top_down = F.interpolate(
                last_inner, size=(int(inner_lateral.shape[-2]), int(inner_lateral.shape[-1])),
                mode='nearest'
            )
            last_inner = inner_lateral + inner_top_down
            results.insert(0, getattr(self, layer_block)(last_inner))
        if isinstance(self.top_blocks, LastLevelP6P7):  # before this, results has 3 entries; P6 and P7 make it 5
            last_results = self.top_blocks(x[-1], results[-1])
            results.extend(last_results)
        elif isinstance(self.top_blocks, LastLevelMaxPool):
            last_results = self.top_blocks(results[-1])
            results.extend(last_results)
        return tuple(results)
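The usage sketch mentioned above (my own example, assuming the FPN, LastLevelP6P7, and conv_with_kaiming_uniform definitions quoted in this post are in scope; the shapes mirror the comments above):
import torch

conv_block = conv_with_kaiming_uniform(use_gn=False, use_relu=False)
fpn = FPN(
    in_channels_list=[0, 512, 1024, 2048],  # C2 is skipped, C3-C5 are used
    out_channels=256,
    conv_block=conv_block,
    top_blocks=LastLevelP6P7(256, 256),
)

# dummy ResNet outputs for an 800x1120 input (shapes from the comments above)
c2 = torch.randn(1, 256, 200, 280)
c3 = torch.randn(1, 512, 100, 140)
c4 = torch.randn(1, 1024, 50, 70)
c5 = torch.randn(1, 2048, 25, 35)

with torch.no_grad():
    p3, p4, p5, p6, p7 = fpn([c2, c3, c4, c5])
print([tuple(p.shape) for p in (p3, p4, p5, p6, p7)])
# [(1, 256, 100, 140), (1, 256, 50, 70), (1, 256, 25, 35), (1, 256, 13, 18), (1, 256, 7, 9)]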