[AI] RetinaNet Object Detection Network with a Swin Transformer Backbone (mmdetection)


I. Environment and Project

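Reference: the earlier post "Faster RCNN object detection network with a Swin Transformer backbone".
This article uses the same project, so the environment does not need to be set up again.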

II. Swin Transformer RetinaNet Network Code

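1. Create a new file in the configs/swin directory: retinanet_swin-t-p4-w7_fpn_1x_coco.py
The file contents are as follows.
Note: although _base_/models/retinanet_r50_fpn.py is listed as the base model, the backbone defined there is replaced by the one in this file (that is what _delete_=True in the backbone dict does). You could also create a separate base model file; I simply reuse the existing one and override it.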

_base_ = [
    '../_base_/models/retinanet_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'  # noqa
model = dict(
    backbone=dict(
        _delete_=True,
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(1, 2, 3),
        # Please only add indices that would be used
        # in FPN, otherwise some parameter will not be used
        with_cp=False,
        convert_weights=True,
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    neck=dict(in_channels=[192, 384, 768], start_level=0, num_outs=5))

optimizer = dict(
    _delete_=True,
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))
optimizer_config = dict(grad_clip=None)

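lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    # warmup_ratio=0.1,
    # epochs after which the learning rate is decayed
    step=[27, 33])

# overrides the 12 epochs set in schedule_1x.py, despite the "1x" in the file name
runner = dict(type='EpochBasedRunner', max_epochs=60)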
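
The override behaviour described in the note above can be illustrated with a small stand-alone sketch. The merge_cfg function below is a hypothetical, simplified re-implementation of the inheritance rule (it is not mmcv's actual code), and base_model is an abbreviated stand-in for retinanet_r50_fpn.py. It shows that a dict carrying _delete_=True replaces the base dict wholesale, while a dict without it is merged key by key. It also checks the neck's in_channels against the Swin stage widths, assuming the usual rule that stage i outputs embed_dims * 2**i channels.

```python
def merge_cfg(base, child):
    """Merge `child` into `base` (simplified sketch, not mmcv's real code)."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # copy so the caller's dict is left untouched
            delete = value.pop('_delete_', False)
            if not delete and isinstance(merged.get(key), dict):
                # No _delete_: keep the base keys, override only the given ones.
                value = merge_cfg(merged[key], value)
        merged[key] = value
    return merged


# Abbreviated stand-in for the base model file (assumption for illustration).
base_model = dict(
    backbone=dict(type='ResNet', depth=50, out_indices=(0, 1, 2, 3)),
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
              out_channels=256, start_level=1, num_outs=5))

# Abbreviated version of the overrides from the config above.
child_model = dict(
    backbone=dict(_delete_=True, type='SwinTransformer', embed_dims=96,
                  out_indices=(1, 2, 3)),
    neck=dict(in_channels=[192, 384, 768], start_level=0, num_outs=5))

merged = merge_cfg(base_model, child_model)
print(merged['backbone'])  # no ResNet keys such as `depth` survive
print(merged['neck'])      # `type` and `out_channels` are inherited

# Channel width of each Swin stage that feeds the FPN.
embed_dims = merged['backbone']['embed_dims']
stage_channels = [embed_dims * 2 ** i
                  for i in merged['backbone']['out_indices']]
print(stage_channels)
```

If out_indices or embed_dims is changed, the in_channels list of the neck has to be changed to match.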

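2. Change num_classes in retinanet_r50_fpn.py under the configs/_base_/models directory
Set the number of classes to that of your own dataset. Alternatively, you can override it in the retinanet_swin-t-p4-w7_fpn_1x_coco.py file created in the previous step.
For example, my dataset has four classes, so I set num_classes to 4; nothing else needs to change.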

bbox_head=dict(
    type='RetinaHead',
    num_classes=4,  # number of classes in your dataset
    in_channels=256,
    stacked_convs=4,
    feat_channels=256,
    ...
)
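
The alternative mentioned in this step, overriding the class count from the file created in step 1 instead of editing the shared base file, could look roughly like the sketch below. It assumes four classes as in the example. Only the key that changes is listed, since the rest of bbox_head is inherited from the base config.

```python
# Fragment of configs/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py.
# Add the bbox_head key to the existing `model = dict(...)`; a second
# `model = ...` assignment later in the file would discard the first one.
model = dict(
    # backbone=dict(...) and neck=dict(...) stay exactly as in step 1
    bbox_head=dict(num_classes=4))  # assumption: four classes
```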

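3. Modify coco_detection.py under the configs/_base_/datasets/ directory

img_scale can be reduced to fit your GPU memory, e.g. 512 * 512 (it has to be changed in two places).
Adjust the per-GPU batch size and the number of data-loading workers per GPU (the samples_per_gpu and workers_per_gpu parameters) to suit your machine.
The dataset paths are also configured here.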
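
To see what a given img_scale does to an image, the keep-ratio resize and the padding that follows can be reproduced in a few lines. This is a sketch of the rule as I understand it from mmcv; the helper names keep_ratio_size and pad_to_divisor are made up for illustration. The image is scaled by the largest factor that keeps its long side within max(img_scale) and its short side within min(img_scale), then padded up to a multiple of size_divisor.

```python
import math


def keep_ratio_size(width, height, img_scale):
    """Size after Resize(img_scale=..., keep_ratio=True), simplified."""
    long_edge, short_edge = max(img_scale), min(img_scale)
    factor = min(long_edge / max(width, height),
                 short_edge / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def pad_to_divisor(width, height, divisor=32):
    """Size after Pad(size_divisor=divisor): round each side up."""
    return (math.ceil(width / divisor) * divisor,
            math.ceil(height / divisor) * divisor)


resized = keep_ratio_size(1280, 720, (512, 512))
print(resized, pad_to_divisor(*resized))
```

With img_scale=(512, 512), a 1280x720 image becomes 512x288, so a square img_scale does not produce square inputs when keep_ratio=True.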

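Note: if you are worried that these changes will affect the training of other models, you can create a new dataset config file instead and point the _base_ list in retinanet_swin-t-p4-w7_fpn_1x_coco.py (created in step 1) at it.

The modified file looks like this: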

# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=3,
    workers_per_gpu=6,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_test2017.json',
        img_prefix=data_root + 'test2017/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')
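
One thing the dataset config above does not show: as far as I know, CocoDataset in mmdetection 2.x assumes the 80 COCO category names unless a classes tuple is passed to each split. A sketch of how that could be set from the file created in step 1 follows. The four class names are placeholders, not the names from my dataset.

```python
# Placeholder names (assumption); use the category names that appear in
# your own annotation files, in the same order.
classes = ('class_a', 'class_b', 'class_c', 'class_d')

# Only the overridden keys are listed; paths and pipelines are inherited.
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))

# This count should equal num_classes in bbox_head.
print(len(classes))
```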

III. Dataset

The dataset still uses the default COCO format. For how to create it, see the post on dataset annotation (how to use LabelImg and LabelMe).
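
For orientation, a COCO-format detection annotation file is a single JSON object with three top-level lists. The minimal sketch below builds one in Python; the file name, sizes, box and class name are invented for illustration. Note that bbox is [x, y, width, height] in pixels, not two corner points.

```python
import json

coco = {
    'images': [
        {'id': 1, 'file_name': '000001.jpg', 'width': 640, 'height': 480},
    ],
    'annotations': [
        {
            'id': 1,
            'image_id': 1,        # refers to images[].id
            'category_id': 1,     # refers to categories[].id
            'bbox': [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
            'area': 50.0 * 80.0,
            'iscrowd': 0,
        },
    ],
    'categories': [
        {'id': 1, 'name': 'class_a'},
    ],
}

# Every annotation must point at an existing image and category.
image_ids = {img['id'] for img in coco['images']}
category_ids = {cat['id'] for cat in coco['categories']}
for ann in coco['annotations']:
    assert ann['image_id'] in image_ids
    assert ann['category_id'] in category_ids

print(json.dumps(coco, indent=2))
```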

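IV. Training the Model

Run: python tools/train.py configs/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py
Note: the first run downloads the pretrained weights, which can take a while (or find a faster way to download them). The weights file is saved on your machine automatically, so later runs do not download it again. As before, you can also download the weights in advance and point the config at the local file.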
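
For the pre-download route, one possible approach is to fetch the checkpoint once and then set pretrained in the config to the local path. The helper fetch_checkpoint and the checkpoints directory are names chosen for this sketch and are not part of mmdetection.

```python
import os
import urllib.request

SWIN_T_URL = ('https://github.com/SwinTransformer/storage/releases/download/'
              'v1.0.0/swin_tiny_patch4_window7_224.pth')


def fetch_checkpoint(url, cache_dir='checkpoints'):
    """Download `url` into `cache_dir` unless the file is already there."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(path):
        # Only reached on the first call; later calls reuse the local file.
        urllib.request.urlretrieve(url, path)
    return path
```

After calling fetch_checkpoint(SWIN_T_URL) once from the project root, the URL in the config can be replaced with pretrained = 'checkpoints/swin_tiny_patch4_window7_224.pth'.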


About the author: