Overview of Classic Deep Learning Detection Methods
What the Stages Mean in Detection Tasks
- One-stage: the YOLO family
Outputs the top-left and bottom-right coordinates of the detection box, (x1, y1), (x2, y2); a single CNN can regress them directly.
- Two-stage: the Faster R-CNN / Mask R-CNN family
Compared with one-stage methods, there is an extra step that extracts candidate boxes (region proposals).
An analogy: suppose a company wants to recruit some excellent people. One-stage is a single open audition, while two-stage has two rounds of selection (an extra elimination round, so to speak). One-stage is faster and suits real-time detection tasks; two-stage gives better detection results.
Metrics for Evaluating Detection
- IoU
IoU is the intersection over union between the ground-truth box and the predicted box.
- Precision and Recall
TP: the number of samples correctly (true) judged positive (actually positive). FP: the number wrongly (false) judged positive (actually negative). FN: the number wrongly (false) judged negative (actually positive). TN: the number correctly (true) judged negative (actually negative).
Precision describes how likely what you found is correct; recall describes how completely you found everything.
An analogy: you are answering a multiple-choice question with six options A through F; the correct answer is ABCDE and you pick ABF. Only A and B of your picks are right, so precision is 2/3; C, D, and E were missed, so the fraction found (recall) is 2/5.
In general, precision and recall cannot both be maximized: when one is high, the other tends to drop. Varying the threshold (a prediction counts as positive when its IoU exceeds the threshold) yields different precision/recall pairs, which can be plotted as a precision-recall curve (the PR curve). Taking the curve's upper envelope, the area it encloses with the x-axis is the AP value; in the original figure this is the sum of the areas of rectangles A1, A2, A3, and A4.
Compute an AP for every class, then average them to get mAP.
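As a small aside (my sketch, mirroring the compute_ap function shown later in this post), the envelope-and-rectangles computation looks like this; the PR points are made up for the example:

import numpy as np

def average_precision(recall, precision):
    # Add sentinels, take the upper envelope of the PR curve,
    # then sum rectangle areas wherever recall changes
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

print(average_precision(np.array([0.2, 0.4, 0.6, 0.8]),
                        np.array([1.0, 0.8, 0.6, 0.5])))  # 0.58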
YOLOv1: Overall Idea and Network Architecture
- A classic one-stage method
- You Only Look Once: the name says it all!
- Turns detection into a regression problem; a single CNN does it all!
- Supports real-time detection on video, with very broad applications!
Core idea: take an input image → divide it into a 7×7 grid → give each cell two candidate boxes (their widths and heights come from experience) → compute the IoU between the ground truth and each candidate box → keep the candidate with the larger IoU → fine-tune that candidate's width and height → predict the box's center (x, y), width w, height h, and a confidence (the probability that it contains an object).
- Network architecture
The input image is resized to 448×448×3. Repeated convolutions produce a 7×7×1024 feature map, which is then flattened: the first fully connected layer yields 4096 features and the second yields 1470, which are reshaped to 7×7×30. (Each image has a 7×7 grid, and each cell carries 30 values: the first 10 are the two candidate boxes' x, y, w, h, c, and the remaining 20 are the probabilities of the 20 classes.)
x, y, w, h are normalized values: relative positions and sizes.
Loss function:
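The loss figure is missing from this copy; for reference, the sum-of-squared-errors loss as given in the YOLOv1 paper is:

$$\begin{aligned} \mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right] \\ &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i}-\sqrt{\hat{w}_i})^2 + (\sqrt{h_i}-\sqrt{\hat{h}_i})^2 \right] \\ &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i-\hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i-\hat{C}_i)^2 \\ &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c)-\hat{p}_i(c))^2 \end{aligned}$$

with λ_coord = 5 and λ_noobj = 0.5; the square roots on w and h damp the influence of large boxes, and 𝟙 marks the predictor responsible for each object.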
Problems with YOLOv1. Problem 1: each cell predicts only one class, so overlapping objects cannot be handled. Problem 2: small-object detection is mediocre, and the available aspect ratios are limited.
YOLOv2: Improvements in Detail
Improvement details
- Batch Normalization
V2 drops Dropout (V1 used it in the fully connected layers to randomly kill neurons and prevent overfitting; V2 has no fully connected layers) and adds Batch Normalization after every convolution. With every layer's input normalized, convergence becomes easier; Batch Normalization alone raises mAP by about 2%. From today's perspective, Batch Normalization has become a standard component of networks.
- Higher resolution
V1 trained at 224×224 but tested at 448×448, which can leave the model poorly adapted; V2 additionally fine-tunes for 10 epochs at 448×448. With this high-resolution classifier, YOLOv2's mAP improves by about 4%.
Network structure
- DarkNet-19, with an actual input of 416×416
- No FC layers; 5 downsampling steps (giving 13×13)
- 1×1 convolutions save many parameters
After 5 downsamplings the width and height shrink by 2^5 = 32, and 416/32 = 13. It is best if this division gives an odd number, so that an object's center falls into a single, well-defined grid cell.
- Priors extracted by clustering
Five prior boxes are obtained by clustering (k-means) over the training-set boxes.
By contrast, Faster R-CNN's priors use the fixed aspect ratios 1:1, 1:2, and 2:1, each at 3 different sizes, for 3×3 = 9 priors in total.
- Anchor Box
Introducing anchor boxes makes the number of predicted boxes much larger (13×13×n). Unlike the Faster R-CNN family, the priors are not handed down as fixed aspect ratios but come from the clustering above.
- Direct Location Prediction
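This bullet has no body in the original post; for reference, YOLOv2 predicts offsets through a sigmoid so that each box center stays inside its own grid cell:

b_x = σ(t_x) + c_x,  b_y = σ(t_y) + c_y,  b_w = p_w · e^{t_w},  b_h = p_h · e^{t_h}

where (c_x, c_y) is the cell's top-left corner and (p_w, p_h) is the prior's size. The same decoding appears in YOLOLayer.forward in the source walkthrough below.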
- Receptive field
The receptive field is the region of the original image that a point on a feature map can "see". The larger the receptive field, the more global the context it captures. Stacking two 3×3 convolution layers gives a 5×5 receptive field; stacking three gives 7×7.
Why stack small kernels instead of enlarging the receptive field with one big kernel? Stacked small kernels need fewer parameters and add more nonlinearities: with C input and output channels, three 3×3 layers cost 3 × (3×3×C×C) = 27C² parameters, while a single 7×7 layer with the same receptive field costs 49C².
- Feature fusion
At the last layer the receptive field is already so large that small objects may get lost, so earlier features need to be fused in: the last feature map is concatenated with a spatially rearranged copy of the second-to-last one (see the passthrough sketch below).
- Multi-scale
Since the network is fully convolutional, V2 periodically switches the training input size among multiples of 32, so a single model copes with several resolutions.
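A minimal sketch (my illustration, not the post's code) of the passthrough rearrangement: a space-to-depth reshape lets the higher-resolution map be concatenated with the coarser one, just as YOLOv2 turns 26×26×512 into 13×13×2048 and joins it with the 13×13×1024 map:

import torch

def passthrough(x):
    # Space-to-depth: (N, C, H, W) -> (N, 4C, H/2, W/2)
    n, c, h, w = x.shape
    x = x.view(n, c, h // 2, 2, w // 2, 2).permute(0, 1, 3, 5, 2, 4)
    return x.reshape(n, c * 4, h // 2, w // 2)

fine = torch.randn(1, 512, 26, 26)     # earlier, higher-resolution features
coarse = torch.randn(1, 1024, 13, 13)  # last feature map
fused = torch.cat([passthrough(fine), coarse], dim=1)
print(fused.shape)  # torch.Size([1, 3072, 13, 13])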
YOLOv3: Core Network Model
V3's biggest improvement is the network structure, which makes it better suited to small-object detection. Features are computed at a finer granularity, fusing feature-map information from multiple scales to predict objects of different sizes. The priors are richer: 3 scales with 3 boxes each, 9 in total. The softmax is replaced, enabling multi-label prediction.
Improvements:
- Multi-scale
Three output scales are dedicated to targets of different sizes: 52×52, 26×26, and 13×13 (the larger the grid, the smaller the candidate boxes it is responsible for); each box corresponds to a different aspect ratio.
In the original figures: the left one builds an image pyramid and predicts separately on each image size, which is slow; the right one instead upsamples and fuses feature maps of different sizes.
- Residual connections
Idea: keep only what helps and discard what doesn't, so the result is never worse than before. A minimal sketch follows below.
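A Darknet-style residual block sketch (my illustration; the 1×1-then-3×3 pattern follows Darknet-53):

import torch.nn as nn

class DarknetResidual(nn.Module):
    # A 1x1 conv halves the channels, a 3x3 conv restores them, then the input is added back
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels // 2, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)  # identity shortcut: at worst, no worse than the input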
Core network architecture: Darknet-53
- No pooling, since pooling compresses features away
- No fully connected layers (already removed in V2)
- Downsampling is done with stride-2 convolutions, halving the spatial size each time
- There are three grid sizes
V1 uses a 7×7 grid, V2 uses 13×13, and V3 uses 13×13, 26×26, and 52×52.
YOLOv3 Source Code Walkthrough
Downloads:
PyTorch-YOLOv3 source code download; pretrained weights
COCO dataset: http://images.cocodataset.org/zips/val2014.zip http://images.cocodataset.org/zips/train2014.zip
Reference: YOLOv3 source-code analysis, https://blog.csdn.net/qq_24739717/article/details/92399359
The config folder
- coco.data
Dataset information used for COCO training:
classes= 80
train=data/coco/trainvalno5k.txt
valid=data/coco/5k.txt
names=data/coco.names
backup=backup/
eval=coco
- create_custom_model.sh
A shell script for defining your own model: running it generates the custom model's configuration file yolov3-custom.cfg (compare it with yolov3.cfg).
- custom.data
Information about your own dataset, used to train a custom detection task: the number of classes and the paths to the training set, the validation set, and the class-names file.
- yolov3.cfg
The configuration of the YOLOv3 network: convolutional layers (number of kernels, kernel size, stride, ...), yolo layers, and the other layers.
- yolov3-custom.cfg
The configuration of the custom network model, generated by the create_custom_model.sh script.
- yolov3-tiny.cfg
The configuration of the tiny version of YOLOv3.
The data folder
The coco folder
The COCO training and validation data, produced by running the get_coco_dataset.sh script.
The custom folder
Information for a custom dataset. 1) images folder: all training and validation images. 2) labels folder: label files produced by annotating the images with a labeling tool; each label is a .txt file in which every line is one ground-truth record: a class index followed by bounding-box coordinates (in the original example image, 0 is the class index and the rest is the box). 3) classes.names: the class-name file of the custom dataset. 4) train.txt: one training-image path per line. 5) valid.txt: one validation-image path per line.
The samples folder
Images used to test the model and inspect its detection results.
coco.names
The class information of the COCO data, analogous to classes.names.
get_coco_dataset.sh
A script that downloads the COCO data and generates the coco folder and its contents.
The utils folder
augmentations.py
The data-augmentation file. This project only augments by horizontal flipping; when an image is flipped, its annotations are adjusted accordingly, and the function returns the flipped image together with its corresponding labels.
import torch
import torch.nn.functional as F
import numpy as np
- horisontal_flip()
Input: images, targets, the original images and labels. Returns: images, targets, the flipped images and labels. Purpose: horisontal_flip() augments the data so the dataset is enlarged; only a horizontal mirror flip is used here.
def horisontal_flip(images, targets):
images = torch.flip(images, [-1])
targets[:, 2] = 1 - targets[:, 2]
return images, targets
torch.flip(input, dims) -> Tensor. Purpose: reverses a tensor along the given dimensions. Parameters: input, the tensor to flip; dims, the dimensions to flip along. Returns: the flipped tensor. Since an image is stored as an array of shape (c, h, w), i.e. color channels, vertical, horizontal, Python's [-1] denotes the last dimension, the horizontal one.
targets holds the labels: in this project each row is [sample index, class, x_center, y_center, box_width, box_height], with positions and sizes expressed relative to the image in [0, 1]. A horizontal flip therefore maps the x center to 1 − x, which is exactly targets[:, 2].
datasets.py
The file that operates on datasets; it contains image padding, image resizing, a dataset-loading class for test images, and one for training/evaluation data: 3 functions and 2 classes in total, as follows.
import glob
import random
import os
import sys
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F
from utils.augmentations import horisontal_flip
from torch.utils.data import Dataset
import torchvision.transforms as transforms
- pad_to_square
Input: img, the original image; pad_value, the value to pad with. Output: the padded image and the padding applied. Purpose: pads the image into a square whose side is max(width, height), filling with F.pad in constant mode.
'''Image padding: pads the image into a square with pad_value;
returns the padded image and the padding positions.'''
def pad_to_square(img, pad_value):
c, h, w = img.shape
dim_diff = np.abs(h - w)
pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
img = F.pad(img, pad, "constant", value=pad_value)
return img, pad
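A quick usage check (my example, not from the post):

import torch
from utils.datasets import pad_to_square

img = torch.zeros(3, 100, 80)        # C x H x W, taller than wide
padded, pad = pad_to_square(img, 0)
print(padded.shape, pad)             # torch.Size([3, 100, 100]) (10, 10, 0, 0)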
- resize
Input: image, the original image; size, the size to resize to. Output: the resized image. Purpose: implements up/down-sampling.
'''Resize: changes a square image to a fixed size using interpolation'''
def resize(image, size):
image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
return image
PyTorch's torch.nn.functional.interpolate implements interpolation and upsampling.
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)
Parameters: input (Tensor), the input tensor; size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]), the output size; scale_factor (float or Tuple[float]), how many times larger the output is than the input (if the input is a tuple, this must also be a tuple); mode (str), the sampling algorithm: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', or 'area'; the default is 'nearest'.
- random_resize
Input: images, the original images; min_size, max_size, the range the random size is drawn from. Output: the resized images. Purpose: randomly resizes images (sizes step by 32).
"""Random resize: resizes images to a random size (using interpolation)"""
def random_resize(images, min_size=288, max_size=448):
new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0]
images = F.interpolate(images, size=new_size, mode="nearest")
return images
- ImageFolder
Purpose: defines the standard dataset format for inference. Reads images from a folder, pads them into squares, resizes all inputs to 416×416, and reports the number of images.
'''Dataset-loading class 1: loads and processes images; returns the image path and the processed image'''
class ImageFolder(Dataset):
def __init__(self, folder_path, img_size=416):
self.files = sorted(glob.glob("%s/*.*" % folder_path))
self.img_size = img_size
def __getitem__(self, index):
img_path = self.files[index % len(self.files)]
img = transforms.ToTensor()(Image.open(img_path))
img, _ = pad_to_square(img, 0)
img = resize(img, self.img_size)
return img_path, img
def __len__(self):
return len(self.files)
The Dataset class: PyTorch reads images mainly through the Dataset class, which is the base class of all datasets; every dataset must inherit from it. __init__ initializes parameters for operating on the dataset; __getitem__ defines how a sample is fetched (reading the data, applying transforms, ...) and supports indices from 0 to len(self)-1, with obj[index] equivalent to obj.__getitem__(index); __len__ returns the size of the dataset, with len(obj) equivalent to obj.__len__().
"""Dataset-loading class 2: loads and processes images and their labels; returns the image path, the processed image, and the processed labels"""
class ListDataset(Dataset):
def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True):
with open(list_path, "r") as file:
self.img_files = file.readlines()
self.label_files = [
path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt")
for path in self.img_files
]
self.img_size = img_size
self.max_objects = 100
self.augment = augment
self.multiscale = multiscale
self.normalized_labels = normalized_labels
self.min_size = self.img_size - 3 * 32
self.max_size = self.img_size + 3 * 32
self.batch_count = 0
def __getitem__(self, index):
img_path = self.img_files[index % len(self.img_files)].rstrip()
        img_path = 'F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco' + img_path  # absolute local prefix added by the post's author; adjust to your setup
img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))
        if len(img.shape) != 3:
            img = img.unsqueeze(0)
            img = img.expand((3, *img.shape[1:]))  # expand grayscale to 3 channels
_, h, w = img.shape
h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)
img, pad = pad_to_square(img, 0)
_, padded_h, padded_w = img.shape
label_path = self.label_files[index % len(self.img_files)].rstrip()
        label_path = 'F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco\\labels' + label_path  # absolute local prefix added by the post's author
targets = None
if os.path.exists(label_path):
boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
x1 += pad[0]
y1 += pad[2]
x2 += pad[1]
y2 += pad[3]
boxes[:, 1] = ((x1 + x2) / 2) / padded_w
boxes[:, 2] = ((y1 + y2) / 2) / padded_h
boxes[:, 3] *= w_factor / padded_w
boxes[:, 4] *= h_factor / padded_h
targets = torch.zeros((len(boxes), 6))
targets[:, 1:] = boxes
if self.augment:
if np.random.random() < 0.5:
img, targets = horisontal_flip(img, targets)
return img_path, img, targets
def collate_fn(self, batch):
paths, imgs, targets = list(zip(*batch))
targets = [boxes for boxes in targets if boxes is not None]
for i, boxes in enumerate(targets):
boxes[:, 0] = i
targets = torch.cat(targets, 0)
if self.multiscale and self.batch_count % 10 == 0:
self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
imgs = torch.stack([resize(img, self.img_size) for img in imgs])
self.batch_count += 1
return paths, imgs, targets
def __len__(self):
return len(self.img_files)
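A sketch of how this class is wired into a DataLoader (it mirrors what train.py does later; the list path assumes the COCO layout, and the hard-coded path prefixes above must be adjusted first):

from torch.utils.data import DataLoader
from utils.datasets import ListDataset

dataset = ListDataset("data/coco/trainvalno5k.txt", img_size=416, augment=True, multiscale=True)
loader = DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=dataset.collate_fn)
paths, imgs, targets = next(iter(loader))  # collate_fn writes the sample index into targets[:, 0]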
logger.py
Writes monitoring data (such as losses) to the file system as logs, saving information produced during training. The Logger class is used in train.py to record this information to log files while training runs.
import tensorflow as tf
class Logger(object):
def __init__(self, log_dir):
"""Create a summary writer logging to log_dir."""
self.writer = tf.summary.create_file_writer(log_dir)
def scalar_summary(self, tag, value, step):
with self.writer.as_default():
tf.summary.scalar(tag, value, step=step)
self.writer.flush()
def list_of_scalars_summary(self, tag_value_pairs, step):
with self.writer.as_default():
for tag, value in tag_value_pairs:
tf.summary.scalar(tag, value, step=step)
self.writer.flush()
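Usage is straightforward (my example; the tag names are arbitrary), and the result can be inspected with tensorboard --logdir logs:

logger = Logger("logs")  # writes TensorBoard event files under logs/
logger.scalar_summary("loss", 0.53, step=100)
logger.list_of_scalars_summary([("val_mAP", 0.41), ("val_f1", 0.47)], step=1)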
parse_config.py
Contains two parsers: 1. the model-config parser, which returns a list model_defs whose elements are dicts, each dict describing one layer (module) of the model; 2. the data-config parser, which returns a dict whose key-value pairs describe the data's names, paths, and other information.
'''Model-config parser: parses the YOLOv3 layer configuration file and returns the module definitions module_defs; path is the path to yolov3.cfg'''
def parse_model_config(path):
'''
    To understand this function, first look at the yolov3.cfg file in the config folder; part of yolov3.cfg is shown below:
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
    ...
    :param path: path of the model configuration file, i.e. the path to yolov3.cfg
    :return: the model definition, a list whose elements are dicts holding each module's parameters
'''
file = open(path, 'r')
lines = file.read().split('\n')
lines = [x for x in lines if x and not x.startswith('#')]
lines = [x.rstrip().lstrip() for x in lines]
module_defs = []
for line in lines:
if line.startswith('['):
module_defs.append({})
module_defs[-1]['type'] = line[1:-1].rstrip()
if module_defs[-1]['type'] == 'convolutional':
module_defs[-1]['batch_normalize'] = 0
else:
key, value = line.split("=")
value = value.strip()
module_defs[-1][key.rstrip()] = value.strip()
return module_defs
'''Data-config parser: the parameter path is the path of the configuration file'''
def parse_data_config(path):
"""
    The data configuration contains:
classes= 80
train=data/coco/trainvalno5k.txt
valid=data/coco/5k.txt
names=data/coco.names
backup=backup/
eval=coco
"""
options = dict()
options['gpus'] = '0,1,2,3'
options['num_workers'] = '10'
with open(path, 'r') as fp:
lines = fp.readlines()
for line in lines:
line = line.strip()
if line == '' or line.startswith('#'):
continue
key, value = line.split('=')
options[key.strip()] = value.strip()
return options
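A quick look at what the two parsers return (my example; paths assume the repo layout):

from utils.parse_config import parse_model_config, parse_data_config

module_defs = parse_model_config("config/yolov3.cfg")
print(module_defs[0])  # the [net] hyperparameter block, e.g. {'type': 'net', 'channels': '3', ...}
print(module_defs[1])  # the first layer, e.g. {'type': 'convolutional', 'batch_normalize': '1', ...}
options = parse_data_config("config/coco.data")
print(options["train"], options["names"])  # data/coco/trainvalno5k.txt data/coco.names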
utils.py
from __future__ import division
import tqdm
import torch
import numpy as np
def to_cpu(tensor):
return tensor.detach().cpu()
'''Loads the dataset's class information: returns a list of class names'''
def load_classes(path):
fp = open(path, "r")
names = fp.read().split("\n")[:-1]
return names
'''Weight-initialization function'''
def weights_init_normal(m):
classname = m.__class__.__name__
if classname.find("Conv") != -1:
torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
elif classname.find("BatchNorm2d") != -1:
torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
torch.nn.init.constant_(m.bias.data, 0.0)
'''Rescales predicted boxes: the parameters are the boxes, the current (network input) size (a scalar), and the original image shape. The network predicts
on padded, resized images, so the predicted boxes must be adjusted back to fit the objects in the original image'''
def rescale_boxes(boxes, current_dim, original_shape):
orig_h, orig_w = original_shape
pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
unpad_h = current_dim - pad_y
unpad_w = current_dim - pad_x
boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w
boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h
boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
return boxes
'''Converts box information to the top-left/bottom-right coordinate representation'''
def xywh2xyxy(x):
y = x.new(x.shape)
y[..., 0] = x[..., 0] - x[..., 2] / 2
y[..., 1] = x[..., 1] - x[..., 3] / 2
y[..., 2] = x[..., 0] + x[..., 2] / 2
y[..., 3] = x[..., 1] + x[..., 3] / 2
return y
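For instance, with a made-up box:

box = torch.tensor([[0.5, 0.5, 0.2, 0.4]])  # (cx, cy, w, h)
print(xywh2xyxy(box))  # tensor([[0.4000, 0.3000, 0.6000, 0.7000]])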
"""度量计算:参数为true_positive(值为0或1,list)、预测置信度(list),预测类别(list),真实类别(list)
返回:p, r, ap, f1, unique_classes.astype("int32")"""
def ap_per_class(tp, conf, pred_cls, target_cls):
i = np.argsort(-conf)
tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
unique_classes = np.unique(target_cls)
ap, p, r = [], [], []
for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
i = pred_cls == c
n_gt = (target_cls == c).sum()
n_p = i.sum()
if n_p == 0 and n_gt == 0:
continue
elif n_p == 0 or n_gt == 0:
ap.append(0)
r.append(0)
p.append(0)
else:
fpc = (1 - tp[i]).cumsum()
tpc = (tp[i]).cumsum()
recall_curve = tpc / (n_gt + 1e-16)
r.append(recall_curve[-1])
precision_curve = tpc / (tpc + fpc)
p.append(precision_curve[-1])
ap.append(compute_ap(recall_curve, precision_curve))
p, r, ap = np.array(p), np.array(r), np.array(ap)
f1 = 2 * p * r / (p + r + 1e-16)
return p, r, ap, f1, unique_classes.astype("int32")
"""计算AP"""
def compute_ap(recall, precision):
mrec = np.concatenate(([0.0], recall, [1.0]))
mpre = np.concatenate(([0.0], precision, [0.0]))
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
i = np.where(mrec[1:] != mrec[:-1])[0]
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
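compute_ap is the "upper envelope plus rectangles" computation from the metrics discussion at the top of this post: precision is first made monotonically non-increasing from right to left, then rectangle areas are summed wherever recall changes. A made-up check:

import numpy as np
from utils.utils import compute_ap

recall = np.array([0.5, 1.0])
precision = np.array([1.0, 0.5])
print(compute_ap(recall, precision))  # 0.5*1.0 + 0.5*0.5 = 0.75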
'''Statistics computation: the parameters are the model predictions (after NMS), the ground-truth labels (x, y, x, y adapted to the original image), and the IoU threshold.
Returns true_positive (0/1: 1 when a predicted box overlaps a ground-truth box strongly enough), the predicted confidences, and the predicted classes'''
def get_batch_statistics(outputs, targets, iou_threshold):
batch_metrics = []
for sample_i in range(len(outputs)):
if outputs[sample_i] is None:
continue
        '''Prediction information for this image:'''
output = outputs[sample_i]
pred_boxes = output[:, :4]
pred_scores = output[:, 4]
pred_labels = output[:, -1]
true_positives = np.zeros(pred_boxes.shape[0])
        '''Annotation information (ground truth) for this image:'''
annotations = targets[targets[:, 0] == sample_i][:, 1:]
target_labels = annotations[:, 0] if len(annotations) else []
if len(annotations):
detected_boxes = []
target_boxes = annotations[:, 1:]
for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):
if len(detected_boxes) == len(annotations):
break
if pred_label not in target_labels:
continue
iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)
if iou >= iou_threshold and box_index not in detected_boxes:
true_positives[pred_i] = 1
detected_boxes += [box_index]
batch_metrics.append([true_positives, pred_scores, pred_labels])
return batch_metrics
"""未用到"""
def bbox_wh_iou(wh1, wh2):
wh2 = wh2.t()
w1, h1 = wh1[0], wh1[1]
w2, h2 = wh2[0], wh2[1]
inter_area = torch.min(w1, w2) * torch.min(h1, h2)
union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
return inter_area / union_area
"""计算两个边界框的IOU值"""
def bbox_iou(box1, box2, x1y1x2y2=True):
if not x1y1x2y2:
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
else:
b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
return iou
'''Non-maximum suppression: returns boxes as [x1, y1, x2, y2, conf, class_conf, class_pred]; the parameters are the model predictions, the confidence threshold, and the NMS threshold'''
def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
"""
Removes detections with lower object confidence score than 'conf_thres' and performs Non-Maximum Suppression to further filter detections.
Returns detections with shape:
(x1, y1, x2, y2, object_conf, class_score, class_pred)
"""
"""(1)模型预测坐标格式转变: (center x, center y, width, height) to (x1, y1, x2, y2)"""
prediction[..., :4] = xywh2xyxy(prediction[..., :4])
output = [None for _ in range(len(prediction))]
for image_i, image_pred in enumerate(prediction):
"""(2)边界框筛选:去除目标置信度低于阈值的边界框"""
image_pred = image_pred[image_pred[:, 4] >= conf_thres]
if not image_pred.size(0):
continue
"""(3)非极大值抑制:根据score进行排序得到最大值,找到和这个score最大的预测类别相同的计算iou值,通过加权计算,得到最终的预测框(xyxy),最后从prediction中去掉iou大于设置的iou阈值的边界框。"""
score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
image_pred = image_pred[(-score).argsort()]
class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)
keep_boxes = []
while detections.size(0):
large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
label_match = detections[0, -1] == detections[:, -1]
invalid = large_overlap & label_match
weights = detections[invalid, 4:5]
detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
keep_boxes += [detections[0]]
detections = detections[~invalid]
if keep_boxes:
output[image_i] = torch.stack(keep_boxes)
return output
def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor
nB = pred_boxes.size(0)
nA = pred_boxes.size(1)
nC = pred_cls.size(-1)
nG = pred_boxes.size(2)
obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
tx = FloatTensor(nB, nA, nG, nG).fill_(0)
ty = FloatTensor(nB, nA, nG, nG).fill_(0)
tw = FloatTensor(nB, nA, nG, nG).fill_(0)
th = FloatTensor(nB, nA, nG, nG).fill_(0)
tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
target_boxes = target[:, 2:6] * nG
gxy = target_boxes[:, :2]
gwh = target_boxes[:, 2:]
ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
best_ious, best_n = ious.max(0)
b, target_labels = target[:, :2].long().t()
gx, gy = gxy.t()
gw, gh = gwh.t()
gi, gj = gxy.long().t()
obj_mask[b, best_n, gj, gi] = 1
noobj_mask[b, best_n, gj, gi] = 0
for i, anchor_ious in enumerate(ious.t()):
noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
tx[b, best_n, gj, gi] = gx - gx.floor()
ty[b, best_n, gj, gi] = gy - gy.floor()
tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
tcls[b, best_n, gj, gi, target_labels] = 1
class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
tconf = obj_mask.float()
return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
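Note that the encoding above is the exact inverse of the decoding done in YOLOLayer.forward in models.py below: tx and ty are the fractional offsets of the box center inside its grid cell (decoded as sigmoid(prediction) + grid offset), while tw = log(gw / anchor_w) and th = log(gh / anchor_h) invert b_w = p_w * e^{t_w}. obj_mask marks the single best-matching anchor for each ground-truth box; noobj_mask additionally clears every anchor whose IoU with a ground truth exceeds ignore_thres, so those in-between predictions are excluded from the no-object confidence loss.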
detect.py
The file for running detection after training. Test images live in data/samples, and the results are saved in the output folder, which this script creates automatically.
from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
import os
import time
import datetime
import argparse
from PIL import Image
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.ticker import NullLocator
if __name__ == "__main__":
    '''(1) Argument parsing'''
parser = argparse.ArgumentParser()
parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
parser.add_argument("--nms_thres", type=float, default=0.4, help="iou thresshold for non-maximum suppression")
parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
opt = parser.parse_args()
print(opt)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
os.makedirs("output", exist_ok=True)
    '''(2) Model construction'''
model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
if opt.weights_path.endswith(".weights"):
model.load_darknet_weights(opt.weights_path)
else:
model.load_state_dict(torch.load(opt.weights_path))
model.eval()
    '''(3) Dataset and class-name loading'''
dataloader = DataLoader(
ImageFolder(opt.image_folder, img_size=opt.img_size),
batch_size=opt.batch_size,
shuffle=False,
num_workers=opt.n_cpu,
)
classes = load_classes(opt.class_path)
imgs = []
img_detections = []
"""(3)模型预测:将图片路径、图片预测结果存入imgs和img_detections列表中"""
print("\nPerforming object detection:")
prev_time = time.time()
Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
input_imgs = Variable(input_imgs.type(Tensor))
with torch.no_grad():
detections = model(input_imgs)
detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)
current_time = time.time()
inference_time = datetime.timedelta(seconds=current_time - prev_time)
prev_time = current_time
print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))
imgs.extend(img_paths)
img_detections.extend(detections)
"""(4)将检测结果绘制到图片,并保存"""
cmap = plt.get_cmap("tab20b")
colors = [cmap(i) for i in np.linspace(0, 1, 20)]
for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
print("(%d) Image: '%s'" % (img_i, path))
img = np.array(Image.open(path))
plt.figure()
fig, ax = plt.subplots(1)
ax.imshow(img)
if detections is not None:
detections = rescale_boxes(detections, opt.img_size, img.shape[:2])
unique_labels = detections[:, -1].cpu().unique()
n_cls_preds = len(unique_labels)
bbox_colors = random.sample(colors, n_cls_preds)
for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))
box_w = x2 - x1
box_h = y2 - y1
color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
ax.add_patch(bbox)
plt.text( x1,y1,s=classes[int(cls_pred)],color="white",verticalalignment="top",bbox={"color": color, "pad": 0} )
plt.axis("off")
plt.gca().xaxis.set_major_locator(NullLocator())
plt.gca().yaxis.set_major_locator(NullLocator())
filename = path.split("/")[-1].split(".")[0]
plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0)
plt.close()
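With the defaults above, a typical run is (assuming the pretrained weights were downloaded to weights/):

python detect.py --image_folder data/samples --weights_path weights/yolov3.weights --class_path data/coco.names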
models.py
The file that defines the model structure, building it from the information in the model configuration file.
from __future__ import division
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from utils.parse_config import *
from utils.utils import build_targets, to_cpu
'''Network-construction function: builds the YOLOv3 model structure from the module definitions module_defs, creating a module list from the block configurations'''
def create_modules(module_defs):
    '''Builds the model structure'''
    '''(1) Parse the model hyperparameters and get the model's number of input channels'''
hyperparams = module_defs.pop(0)
output_filters = [int(hyperparams["channels"])]
    '''(2) Create an nn.ModuleList() to hold the layers/modules as they are created'''
module_list = nn.ModuleList()
    '''(3) Iterate over the dicts in the module-definition list, create the corresponding layer/module, and append it to the nn.ModuleList()'''
for module_i, module_def in enumerate(module_defs):
modules = nn.Sequential()
if module_def["type"] == "convolutional":
bn = int(module_def["batch_normalize"])
filters = int(module_def["filters"])
kernel_size = int(module_def["size"])
pad = (kernel_size - 1) // 2
modules.add_module(f"conv_{module_i}",
nn.Conv2d(
in_channels=output_filters[-1],
out_channels=filters,
kernel_size=kernel_size,
stride=int(module_def["stride"]),
padding=pad,
bias=not bn,
),
)
if bn:
modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
if module_def["activation"] == "leaky":
modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))
elif module_def["type"] == "maxpool":
kernel_size = int(module_def["size"])
stride = int(module_def["stride"])
if kernel_size == 2 and stride == 1:
modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
modules.add_module(f"maxpool_{module_i}",
nn.MaxPool2d(
kernel_size=kernel_size,
stride=stride,
padding=int((kernel_size - 1) // 2))
)
elif module_def["type"] == "upsample":
upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
modules.add_module(f"upsample_{module_i}", upsample)
elif module_def["type"] == "route":
layers = [int(x) for x in module_def["layers"].split(",")]
filters = sum([output_filters[1:][i] for i in layers])
modules.add_module(f"route_{module_i}", EmptyLayer())
elif module_def["type"] == "shortcut":
filters = output_filters[1:][int(module_def["from"])]
modules.add_module(f"shortcut_{module_i}", EmptyLayer())
elif module_def["type"] == "yolo":
anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
anchors = [int(x) for x in module_def["anchors"].split(",")]
anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
anchors = [anchors[i] for i in anchor_idxs]
num_classes = int(module_def["classes"])
img_size = int(hyperparams["height"])
yolo_layer = YOLOLayer(anchors, num_classes, img_size)
modules.add_module(f"yolo_{module_i}", yolo_layer)
module_list.append(modules)
output_filters.append(filters)
return hyperparams, module_list
'''Upsampling layer'''
class Upsample(nn.Module):
""" nn.Upsample 被重写 """
def __init__(self, scale_factor, mode="nearest"):
super(Upsample, self).__init__()
self.scale_factor = scale_factor
self.mode = mode
def forward(self, x):
x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
return x
'''EmptyLayer definition'''
class EmptyLayer(nn.Module):
"""Placeholder for 'route' and 'shortcut' layers"""
def __init__(self):
super(EmptyLayer, self).__init__()
'''YOLO layer definition: the detection layer'''
class YOLOLayer(nn.Module):
"""Detection layer"""
def __init__(self, anchors, num_classes, img_dim=416):
super(YOLOLayer, self).__init__()
self.anchors = anchors
self.num_anchors = len(anchors)
self.num_classes = num_classes
self.ignore_thres = 0.5
self.mse_loss = nn.MSELoss()
self.bce_loss = nn.BCELoss()
self.obj_scale = 1
self.noobj_scale = 100
self.metrics = {}
self.img_dim = img_dim
self.grid_size = 0
def compute_grid_offsets(self, grid_size, cuda=True):
self.grid_size = grid_size
g = self.grid_size
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
self.stride = self.img_dim / self.grid_size
self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))
def forward(self, x, targets=None, img_dim=None):
FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
self.img_dim = img_dim
num_samples = x.size(0)
grid_size = x.size(2)
prediction = (
x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
.permute(0, 1, 3, 4, 2)
.contiguous()
)
x = torch.sigmoid(prediction[..., 0])
y = torch.sigmoid(prediction[..., 1])
w = prediction[..., 2]
h = prediction[..., 3]
pred_conf = torch.sigmoid(prediction[..., 4])
pred_cls = torch.sigmoid(prediction[..., 5:])
if grid_size != self.grid_size:
self.compute_grid_offsets(grid_size, cuda=x.is_cuda)
pred_boxes = FloatTensor(prediction[..., :4].shape)
pred_boxes[..., 0] = x.data + self.grid_x
pred_boxes[..., 1] = y.data + self.grid_y
pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
output = torch.cat(
(
pred_boxes.view(num_samples, -1, 4) * self.stride,
pred_conf.view(num_samples, -1, 1),
pred_cls.view(num_samples, -1, self.num_classes),
),
-1,
)
if targets is None:
return output, 0
else:
iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
pred_boxes=pred_boxes,
pred_cls=pred_cls,
target=targets,
anchors=self.scaled_anchors,
ignore_thres=self.ignore_thres,
)
loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
cls_acc = 100 * class_mask[obj_mask].mean()
conf_obj = pred_conf[obj_mask].mean()
conf_noobj = pred_conf[noobj_mask].mean()
conf50 = (pred_conf > 0.5).float()
iou50 = (iou_scores > 0.5).float()
iou75 = (iou_scores > 0.75).float()
detected_mask = conf50 * class_mask * tconf
precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
self.metrics = {
"loss": to_cpu(total_loss).item(),
"x": to_cpu(loss_x).item(),
"y": to_cpu(loss_y).item(),
"w": to_cpu(loss_w).item(),
"h": to_cpu(loss_h).item(),
"conf": to_cpu(loss_conf).item(),
"cls": to_cpu(loss_cls).item(),
"cls_acc": to_cpu(cls_acc).item(),
"recall50": to_cpu(recall50).item(),
"recall75": to_cpu(recall75).item(),
"precision": to_cpu(precision).item(),
"conf_obj": to_cpu(conf_obj).item(),
"conf_noobj": to_cpu(conf_noobj).item(),
"grid_size": grid_size,
}
return output, total_loss
"""Darknet类:YOLOv3模型"""
class Darknet(nn.Module):
"""YOLOv3 object detection model"""
def __init__(self, config_path, img_size=416):
super(Darknet, self).__init__()
self.module_defs = parse_model_config(config_path)
        self.hyperparams, self.module_list = create_modules(self.module_defs)
self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
self.img_size = img_size
self.seen = 0
self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)
def forward(self, x, targets=None):
img_dim = x.shape[2]
loss = 0
layer_outputs, yolo_outputs = [], []
for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
x = module(x)
elif module_def["type"] == "route":
x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
elif module_def["type"] == "shortcut":
layer_i = int(module_def["from"])
x = layer_outputs[-1] + layer_outputs[layer_i]
elif module_def["type"] == "yolo":
x, layer_loss = module[0](x, targets, img_dim)
loss += layer_loss
yolo_outputs.append(x)
layer_outputs.append(x)
yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
return yolo_outputs if targets is None else (loss, yolo_outputs)
def load_darknet_weights(self, weights_path):
"""Parses and loads the weights stored in 'weights_path'"""
with open(weights_path, "rb") as f:
header = np.fromfile(f, dtype=np.int32, count=5)
self.header_info = header
self.seen = header[3]
weights = np.fromfile(f, dtype=np.float32)
cutoff = None
if "darknet53.conv.74" in weights_path:
cutoff = 75
ptr = 0
for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
if i == cutoff:
break
if module_def["type"] == "convolutional":
conv_layer = module[0]
if module_def["batch_normalize"]:
bn_layer = module[1]
num_b = bn_layer.bias.numel()
bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
bn_layer.bias.data.copy_(bn_b)
ptr += num_b
bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
bn_layer.weight.data.copy_(bn_w)
ptr += num_b
bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
bn_layer.running_mean.data.copy_(bn_rm)
ptr += num_b
bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
bn_layer.running_var.data.copy_(bn_rv)
ptr += num_b
else:
num_b = conv_layer.bias.numel()
conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
conv_layer.bias.data.copy_(conv_b)
ptr += num_b
num_w = conv_layer.weight.numel()
conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
conv_layer.weight.data.copy_(conv_w)
ptr += num_w
def save_darknet_weights(self, path, cutoff=-1):
"""
@:param path - path of the new weights file
@:param cutoff - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
"""
fp = open(path, "wb")
self.header_info[3] = self.seen
self.header_info.tofile(fp)
for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
if module_def["type"] == "convolutional":
conv_layer = module[0]
if module_def["batch_normalize"]:
bn_layer = module[1]
bn_layer.bias.data.cpu().numpy().tofile(fp)
bn_layer.weight.data.cpu().numpy().tofile(fp)
bn_layer.running_mean.data.cpu().numpy().tofile(fp)
bn_layer.running_var.data.cpu().numpy().tofile(fp)
else:
conv_layer.bias.data.cpu().numpy().tofile(fp)
conv_layer.weight.data.cpu().numpy().tofile(fp)
fp.close()
test.py
The file used to evaluate model performance.
from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
import argparse
import tqdm
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable
"""模型评估函数:参数为模型、valid数据集路径、iou阈值。nms阈值、网络输入大小、批量大小"""
def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size):
model.eval()
    '''(1) Build the evaluation dataset and batch it'''
dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False)
dataloader = torch.utils.data.DataLoader(dataset,
batch_size=batch_size,
shuffle=False,
num_workers=1,
collate_fn=dataset.collate_fn)
Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
labels = []
sample_metrics = []
for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):
        '''(2) Batch label processing'''
labels += targets[:, 1].tolist()
targets[:, 2:] = xywh2xyxy(targets[:, 2:])
targets[:, 2:] *= img_size
        '''(3) Predict on the batch and apply NMS'''
imgs = Variable(imgs.type(Tensor), requires_grad=False)
with torch.no_grad():
outputs = model(imgs)
outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)
        '''(4) Collect prediction statistics: after NMS, obtain each predicted box's true_positive value (0 or 1), confidence, and class'''
sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres)
if len(sample_metrics) == 0:
return np.array([]), np.array([]), np.array([]), np.array([]), np.array([])
true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]
precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)
return precision, recall, AP, f1, ap_class
if __name__ == "__main__":
    '''(1) Argument parsing'''
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch")
parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")
parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected")
parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold")
parser.add_argument("--nms_thres", type=float, default=0.5, help="iou thresshold for non-maximum suppression")
parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
opt = parser.parse_args()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
"""(2)数据解析"""
data_config = parse_data_config(opt.data_config)
valid_path = data_config["valid"]
class_names = load_classes(data_config["names"])
"""(3)模型构建:构建模型,加载模型参数"""
model = Darknet(opt.model_def).to(device)
if opt.weights_path.endswith(".weights"):
model.load_darknet_weights(opt.weights_path)
else:
model.load_state_dict(torch.load(opt.weights_path))
print("Compute mAP...")
"""(4)模型评估"""
precision, recall, AP, f1, ap_class = evaluate(
model,
path=valid_path,
iou_thres=opt.iou_thres,
conf_thres=opt.conf_thres,
nms_thres=opt.nms_thres,
img_size=opt.img_size,
batch_size=8,
)
print(precision, recall, AP, f1, ap_class)
print("Average Precisions:")
for i, c in enumerate(ap_class):
print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}")
print(f"mAP: {AP.mean()}")
train.py: the model-training file. Training produces: (1) a checkpoints folder that stores the model parameters saved after each epoch; (2) a logs folder that stores the log information.
from __future__ import division
from models import *
from utils.logger import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
from terminaltables import AsciiTable
import os
from test import evaluate
import time
import datetime
import argparse
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable
if __name__ == "__main__":
    '''(1) Argument parsing'''
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
parser.add_argument("--batch_size", type=int, default=1, help="size of each image batch")
parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step")
parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
parser.add_argument("--pretrained_weights", type=str, help="if specified starts from checkpoint model")
parser.add_argument("--n_cpu", type=int, default=1, help="number of cpu threads to use during batch generation")
parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights")
parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set")
parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch")
parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training")
parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")
opt = parser.parse_args()
print(opt)
    '''(2) Logger instantiation'''
logger = Logger("logs")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    '''(3) Folder creation'''
os.makedirs("output", exist_ok=True)
os.makedirs("checkpoints", exist_ok=True)
"""(4)初始化模型:模型构建,模型参数装载"""
model = Darknet(opt.model_def).to(device)
model.apply(weights_init_normal)
if opt.pretrained_weights:
if opt.pretrained_weights.endswith(".pth"):
model.load_state_dict(torch.load(opt.pretrained_weights))
else:
model.load_darknet_weights(opt.pretrained_weights)
"""(5)数据集加载"""
data_config = parse_data_config(opt.data_config)
train_path = data_config["train"]
valid_path = data_config["valid"]
class_names = load_classes(data_config["names"])
dataset = ListDataset(train_path, augment=True, multiscale=opt.multiscale_training)
dataloader = torch.utils.data.DataLoader(
dataset,
batch_size=opt.batch_size,
shuffle=True,
num_workers=opt.n_cpu,
pin_memory=True,
collate_fn=dataset.collate_fn,
)
"""(7)优化器"""
optimizer = torch.optim.Adam(model.parameters())
"""(8)模型训练"""
metrics = [
"grid_size",
"loss",
"x",
"y",
"w",
"h",
"conf",
"cls",
"cls_acc",
"recall50",
"recall75",
"precision",
"conf_obj",
"conf_noobj",
]
for epoch in range(opt.epochs):
model.train()
        start_time = time.time()
print('start_time',start_time)
for batch_i, (_, imgs, targets) in enumerate(dataloader):
batches_done = len(dataloader) * epoch + batch_i
imgs = Variable(imgs.to(device))
targets = Variable(targets.to(device), requires_grad=False)
loss, outputs = model(imgs, targets)
loss.backward()
            if batches_done % opt.gradient_accumulations == 0:  # step once every N accumulated batches
optimizer.step()
optimizer.zero_grad()
log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch+1, opt.epochs, batch_i+1, len(dataloader))
metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]]
for i, metric in enumerate(metrics):
formats = {m: "%.6f" for m in metrics}
formats["grid_size"] = "%2d"
formats["cls_acc"] = "%.2f%%"
row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers]
metric_table += [[metric, *row_metrics]]
tensorboard_log = []
for j, yolo in enumerate(model.yolo_layers):
for name, metric in yolo.metrics.items():
if name != "grid_size":
tensorboard_log += [(f"{name}_{j+1}", metric)]
tensorboard_log += [("loss", loss.item())]
logger.list_of_scalars_summary(tensorboard_log, batches_done)
log_str += AsciiTable(metric_table).table
log_str += f"\nTotal loss {loss.item()}"
epoch_batches_left = len(dataloader) - (batch_i + 1)
time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1))
log_str += f"\n---- ETA {time_left}"
print(log_str)
model.seen += imgs.size(0)
        '''(8) Evaluation during training'''
if epoch % opt.evaluation_interval == 0:
print("\n---- Evaluating Model ----")
precision, recall, AP, f1, ap_class = evaluate(
model,
path=valid_path,
iou_thres=0.5,
conf_thres=0.5,
nms_thres=0.5,
img_size=opt.img_size,
batch_size=8,
)
evaluation_metrics = [
("val_precision", precision.mean()),
("val_recall", recall.mean()),
("val_mAP", AP.mean()),
("val_f1", f1.mean()),
]
logger.list_of_scalars_summary(evaluation_metrics, epoch)
ap_table = [["Index", "Class name", "AP"]]
for i, c in enumerate(ap_class):
ap_table += [[c, class_names[c], "%.5f" % AP[i]]]
print(AsciiTable(ap_table).table)
print(f"---- mAP {AP.mean()}")
        '''(9) Model saving'''
if epoch % opt.checkpoint_interval == 0:
            torch.save(model.state_dict(), "checkpoints/yolov3_ckpt_%d.pth" % epoch)
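A typical way to launch training on a custom dataset (my example; the Darknet-53 backbone weights path is an assumption, and a .pth checkpoint also works via --pretrained_weights):

python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74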
YOLOv4: Algorithm Overview
Improvements in YOLOv4
- Mosaic data augmentation
Stitches four training images into one, which indirectly increases the batch size.
- Label smoothing
Softens the one-hot targets (e.g. y_smooth = y·(1 − ε) + ε/num_classes) so the model does not become over-confident.
- Soft-NMS
Instead of deleting overlapping boxes outright, it decays their confidence scores.
- SPPNet
In V3, to cope better with different input sizes, the input size had to be changed during training. SPP simply uses max pooling so that the final features fed onward have a consistent size.
- CSPNet
Each block splits the feature map into two parts along the channel dimension: one part goes through the network normally, while the other is concatenated directly onto the block's output. A sketch follows below.
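A minimal sketch of the CSP idea (my illustration; the real CSPDarknet additionally applies 1×1 convolutions on each path):

import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    # Split channels in half, transform one half, concatenate the untouched half back
    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner  # the "normal" path, e.g. a stack of residual blocks on C/2 channels

    def forward(self, x):
        part1, part2 = torch.chunk(x, 2, dim=1)
        return torch.cat([self.inner(part1), part2], dim=1)

block = CSPBlock(nn.Conv2d(32, 32, 3, padding=1))
print(block(torch.randn(1, 64, 13, 13)).shape)  # torch.Size([1, 64, 13, 13])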
- CBAM
One part is channel attention (Channel Attention), which up-weights some feature maps; the other is spatial attention (Spatial Attention), which up-weights some spatial regions.
YOLOv4 adopts only the spatial attention part.
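A minimal spatial-attention sketch (my illustration of the CBAM-style module; YOLOv4's modified SAM actually drops the pooling and convolves the features directly):

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Pool over the channel dimension, convolve, and reweight spatial positions with a sigmoid map
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)   # (N, 1, H, W)
        mx, _ = torch.max(x, dim=1, keepdim=True)  # (N, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # emphasize important regions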