The `build_targets` function in YOLOv5's `loss.py` expands the set of positive samples in two places:
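- Each output layer has 3 anchors, so the targets are replicated 3 times and each copy is paired with one anchor. With 20 target boxes, for example, the targets are expanded to shape [3, 20], i.e. 60 targets. The first copy of 20 is matched against the first anchor, the second copy against the second anchor, and the third copy against the third anchor. Some of these pairs fail to match, because the box's width or height differs from the anchor's by more than a factor of the threshold. So perhaps only 30 of the 60 targets match, and the rest are filtered out. In this example the original 20 positives have become 30, and that is how this step enlarges the positive set. Conversely, if the threshold (`anchor_t`) is set too strictly, a large number of target boxes may be filtered out.
- Downsampling can shift the center point. To account for this, YOLOv5 also looks at where each target's center lies within its grid cell and picks 2 neighbouring cells as additional positives. These are 2 of the 4 neighbours: left or right, depending on which half of the cell the center falls in, and likewise up or down. This expands the positives to up to 3 times their number, so the 30 targets from the first step become up to 90.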
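Below is a minimal sketch of the first expansion step. It assumes that box sizes and anchors are already in grid units and uses `anchor_t = 4.0`, the value found in YOLOv5's default hyperparameter files. The helper `match_anchors` and the anchor and box sizes are hypothetical, chosen only to make the ratio test visible. Each target is paired with every anchor, and a pair survives only if both its width ratio and its height ratio stay within `[1/anchor_t, anchor_t]`.

```python
import torch


def match_anchors(wh, anchors, anchor_t=4.0):
    """Pair every target with every anchor (hypothetical helper).

    wh:      Tensor[nt, 2], target widths/heights in grid units
    anchors: Tensor[na, 2], anchor widths/heights in grid units
    returns: Tensor[na, nt] bool mask, True where both the width and the
             height ratio lie within [1/anchor_t, anchor_t]
    """
    ratio = wh[None] / anchors[:, None]              # [na, nt, 2]
    worst = torch.max(ratio, 1 / ratio).max(2)[0]    # larger of the w/h mismatches
    return worst < anchor_t


# Made-up anchors and boxes, all in grid units
anchors = torch.tensor([[1.0, 1.0], [2.0, 4.0], [8.0, 8.0]])
wh = torch.tensor([[1.5, 1.5],    # medium square box
                   [6.0, 6.0],    # large square box
                   [0.2, 3.0]])   # very thin box
mask = match_anchors(wh, anchors)
print(mask.sum().item())      # 4 (target, anchor) pairs survive out of 9
print(mask.any(0).tolist())   # [True, True, False]: the thin box matches no anchor
```

Here 3 targets yield 4 positive pairs. The first two boxes each match two anchors. The very thin third box matches none and is dropped entirely, which shows how boxes whose shape is far from every anchor are filtered out.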
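So after these two steps the original 20 targets have grown to as many as 90 positive samples, which eases the imbalance between positive and negative samples.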
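The neighbour rule from the second bullet can be tried on its own. The sketch below mirrors YOLOv5's mask logic: `g = 0.5`, plus the `> 1` checks that keep the chosen neighbour inside the grid. The function name `neighbour_cells`, the 10 x 10 grid and the three centers are made up for illustration.

```python
import torch


def neighbour_cells(gxy, w, h, g=0.5):
    """Grid cells assigned to each target center (hypothetical helper).

    gxy: Tensor[M, 2], target centers (x, y) in grid units on a w x h grid
    returns Tensor[K, 2] of integer cells (i, j): every target's own cell,
    then the left, up, right and down neighbours that pass the test
    """
    gxy_t = torch.tensor([w, h], dtype=gxy.dtype) - gxy   # distance to right/bottom edge
    left, up = ((gxy % 1 < g) & (gxy > 1)).T             # center in left/top half of its cell
    right, down = ((gxy_t % 1 < g) & (gxy_t > 1)).T      # center in right/bottom half of its cell
    return torch.cat((
        gxy,
        gxy[left] + torch.tensor([-1.0, 0.0]),
        gxy[up] + torch.tensor([0.0, -1.0]),
        gxy[right] + torch.tensor([1.0, 0.0]),
        gxy[down] + torch.tensor([0.0, 1.0]),
    ), dim=0).long()                                      # truncation == floor for positive coords


centers = torch.tensor([[3.2, 5.7],    # left half in x, bottom half in y
                        [0.4, 0.3],    # inside the top-left corner cell
                        [7.9, 2.1]])   # right half in x, top half in y
cells = neighbour_cells(centers, w=10, h=10)
print(cells.tolist())
# [[3, 5], [0, 0], [7, 2], [2, 5], [7, 1], [8, 2], [3, 6]]
```

The first and third centers each gain two extra cells. The second sits in the corner cell and gains none, so 3 targets give 7 positives rather than 9. This is why 3x is an upper bound for the second step rather than an exact factor.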
YOLOv5's code is rather cryptic, and it took me a long time to understand the idea behind it, so I re-implemented it in my own code:
```python
import torch


def build_targets(self, preds, targets):
    '''
    :param preds: list(Tensor[b, 3, h, w, 85], ...)
    :param targets: Tensor[N, 6] img_idx, cls, x, y, w, h (x, y, w, h normalized to 0-1)
    :return:
        tcls    list(Tensor[N1], Tensor[N2], Tensor[N3]): class of every target on each of the three output layers
        tbox    list(Tensor[N1, 4], Tensor[N2, 4], Tensor[N3, 4]): box of every target on each layer,
                (x, y) as an offset from its assigned grid cell plus (w, h), all in grid units
        indices list(tuple(Tensor[N1], Tensor[N1], Tensor[N1], Tensor[N1]),
                     tuple(Tensor[N2], Tensor[N2], Tensor[N2], Tensor[N2]),
                     tuple(Tensor[N3], Tensor[N3], Tensor[N3], Tensor[N3])): (b, a, gj, gi) of every target on
                each layer, i.e. image index, anchor index, grid row, grid column
        anch    list(Tensor[N1, 2], Tensor[N2, 2], Tensor[N3, 2]): anchor (w, h) assigned to every target on each layer
    '''
    nt, na = targets.shape[0], self.anchors.shape[1]  # number of targets, anchors per layer
    tcls, tbox, indices, anch = [], [], [], []
    device = targets.device
    targets = targets.repeat(na, 1, 1)  # one copy of the targets per anchor: [na, nt, 6]
    anchor_idx = torch.arange(na, device=device).float().view(na, 1).repeat(1, nt)  # [na, nt]
    targets = torch.cat((targets, anchor_idx[..., None]), 2)  # append anchor index as column 6: [na, nt, 7]
    gain = torch.ones(7, device=device)  # scales normalized xywh to grid units
    for layer_idx, pred in enumerate(preds):
        h, w = pred.shape[2], pred.shape[3]
        anchor = self.anchors[layer_idx]  # [na, 2], in grid units (as in YOLOv5: anchors divided by the stride)
        if nt:
            # Match every target with a suitable anchor
            gain[2:6] = torch.tensor([w, h, w, h], device=device)
            t_pixel = targets * gain  # xywh now in grid units
            ratio = t_pixel[..., 4:6] / anchor[:, None]  # w and h ratio to each anchor: [na, nt, 2]
            keep = torch.max(ratio, 1 / ratio).max(2)[0] < self.hyp['anchor_t']  # [na, nt]
            t_pixel = t_pixel[keep]  # surviving (target, anchor) pairs: [M, 7]
            # Add neighbouring grid cells as extra positives
            g = 0.5
            gxy = t_pixel[:, 2:4]  # center measured from the left/top edge
            gxy_t = torch.tensor([w, h], device=device) - gxy  # center measured from the right/bottom edge
            left, up = ((gxy % 1 < g) & (gxy > 1)).T  # center in left/top half; neighbour stays inside the grid
            right, down = ((gxy_t % 1 < g) & (gxy_t > 1)).T  # center in right/bottom half; neighbour inside the grid
            t = torch.cat((t_pixel, t_pixel[left], t_pixel[up], t_pixel[right], t_pixel[down]), dim=0)
            t_left = t_pixel[left][:, 2:4] + torch.tensor([-1, 0], device=device)
            t_up = t_pixel[up][:, 2:4] + torch.tensor([0, -1], device=device)
            t_right = t_pixel[right][:, 2:4] + torch.tensor([1, 0], device=device)
            t_down = t_pixel[down][:, 2:4] + torch.tensor([0, 1], device=device)
            tij = torch.cat((gxy, t_left, t_up, t_right, t_down), dim=0).long()  # same order as t
        else:
            t = targets[0]
            tij = t[:, 2:4].long()
        ai = t[:, 6].long()  # anchor index
        gwh = t[:, 4:6]
        gxy_offset = t[:, 2:4] - tij  # center offset from the assigned cell's top-left corner
        tcls.append(t[:, 1].long())
        tbox.append(torch.cat((gxy_offset, gwh), 1))
        anch.append(anchor[ai])
        indices.append((t[:, 0].long(), ai, tij[:, 1].clamp(0, h - 1), tij[:, 0].clamp(0, w - 1)))
    return tcls, tbox, indices, anch
```
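The sketch below shows what the returned `indices` and `tbox` are for, by illustrating how the positives are typically consumed in the loss. It assumes, as in YOLOv5's `compute_loss`, that each layer's raw output is indexed with `pred[b, a, gj, gi]` and that the predicted center is decoded as `sigmoid() * 2 - 0.5`. The helper names, tensor sizes and index values are hypothetical.

```python
import torch


def gather_positive_preds(pred, idx):
    """Select the raw prediction vector of every positive sample.

    pred: Tensor[b, na, h, w, no], raw output of one detection layer
    idx:  tuple (b, a, gj, gi) of equally long index tensors for that layer
    returns Tensor[n, no]
    """
    b, a, gj, gi = idx
    return pred[b, a, gj, gi]   # advanced indexing over the first four dims


def decode_xy(ps):
    """YOLOv5-style center decoding: squashes logits into (-0.5, 1.5)."""
    return ps[:, :2].sigmoid() * 2 - 0.5


torch.manual_seed(0)
pred = torch.randn(2, 3, 4, 4, 85)            # hypothetical layer: batch 2, 3 anchors, 4 x 4 grid
idx = (torch.tensor([0, 1, 1]),               # image index b
       torch.tensor([2, 0, 1]),               # anchor index a
       torch.tensor([3, 0, 2]),               # grid row gj
       torch.tensor([1, 2, 0]))               # grid column gi
ps = gather_positive_preds(pred, idx)
pxy = decode_xy(ps)
print(ps.shape)   # torch.Size([3, 85])
```

The decoded center lies in (-0.5, 1.5) relative to the cell. This range covers the offsets stored in `tbox` for neighbour positives: about [1, 1.5) when the target is assigned to its left or upper neighbour, and about (-0.5, 0) for the right or lower neighbour. A neighbouring cell can therefore still regress to the true center.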