开发: C++知识库 Java知识库 JavaScript Python PHP知识库人工智能区块链大数据移动开发嵌入式开发工具数据结构与算法开发测试游戏开发网络协议系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑笔记本显卡显示器固态硬盘硬盘耳机手机 iphone vivo oppo 小米华为单反装机图拉丁

-> 人工智能 -> 目标检测Faster-RCNN详解 -> 正文阅读

[人工智能]目标检测Faster-RCNN详解

概述

代码与环境说明

概述

??????? Faster-RCNN是两阶段目标检测算法的典型算法，它不再像古典的目标检测算法使用类似于selective search提取候选框，而是使用RPN(region proposal network)网络提取候选框，因为有RPN网络的加入，Faster-RCNN是可以端对端训练。本文将详解算法结构，正负例划分，损失函数等

代码与环境说明

代码：GitHub - talhuam/faster-rcnn-tf2: A framework for object detection

环境：tensorflow-gpu==2.4.0

显卡：RTX3080 16G

网络架构

backbone

????????backbone采用的是VGG16，最后一层maxpooling没有做，最终的特征图相较于最初的图片长和宽都缩小了16倍，如果输入图片是600 x 600 x 3，所得的特征图是37 x 37 x 512

代码实现：

def VGG16(inputs):
    # block1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block1_pool')(x)

    # block2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block2_pool')(x)

    # block3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block3_pool')(x)

    # block4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block4_pool')(x)

    # block5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    # x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='block4_pool')

    return x

整体架构

?????????经过vgg16主干网络之后会在经过一次3*3的卷积，此后会分别经过1*1的卷积进行通道降维，两次1*1的卷积如下：

上部分卷积之后会生成(37,37,9)的tensor，9用来预测有物体的概率；
下部分会生成(37,37,9 x 4)的tensor，9*4用来预测框的坐标；

????????RPN网络之后会进行NMS(non_max_suppression)极大值抑制生成候选框(proposal)。再结合主干网络生成的特征层经过 7 x 7 的ROI pooling层，此过程就是用候选框在特征层上截取，再将截取的部分分成7 x 7的网格，网格内部进行最大池化。截取之前需要将预测出来的 $[t_x,t_y,t_w,t_h]$ 转化为 $[x_{min},y_{min},x_{max},y_{max}]$

??????? 最后再将ROI pooling的结果经过全连接层，预测出结果。最后多分类的全连接层用的激活函数是softmax，进行多分类，预测出属于各个类别的概率。回归结果需要将预测出来的 $[t_x,t_y,t_w,t_h]$ 转化为 $[x_{min},y_{min},x_{max},y_{max}]$ 。同类别的框需要再进行一次NMS(阈值0.3)，最终输出结果

正负例划分

RPN正负例

??????? 先验框(anchor)的大小有三种，为[128, 256, 512]；三种比例尺寸，为[1:1, 1:2, 2:1]，特征图的每个特征点对应三种大小三种比例共9个先验框。如上所述，如果输入图片的尺寸为600*600，则特征图的尺寸为37*37，那么共有37*37*9=12321个先验框。如下代码会基于特征图的尺寸计算所有先验框的 $[x_{min},y_{min},x_{max},y_{max}]$ 的值并进行归一化

def generate_anchors(sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]]):
    """
    生成基础的9个不同尺寸不同比例的框
    :param sizes:
    :param ratios:
    :return:
    """
    num_anchors = len(sizes) * len(ratios)
    anchors = np.zeros((num_anchors, 4))
    anchors[:, 2:] = np.tile(sizes, [2, len(ratios)]).T

    for i in range(len(ratios)):
        anchors[3 * i:3 * i + 3, 2] = anchors[3 * i:3 * i + 3, 2] * ratios[i][0]
        anchors[3 * i:3 * i + 3, 3] = anchors[3 * i:3 * i + 3, 3] * ratios[i][1]

    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, [2, 1]).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, [2, 1]).T
    return anchors


def shift(feature_shape, anchors, stride=16):
    """
    对基础的先验框扩展获得所有的先验框
    :param feature_shape:
    :param anchors:
    :param stride:
    :return:
    """
    shift_x = (np.arange(0, feature_shape[1], dtype=np.float_) + 0.5) * stride
    shift_y = (np.arange(0, feature_shape[0], dtype=np.float_) + 0.5) * stride
    # 框中心点的位置
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # 在一维的shift_x中的元素和shift_y中的元素一一对应形成坐标的x和y
    shift_x = np.reshape(shift_x, (-1))
    shift_y = np.reshape(shift_y, (-1))

    # 将shift_x和shift_y堆叠两次，分别来调整左上角和右下角的坐标
    shifts = np.stack([
        shift_x, shift_y, shift_x, shift_y
    ], axis=0)
    shifts = np.transpose(shifts)
    num_anchors = anchors.shape[0]  # 9

    k = shifts.shape[0]  # 37*37 = 1369
    # 对应维度广播，生成先验框在原图上的左上角和右下角的坐标，shape:[1369(7*7),9,4]
    shifted_anchors = np.reshape(anchors, (1, num_anchors, 4)) + tf.reshape(shifts, (k, 1, 4))
    # (1369 * 9,4)
    shifted_anchors = np.reshape(shifted_anchors, [k * num_anchors, 4])

    return shifted_anchors


def get_anchors(input_shape, sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]], stride=16):
    # ------------------------ #
    # vgg16作为主干特征提取网络，最后一次池化没有做，故而是原有的尺寸的1/16
    # 输入如果是600 * 600,则feature_shape就是37 * 37
    # ------------------------ #
    feature_shape = (int(input_shape[0] / 16), int(input_shape[1] / 16))
    anchors = generate_anchors(sizes, ratios)
    anchors = shift(feature_shape, anchors, stride=stride)
    anchors = anchors.copy()
    # 进行归一化
    anchors[:, 0::2] /= input_shape[1]
    anchors[:, 1::2] /= input_shape[0]
    # 裁剪小于0和大于1的值
    anchors = np.clip(anchors, 0, 1)

    return anchors

??????? 所有的先验框和每个真实框(Groud Truth box)计算IoU，IoU大于0.7的作为正例(标签为1)；介于0.3到0.7之间的先验框忽略掉(标签为-1)，不参与计算loss；小于0.3的先验框为负例(标签为0)。其中正例的 $[x_{min},y_{min},x_{max},y_{max}]$ 需要转化为 $[t_x^*,t_y^*,t_w^*,t_h^*]$ ，转化公式如下：

$\begin{aligned} &t_x^*=(x^*-x_a)/w_a & t_y^*=(y^*-y_a)/h_a \\ &t_w^*=log(w^*/w_a) & t_h^*=log(h^*/h_a) \end{aligned}$

代码实现：

# ----------------------------------------------------------------- #
# 将 真实框 和 重合度高的先验框 转化为FRCNN预测结果的格式：[center_x, center_y, width, height]
# ----------------------------------------------------------------- #
# 真实框转化
box_center = 0.5 * (box[2:] + box[:2])
box_wh = box[2:] - box[:2]
# 先验框转化
assigned_anchors_center = 0.5 * (assigned_anchors[:, 2:4] + assigned_anchors[:, :2])
assigned_anchors_wh = assigned_anchors[:, 2:4] - assigned_anchors[:, :2]
# ----------------------------------------------------------------- #
# 计算t_star
# ----------------------------------------------------------------- #
encoded_box[:, :2][assign_mask] = (box_center - assigned_anchors_center) / assigned_anchors_wh
encoded_box[:, :2][assign_mask] /= np.array(variance)[:2]

encoded_box[:, 2:4][assign_mask] = np.log(box_wh/assigned_anchors_wh)
encoded_box[:, 2:4][assign_mask] /= np.array(variance)[2:]

正负例样本数总和256，正负例各128，正例不足用负例填充，超过总数256的标签值置为-1忽略掉，代码实现如下：

# --------------------------------------- #
# 对正样本和负样本进行筛选，训练样本之和为256
# --------------------------------------- #
pos_idx = np.where(classification > 0)[0]
num_pos = len(pos_idx)
if num_pos > self.num_sample // 2:
	num_pos = self.num_sample // 2
	disable_index = np.random.choice(pos_idx, size=(len(pos_idx) - self.num_sample // 2), replace=False)
	classification[disable_index] = -1
	regression[disable_index, -1] = -1

neg_idx = np.where(classification == 0)[0]
num_neg = self.num_sample - num_pos
if len(neg_idx) > num_neg:
	disable_index = np.random.choice(neg_idx, size=(len(neg_idx) - num_neg), replace=False)
	classification[disable_index] = -1
	regression[disable_index, -1] = -1

模型预测正负例

??????? RPN网络正向传播之后，会进行NMS非极大值抑制过滤出k个候选框(proposal)，NMS阈值为0.7。k个候选框将和每个真实框计算IoU，候选框的标签值(cat,dog,...)为对应IoU最大的真实框的标签值，和真实框最大IoU大于等于0.5的作为正例，介于0到0.5之间的作为负例，正例和负例数相加为128，正例不足的用负例填充

# 获得每个建议框roi最对应的真实框的iou
max_iou = np.max(iou, axis=1)  # [len(R) + len(bboxes),] 又 [num_roi,]
gt_assignment = np.argmax(iou, axis=1)  # [num_roi,]

# 和哪个GT的IoU最大，标签就是哪个GT的标签，[num_roi,]
gt_roi_label = label[gt_assignment]

# ------------------------------------------------------------ #
# 和GT的IoU大于pos_iou_thresh是正例
# 将正例控制在n_sample/2之下,如果超过了则随机截取，如果不够则用负例填充
# ------------------------------------------------------------ #
pos_indices = np.where(max_iou >= self.pos_iou_thresh)[0]
pos_roi_per_this_image = int(min(self.n_sample // 2, pos_indices.size))
if pos_indices.size > pos_roi_per_this_image:
	# replace：True可以重复先择,False不可以重复选择，元素不够则报错
	pos_indices = np.random.choice(pos_indices, size=pos_roi_per_this_image, replace=False)

# ------------------------------------------------------------ #
# 和GT的IoU大于neg_iou_thresh_low,小于neg_iou_thresh_high的作为负例
# 正例数量和负例的数量相加等于n_sample
# ------------------------------------------------------------ #
neg_indices = np.where((max_iou >= self.neg_iou_thresh_low) & (max_iou < self.neg_iou_thresh_high))[0]
neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
if neg_roi_per_this_image > neg_indices.size:
	neg_indices = np.random.choice(neg_indices, size=neg_roi_per_this_image, replace=True)
else:
	neg_indices = np.random.choice(neg_indices, size=neg_roi_per_this_image, replace=False)

# 正例和负例的索引
keep_indices = np.append(pos_indices, neg_indices)
# 保留下来的正负例框,[n_samples, 4]
sample_roi = R[keep_indices]

损失函数

RPN损失

??????? RPN损失包含两部分，一部分是坐标的回归损失，另一部分是二分类损失(有无object)

回归损失

??????? 回归损失使用的是smooth L1损失，公式如下，公式中的 $x=\hat{y}-y$ ：

$smooth\_l1\left\{\begin{matrix}0.5*x^2, if&|x|<1 & \\ |x|-0.5&otherwise & \end{matrix}\right.$

实现代码如下：

def rpn_smooth_l1(sigma=1.0):
    """
    rpn smooth l1 损失，只有正例和GT计算损失
    :param sigma:
    :return:
    """
    sigma_squared = sigma ** 2

    def _rpn_smooth_l1(y_true, y_pred):
        # ----------------------- #
        # y_true [batch_size, num_anchors, 4 + 1]
        # y_pred [batch_size, num_anchors, 4]
        # ----------------------- #
        regression = y_pred
        regression_target = y_true[:, :, :-1]
        # ----------------------- #
        # -1是要忽略的，0是背景，1是存在目标
        # ----------------------- #
        anchor_state = y_true[:, :, -1]
        # ----------------------- #
        # 获取正样本
        # ----------------------- #
        indices = tf.where(tf.keras.backend.equal(anchor_state, 1))
        regression = tf.gather_nd(regression, indices)  # 2-D array
        regression_target = tf.gather_nd(regression_target, indices)  # 2-D array
        # ----------------------- #
        # 计算smooth l1损失:
        # 0.5*x^2 if |x|<1
        # |x|-0.5 otherwise
        # ----------------------- #
        regression_diff = regression - regression_target
        x = tf.abs(regression_diff)
        loss_arr = tf.where(x < 1.0 / sigma_squared,
                            0.5 * sigma_squared * tf.pow(x, 2),
                            x - 0.5 / sigma_squared
                            )
        # 将loss全部加起来
        total_loss = tf.reduce_sum(loss_arr)
        # total_loss 除以样本数，计算平均loss
        num_indices = tf.maximum(1., tf.cast(tf.shape(indices)[0], tf.float32))
        avg_loss = total_loss / num_indices
        return avg_loss

    return _rpn_smooth_l1

分类损失

??????? 分类损失使用的是二分类的交叉熵损失，实现代码如下：

def rpn_cls_loss():
    """
    rpn只做二分类，是否有object
    二分类交叉熵损失
    :return:
    """

    def _rpn_cls_loss(y_true, y_pred):
        # ----------------------- #
        # y_true [batch_size,num_anchor,1]
        # y_pred [batch_size,num_anchor,1]
        # -1是要忽略的，0是背景，1是存在目标
        # ----------------------- #
        anchor_state = y_true
        # ----------------------- #
        # 获得无需忽略的所有样本
        # ----------------------- #
        indices_for_not_ignore = tf.where(tf.keras.backend.not_equal(anchor_state, -1))
        y_true_no_ignore = tf.gather_nd(y_true, indices_for_not_ignore)  # 1-D array
        y_pred_no_ignore = tf.gather_nd(y_pred, indices_for_not_ignore)  # 1-D array
        # ----------------------- #
        # 计算交叉熵
        # ----------------------- #
        y_true_no_ignore = tf.cast(y_true_no_ignore, dtype=tf.float32)
        y_pred_no_ignore = tf.cast(y_pred_no_ignore, dtype=tf.float32)
        cross_entropy_loss = tf.keras.losses.binary_crossentropy(y_true_no_ignore, y_pred_no_ignore)
        return cross_entropy_loss

    return _rpn_cls_loss

模型预测损失

??????? 模型最终的预测损失也有回归损失和分类损失，和RPN分类损失不一样的是，RPN分类是二分类，即是否有物体，最终模型分类损失是多分类，即属于哪一类(cat? dog? car?...)

回归损失

??????? 和RPN回归损失一样，这里用的也是smooth L1损失，代码如下：

def classifier_smooth_l1(num_classes, sigma=1.0):
    """
    最后框的回归损失函数
    :param num_classes:
    :param sigma:
    :return:
    """
    sigma_squared = sigma ** 2
    epsilon = 1e-4

    def _classifier_smooth_l1(y_true, y_pred):
        regression = y_pred
        regression_target = y_true[:, :, 4 * num_classes:]

        regression_diff = regression_target - regression
        x = tf.abs(regression_diff)
        loss_arr = tf.where(x < 1 / sigma_squared,
                            0.5 / sigma_squared * tf.pow(x, 2),
                            x - 0.5 / sigma_squared
                            )

        loss = tf.reduce_sum(loss_arr * y_true[:, :, :4 * num_classes]) * 4
        normalizer = tf.keras.backend.sum(epsilon + y_true[:, :, :4 * num_classes])
        loss = loss / normalizer
        return loss
    return _classifier_smooth_l1

分类损失

??????? 多分类交叉熵，实现代码如下：

def classifier_cls_loss():
    """
    最后分类的损失函数
    :return:
    """
    def _classifier_cls_loss(y_true, y_pred):
        return tf.keras.losses.categorical_crossentropy(y_true, y_pred)

    return _classifier_cls_loss

人工智能最新文章

2022吴恩达机器学习课程——第二课（神经网

第十五章规则学习

FixMatch: Simplifying Semi-Supervised Le

数据挖掘Java——Kmeans算法的实现

大脑皮层的分割方法

【翻译】GPT-3是如何工作的

论文笔记:TEACHTEXT: CrossModal Generaliz

python从零学（六）

详解Python 3.x 导入(import)

【答读者问27】backtrader不支持最新版本的

加:2022-09-21 00:29:07 更:2022-09-21 00:31:38

360图书馆购物三丰科技阅读网日历万年历 2025年7日历

-2025/7/27 20:07:56-

图片自动播放器
↓图片自动播放器↓

TxT小说阅读器
↓语音阅读,小说下载,古典文学↓

一键清除垃圾
↓轻轻一点,清除系统垃圾↓

图片批量下载器
↓批量下载图片,美女图库↓

网站联系: qq:121756557 email:121756557@qq.com IT数码