[Artificial Intelligence] YOLOv5 TensorRT Inference Acceleration (C++ Version)


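YOLOv5 needs little introduction: it is currently one of the most widely used object detection models, balancing accuracy and speed. Paired with TensorRT inference acceleration, it is a very popular combination in industry.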

Let's get straight to it. The TensorRT deployment route used below is: PyTorch -> ONNX -> TensorRT.

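Exporting a PyTorch model to ONNX is very convenient and supports dynamic dimensions, and the exported network structure can be inspected with the Netron tool. TensorRT's ONNX support is also quite complete, so this route makes TensorRT deployment straightforward. In addition, TensorRT ships an official NMS plugin, so the inference code does not need its own NMS implementation, which saves a lot of work.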

GitHub repository: gentlemanarch/yolov5-tensorrt

Environment setup

OS: Ubuntu 20.04 LTS, or the official TensorRT Docker image nvcr.io/nvidia/tensorrt:21.05-py3 (recommended)
TensorRT: 7.2.3 is used in this article
YOLOv5: the develop branch as of 2021.09.11
gcc: 9.3.0
torch: 1.8.2
onnx: 1.10.1
onnx-simplifier: 0.3.6

Training the PyTorch model

Clone the official YOLOv5 code and follow its tutorial to train a model and obtain a .pt weights file.

GitHub address:

https://github.com/ultralytics/yolov5

Torch -> onnx

Before exporting to ONNX, the output of the PyTorch model needs a few changes so that the NMS plugin can be attached later.

models/yolo.py:

Replace this code:

                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
                    y = torch.cat((xy, wh, y[..., 4:]), -1)
                z.append(y.view(bs, -1, self.no))

with:

                if self.inplace:
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2).expand(bs, self.na, 1, 1, 2)  # wh
                    rest = y[..., 4:]
                    yy = torch.cat((xy, wh, rest), -1)
                    z.append(yy.view(bs, -1, self.no))
                else:
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
                    y = torch.cat((xy, wh, y[..., 4:]), -1)
                    z.append(y.view(bs, -1, self.no))  # still needed so this branch contributes to the output

Also, replace this line:

return x if self.training else (torch.cat(z, 1), x)

with:

return x if self.training else torch.cat(z, 1)

The final result is as shown in the figure:
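
To make the decoding arithmetic in the replaced block concrete, here is a minimal pure-Python sketch for a single grid cell. It assumes, as in upstream YOLOv5, that the values being decoded have already passed through a sigmoid, and it treats the anchor size as input-image pixels; the helper name decode_cell and the sample numbers are illustrative, not part of the repository.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, grid_xy, stride, anchor_wh):
    """Decode one raw prediction (tx, ty, tw, th) for a single grid cell.

    grid_xy   : (column, row) index of the cell on the feature map
    stride    : downsampling factor of this detection layer (e.g. 8, 16, 32)
    anchor_wh : anchor width/height (assumed to be in input-image pixels)
    """
    sx, sy, sw, sh = (sigmoid(v) for v in raw)
    # Same formulas as the xy / wh lines in models/yolo.py
    x = (sx * 2.0 - 0.5 + grid_xy[0]) * stride
    y = (sy * 2.0 - 0.5 + grid_xy[1]) * stride
    w = (sw * 2.0) ** 2 * anchor_wh[0]
    h = (sh * 2.0) ** 2 * anchor_wh[1]
    return x, y, w, h


# Raw outputs of 0 give sigmoid = 0.5: the box sits at the cell centre
# and has exactly the anchor's size.
box = decode_cell((0.0, 0.0, 0.0, 0.0), grid_xy=(10, 5), stride=8, anchor_wh=(10.0, 13.0))
print(box)  # (84.0, 44.0, 10.0, 13.0)
```

The modified branch computes the same quantities, but concatenates xy, wh and the remaining channels into a new tensor instead of writing into y in place.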

models/export.py

After this change, the official export.py no longer applies. Use the following export code instead:

'''
export yolov5 .pt model to onnx model

Usage:
    python models/export.py --weights yolov5s.pt --img-size 640 \
    --batch-size 1 --device 0 --include onnx --inplace --dynamic \
    --simplify --opset-version 11 --img test_img/1.jpg
'''

import argparse
import sys
import time
from pathlib import Path

sys.path.append(Path(__file__).parent.parent.absolute().__str__())  # to run '$ python *.py' files in subdirectories

import cv2
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import colorstr, check_img_size, check_requirements, file_size, set_logging
from utils.torch_utils import select_device


def xywh2xyxy(x):
    center = x[:, :, :2]
    wh = x[:, :, 2:] / 2.
    return torch.cat([center - wh, center + wh], -1)


class Yolov5(nn.Module):
    def __init__(self, opt):
        super().__init__()
        self.model = self.init_model(opt)
    
    def init_model(self, opt):
        # load PyTorch model
        model = attempt_load(opt.weights)

        for k, m in model.named_modules():
            m._non_persistent_buffers_set = set()  # pytorch 1.6 compatibility
            if isinstance(m, models.common.Conv):
                if isinstance(m.act, nn.Hardswish):
                    m.act = Hardswish()
                elif isinstance(m.act, nn.SiLU):
                    m.act = SiLU()
            elif isinstance(m, models.yolo.Detect):
                m.inplace = opt.inplace
                m.onnx_dynamic = opt.dynamic
        return model

    def forward(self, x):
        output = self.model(x)
        output = self.post_processing(output)
        return output

    def post_processing(self, x):
        bs, nb_box, infos = x.shape

        boxes_input = xywh2xyxy(x[..., :4]).reshape(bs, nb_box, 1, 4)
        scores_input = x[..., 5:] * x[..., 4:5]
        return [boxes_input, scores_input]

def remove_initializer_from_input(model):
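    if model.ir_version < 4:
        # IR versions below 4 require initializers to be listed as graph inputs
        print('Model with ir_version below 4 must keep initializers in graph inputs; leaving it unchanged')
        return model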
    inputs = model.graph.input
    name_to_input = {}
    for input in inputs:
        name_to_input[input.name] = input
    
    for initializer in model.graph.initializer:
        if initializer.name in name_to_input:
            inputs.remove(name_to_input[initializer.name])
    
    return model

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

def preprocess_image(image_raw, INPUT_W=640, INPUT_H=640):
    h, w, c = image_raw.shape
    image = image_raw.copy()
    r_w = INPUT_W / w
    r_h = INPUT_H / h
    if r_h > r_w:
        tw = INPUT_W
        th = int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((INPUT_H - th) / 2)
        ty2 = INPUT_H - th - ty1
    else:
        tw = int(r_h * w)
        th = INPUT_H
        tx1 = int((INPUT_W - tw) / 2)
        tx2 = INPUT_W - tw - tx1
        ty1 = ty2 = 0
    image = cv2.resize(image, (tw, th))
    image = cv2.copyMakeBorder(
        image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, (128, 128, 128)
    )
    return image


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./yolov5s.pt', help='weights path')
    parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--include', nargs='+', default=['torchscript', 'onnx', 'coreml'], help='include format')
    parser.add_argument('--half', action='store_true', help='FP16 half-precision export')
    parser.add_argument('--inplace', action='store_true', help='set Yolov5 Detect() inplace=True')
    parser.add_argument('--train', action='store_true', help='model.train() mode')
    parser.add_argument('--optimize', action='store_true', help='optimize TorchScript for mobile') # TorchScript-only
    parser.add_argument('--dynamic', action='store_true', help='dynamic ONNX axes')
    parser.add_argument('--simplify', action='store_true', help='simplify ONNX model')
    parser.add_argument('--opset-version', type=int, default=11, help='ONNX opset version')
    parser.add_argument('--img', type=str, default='', help='test image path')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1
    opt.img_size = (640, 640)
    opt.include = [x.lower() for x in opt.include]
    print(opt)
    set_logging()
    t = time.time()


    device = select_device(opt.device)


    import cv2
    image_path = opt.img
    image = cv2.imread(image_path)

    from utils.datasets import letterbox
    frame = preprocess_image(image, opt.img_size[0], opt.img_size[1])
    print('frame.shape: ', frame.shape)

    img = torch.from_numpy(frame).float().unsqueeze(0)
    img = img.permute(0, 3, 1, 2)
    img = img[:, [2, 1, 0]] / 255.

    assert not (opt.device.lower() == 'cpu' and opt.half), '--half only compatible with GPU export, i.e. use --device 0'
    model = Yolov5(opt)

    if opt.half:
        img, model = img.half(), model.half()
    model.train() if opt.train else model.eval()

    for k, m in model.named_modules():
        m._non_persistent_buffers_set = set()  # pytorch 1.6 compatibility
        if isinstance(m, models.common.Conv):
            if isinstance(m.act, nn.Hardswish):
                m.act = Hardswish()
            elif isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        elif isinstance(m, models.yolo.Detect):
            m.inplace = opt.inplace
            m.onnx_dynamic = opt.dynamic

    for _ in range(2):
        y = model(img)
    print(f"\n{colorstr('PyTorch:')} starting from {opt.weights} ({file_size(opt.weights):.1f} MB)")

    if 'onnx' in opt.include:
        prefix = colorstr('ONNX:')
        try:
            import onnx

            print(f'{prefix} starting export with onnx {onnx.__version__}...')
            f = opt.weights.replace('.pt', 'fix.onnx') if not opt.dynamic else opt.weights.replace('.pt', '_dynamic.onnx')
            dynamic_axes = {'input': {0: 'batch'},
                            'boxes': {0: 'batch'},
                            'confs': {0: 'batch'}}

            torch.onnx.export(model, img, f, verbose=True, opset_version=opt.opset_version,
                              training=torch.onnx.TrainingMode.TRAINING if opt.train else torch.onnx.TrainingMode.EVAL,
                              do_constant_folding=True, export_params=True, operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
                              input_names=['input'],
                              output_names=['boxes', 'confs'],
                              dynamic_axes=dynamic_axes if opt.dynamic else None)
            model_onnx = onnx.load(f)
            onnx.checker.check_model(model_onnx)

            import onnxoptimizer
            print("Beginning ONNX model path optimization")
            all_passes = onnxoptimizer.get_available_passes()
            passes = ["extract_constant_to_initializer", "eliminate_unused_initializer", "fuse_bn_into_conv"]
            assert all(p in all_passes for p in passes)
            model_onnx = onnxoptimizer.optimize(model_onnx, passes)
            print("Completed ONNX model path optimization")

            if opt.simplify:
                try:
                    check_requirements(['onnx-simplifier'])
                    import onnxsim

                    print(f'{prefix} simplifying with onnx-simplifier {onnxsim.__version__}...')
                    model_onnx, check = onnxsim.simplify(
                        model_onnx,
                        dynamic_input_shape=opt.dynamic,
                        input_shapes={'input': list(img.shape)} if opt.dynamic else None
                    )
                    assert check, 'assert check failed'
                    model_onnx = remove_initializer_from_input(model_onnx)
                    onnx.save(model_onnx, f)
                except Exception as e:
                    print(f'{prefix} simplifier failure: {e}')
                print(f'{prefix} export success, saved as {f} ({file_size(f):.1f} MB)')

        except Exception as e:
            print(f'{prefix} export failure: {e}')
            print(e)

    print(f'\nExport complete ({time.time() - t:.2f}s). Visualize with https://github.com/lutzroeder/netron.')

    import onnxruntime
    import numpy as np
    ort_session = onnxruntime.InferenceSession(f)
    ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(img)}
    onnx_out = ort_session.run(None, ort_inputs)
    torch_out = model(img)

    # Compare the PyTorch outputs with ONNX Runtime: boxes first, then class scores
    np.testing.assert_allclose(to_numpy(torch_out[0]), onnx_out[0], rtol=1e-03, atol=1e-05)
    print("ONNX Runtime 'boxes' output matches the PyTorch output.")
    np.testing.assert_allclose(to_numpy(torch_out[1]), onnx_out[1], rtol=1e-03, atol=1e-05)
    print("ONNX Runtime 'confs' output matches the PyTorch output.")
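
The post_processing step in the export script prepares the two tensors that are later fed to NMS: corner-format boxes and per-class scores. A minimal pure-Python sketch of the same arithmetic for one prediction row is shown here; the function names and sample values are illustrative only.

```python
def xywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def split_prediction(pred):
    """pred = [cx, cy, w, h, objectness, class_prob_0, class_prob_1, ...]"""
    corners = xywh_to_xyxy(pred[:4])
    objectness = pred[4]
    # Final class score = class probability * objectness,
    # mirroring x[..., 5:] * x[..., 4:5] in the export script
    scores = [p * objectness for p in pred[5:]]
    return corners, scores


corners, scores = split_prediction([320.0, 240.0, 100.0, 50.0, 0.5, 0.5, 0.25])
print(corners)  # (270.0, 215.0, 370.0, 265.0)
print(scores)   # [0.25, 0.125]
```

In the export script the same operations run on whole tensors, and the boxes are additionally reshaped to (batch, num_boxes, 1, 4).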

From the YOLOv5 project root, export the ONNX model with the following command:

python models/export.py --weights yolov5s.pt --img-size 640 \
    --batch-size 1 --device 0 --include onnx --inplace --dynamic \
    --simplify --opset-version 11 --img test_img/1.jpg

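Parameters:

weights: path to the PyTorch weights file
img-size: input image size. The program above hard-codes the input to 640x640, so this parameter has no effect; to change the size, edit the opt.img_size = (640, 640) line in the code
--dynamic: allows variable input dimensions. In the code above only the batch-size dimension is dynamic; to make height and width dynamic as well, change dynamic_axes as follows: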
```
dynamic_axes = {'input': {0: 'batch', 2: 'height', 3: 'width'},
                'boxes': {0: 'batch'},
                'confs': {0: 'batch'}}
```
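--simplify: simplifies the ONNX model with onnx-simplifier, folding constants and removing nodes that are not needed for inference
opset-version: the ONNX operator set (opset) version; opset 11 is currently well supported
img: path to an image used for testing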
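
The test image passed with --img is resized and padded to the fixed 640x640 input by preprocess_image. The sketch below reproduces only the size and padding arithmetic of that function in pure Python, without OpenCV; the function name letterbox_geometry and the sample image sizes are illustrative.

```python
def letterbox_geometry(w, h, input_w=640, input_h=640):
    """Return the resized size and (top, bottom, left, right) padding."""
    r_w = input_w / w
    r_h = input_h / h
    if r_h > r_w:
        # Width is the limiting side: fill the width, pad top and bottom
        tw, th = input_w, int(r_w * h)
        left = right = 0
        top = int((input_h - th) / 2)
        bottom = input_h - th - top
    else:
        # Height is the limiting side: fill the height, pad left and right
        tw, th = int(r_h * w), input_h
        left = int((input_w - tw) / 2)
        right = input_w - tw - left
        top = bottom = 0
    return (tw, th), (top, bottom, left, right)


print(letterbox_geometry(1280, 720))  # ((640, 360), (140, 140, 0, 0))
print(letterbox_geometry(480, 640))   # ((480, 640), (0, 0, 80, 80))
```

The aspect ratio of the image is preserved, and the remaining area is filled with a constant border (gray (128, 128, 128) in the script above).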

After the export command finishes, the ONNX model is written to the same directory as the .pt weights file.
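
The output file name follows from the string replacement in the export script. The short sketch below mirrors that logic; the helper name onnx_output_path and the first weights path are hypothetical examples.

```python
def onnx_output_path(weights, dynamic):
    """Mirror the naming used in the export script above."""
    suffix = '_dynamic.onnx' if dynamic else 'fix.onnx'
    return weights.replace('.pt', suffix)


print(onnx_output_path('runs/train/exp/weights/best.pt', dynamic=True))
# runs/train/exp/weights/best_dynamic.onnx
print(onnx_output_path('yolov5s.pt', dynamic=False))
# yolov5sfix.onnx
```

So with --dynamic, yolov5s.pt produces yolov5s_dynamic.onnx next to the weights file.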

onnx->TensorRT & TensorRT inference

Building the C++ code

Clone the yolov5_trt code locally, then run:

cd yolov5_trt
mkdir build
cd build
cmake ..
make -j 10

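This completes the build.

Custom YAML configuration file

Go into the data directory, create a directory for your own data, copy the YAML configuration file from the person directory into it, and edit its contents: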

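# Root directory for all path-related settings below; must end with a forward slash
path: "../data/person/"
# Model file paths; if the trt file does not exist, it is generated from the onnx file
model:
  onnx: "person.onnx"
  tensorrt: "person.trt"
# Input test image and path where the result is saved
image:
  input: "person.jpg"
  output: "person_result.jpg"
# Number of output classes and path to the names file
args:
  n_classes: 1
  names: "person.names"
  # YOLOv5 input image size
  channels: 3
  height: 640
  width: 640
  # For dynamic-shape support, configure the three shapes below (min, opt, max) and set ifdynamic to 1; otherwise set it to 0
  ifdynamic: 1
  min: [1, 3, 640, 640]
  opt: [1, 3, 640, 640]
  max: [1, 3, 640, 640]
# Run parameters; demo runs a test. If the trt file does not exist, an fp32 engine is generated from the onnx file by default
# fp16 builds the engine in fp16, and likewise for int8; each takes a value in {0, 1}
build:
  demo: 1
  fp16: 1
  int8: 0
  # Workspace size, unit M; changing it is not supported yet
  workspace: 4
# NMS plugin configuration
nms:
  topK: 512
  keepTopK: 100
  # The NMS parameters below cannot be changed yet; use the defaults listed here
  clipBoxes: 0
  iouThreshold: 0.25
  scoreThreshold: 0.45
  isNormalized: false
  output: ["num_detections", "nmsed_boxes", "nmsed_scores", "nmsed_classes"]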
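
To give a feel for what the nms settings control, here is a rough pure-Python sketch of class-wise NMS for a single image. It is a conceptual illustration only, not the TensorRT plugin's implementation; the exact order in which the plugin applies topK and keepTopK is an assumption here, and all function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_wise_nms(boxes, scores, score_threshold, iou_threshold, top_k, keep_top_k):
    """boxes: list of (x1, y1, x2, y2); scores: one list of per-class scores per box."""
    kept = []
    n_classes = len(scores[0]) if scores else 0
    for cls in range(n_classes):
        # Drop low scores, then keep the top_k candidates of this class
        cand = [(scores[i][cls], i) for i in range(len(boxes))
                if scores[i][cls] > score_threshold]
        cand.sort(reverse=True)
        cand = cand[:top_k]
        selected = []
        for score, i in cand:
            # Greedy suppression: keep a box only if it does not overlap
            # an already selected, higher-scoring box too much
            if all(iou(boxes[i], boxes[j]) <= iou_threshold for _, j in selected):
                selected.append((score, i))
        kept.extend((score, cls, i) for score, i in selected)
    # Across all classes, keep the keep_top_k highest-scoring detections
    kept.sort(key=lambda t: t[0], reverse=True)
    kept = kept[:keep_top_k]
    return {
        "num_detections": len(kept),
        "nmsed_boxes": [boxes[i] for _, _, i in kept],
        "nmsed_scores": [s for s, _, _ in kept],
        "nmsed_classes": [c for _, c, _ in kept],
    }


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [[0.9], [0.8], [0.7]]
result = class_wise_nms(boxes, scores, score_threshold=0.45, iou_threshold=0.25,
                        top_k=512, keep_top_k=100)
print(result["num_detections"], result["nmsed_scores"])  # 2 [0.9, 0.7]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the returned keys match the four output names listed in the YAML.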

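The comments in the YAML are detailed enough, so here is just a brief walkthrough, using person detection as the example:

In data, create a person directory (the name is up to you) and copy the YAML into it. In the YAML, the person part of path: "../data/person/" is whatever directory name you chose;
Create a names file for the detection labels, name it person.names, and set args.names in the YAML to that name;
Do the same for the ONNX file;
For the other settings, see the comments in the YAML.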
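
When ifdynamic is enabled, the min, opt and max shapes form a TensorRT optimization profile, which requires min <= opt <= max on every axis. A small sanity check that could be run on the values before building an engine is sketched below; check_profile is a hypothetical helper, and the second profile is an illustrative example rather than a configuration from the repository.

```python
def check_profile(min_shape, opt_shape, max_shape):
    """Validate a (min, opt, max) shape triple and report which axes are dynamic."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min, opt and max must have the same rank")
    for axis, (lo, mid, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not (lo <= mid <= hi):
            raise ValueError(
                f"axis {axis}: expected min <= opt <= max, got {lo}, {mid}, {hi}")
    # An axis is effectively dynamic only if min and max differ
    return [lo != hi for lo, hi in zip(min_shape, max_shape)]


# Values from the YAML above: every axis is fixed
print(check_profile([1, 3, 640, 640], [1, 3, 640, 640], [1, 3, 640, 640]))
# [False, False, False, False]

# Hypothetical profile with a dynamic batch dimension
print(check_profile([1, 3, 640, 640], [4, 3, 640, 640], [8, 3, 640, 640]))
# [True, False, False, False]
```

With the values shown in the YAML, min, opt and max are identical, so the engine effectively runs at a fixed 1x3x640x640 input even though ifdynamic is set to 1.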

Running the code

Go into the build directory and run:

./yolov5_trt -c ../data/person/person.yaml

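The last argument is the path to the YAML configuration file.

The detection result is saved to the image.output path set in the YAML file.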


Added: 2021-09-12 13:09:55 Updated: 2021-09-12 13:11:41
