Foreword:

Huang Ningran, projects for you clients are hard work.

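Background:

I had previously implemented YOLOv3 on Windows 10 + VS2015 + OpenCV 3.4.12.
(https://download.csdn.net/download/xiaohuolong1827/34664248)
Only when using it in a project did I find that this implementation cannot use CUDA: the CUDA backend requires OpenCV 4 or later, as a comment in dnn.hpp notes: //OpenCV 4.x: DNN_BACKEND_CUDA,
Unfortunately, all of client Huang's projects are pinned to OpenCV 3.4.12.
One workaround is therefore to implement YOLOv3 with libtorch.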

References:

[1] Luo Bin, "Training a YOLOv3 model in PyTorch, then loading it and running inference in C++ with LibTorch",
https://zhuanlan.zhihu.com/p/246156517?utm_source=qq
[2] L2_Zhang, "Calling PyTorch from C++ with libtorch (YOLOv3 in practice)",
https://blog.csdn.net/WANGWUSHAN/article/details/118968060

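1. Download the PyTorch version of YOLOv3

Download from: https://github.com/eriklindernoren/PyTorch-YOLOv3

2. Creating the .pt file in Python

2.1 Creating the .pt file

Open detect.py in the project and set the paths of the model's configuration, weights and other parameters (the model has been trained beforehand). For example: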

args.model = '../lxxz_yolo_test/yolov3_1classes.cfg'
args.weights = '../lxxz_yolo_test/trained_weights_final_202204252156.weights'
args.classes = '../lxxz_yolo_test/lxxz_classes.txt'
args.n_cpu = 1
args.conf_thres = 0.5
args.nms_thres = 0.5
args.images = '../lxxz_image/'

Here, rewrite the detect_directory function:

def detect_directory2(model_path, weights_path, img_path, classes, output_path,
                      batch_size=8, img_size=416, n_cpu=8, conf_thres=0.5, nms_thres=0.5):
    # Run detection on every file in img_path, one image at a time.
    # Assumes os and cv2 are imported at the top of detect.py.
    files_list = os.listdir(img_path)
    model = load_model(model_path, weights_path)
    for f in files_list:
        # os.path.join works whether or not img_path ends with a separator
        img = cv2.imread(os.path.join(img_path, f), cv2.IMREAD_COLOR)
        boxes = detect_image(model, img, img_size=img_size, conf_thres=conf_thres, nms_thres=nms_thres)
        draw_img = draw_boxes(img, boxes)
        print(boxes)
    print(f"---- Detections were saved to: '{output_path}' ----")

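In the main program, call detect_directory2 instead of detect_directory. The point is to reach the detect_image helper: inside detect_image, right after the network prediction, add statements that save the model to a .pt file or load a .pt file, as needed: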

    with torch.no_grad():
        detections = model(input_img)
        detections = non_max_suppression(detections, conf_thres, nms_thres)
        detections = rescale_boxes(detections[0], img_size, image.shape[:2])
    # If needed, save the network as a .pt (TorchScript) file
    traced_model = torch.jit.trace(model, input_img, check_trace=False)
    traced_model.save("yolo_temp.pt")
    test_out = traced_model(input_img)
    # If needed, load the .pt file and run prediction with it
    model2 = torch.jit.load("yolo_temp.pt")
    output2 = model2.forward(input_img)
    output2 = non_max_suppression(output2, conf_thres, nms_thres)
    output2 = rescale_boxes(output2[0], img_size, image.shape[:2])
    # Sanity check: the first box from the loaded model should equal the original one
    print(output2[0].equal(detections[0]))

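2.2 Points to note

(1) If errors occur

If torch.jit.trace raises an error, pass check_trace=False.
In the _make_grid function in models.py, if torch.meshgrid is called as
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)], indexing='ij')
and the indexing argument is rejected, delete it. The indexing argument was added in PyTorch 1.10; earlier versions always produce the 'ij' layout, so the result is unchanged.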
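
To make the 'ij' layout concrete, here is a small pure-Python sketch. meshgrid_ij is a hypothetical helper written for illustration, not a PyTorch function; it only mimics the layout that torch.meshgrid returns for two 1-D ranges under 'ij' indexing, where the first output varies along rows and the second along columns.

```python
def meshgrid_ij(ny, nx):
    """Pure-Python stand-in for the 'ij' layout of a 2-D meshgrid.

    Returns (yv, xv), both of shape ny x nx, with yv[i][j] == i and xv[i][j] == j.
    """
    yv = [[i for _ in range(nx)] for i in range(ny)]  # row index repeated along each row
    xv = [[j for j in range(nx)] for _ in range(ny)]  # column index repeated down each column
    return yv, xv


yv, xv = meshgrid_ij(2, 3)
print(yv)  # [[0, 0, 0], [1, 1, 1]]
print(xv)  # [[0, 1, 2], [0, 1, 2]]
```

In _make_grid these two grids supply the cell offsets that are added to the predicted box centers, which is why the layout has to stay the same when the argument is removed.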

(2) GPU or CPU

When tracing to produce the .pt file, if model and input_img are on 'cuda', the resulting .pt is a GPU version; otherwise it is a CPU version.

(3) Do not add code with variable-size output to the model when producing the .pt file

In Python, the .pt file is produced with torch.jit.trace:

traced_model = torch.jit.trace(model, input_img, check_trace=False)
traced_model.save("yolo_temp.pt")

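Do not fold non-maximum suppression into the model just to save effort on the C++ side. When a .pt file is produced with torch.jit.trace, the model's output should have a fixed shape; for example, the YOLOv3 output is 10647*(4+1+number of classes). With non-maximum suppression inside the model, the output size depends on the input (after suppression, one image may yield 2 boxes and another 4). The resulting failure looks like this: trace and save the .pt file using image A; load that .pt model and predict on image A, and the result is fine; predict on any other image, and it fails.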
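
The figure 10647 can be checked by hand. The sketch below assumes the standard YOLOv3 layout, which the text above does not spell out: three detection scales with strides 32, 16 and 8, and 3 anchors per grid cell. The function names are hypothetical, and the single class in the example is taken from the yolov3_1classes.cfg file name.

```python
def yolov3_num_boxes(img_size, strides=(32, 16, 8), anchors_per_scale=3):
    """Count candidate boxes for a square input, assuming the standard YOLOv3 heads."""
    total = 0
    for stride in strides:
        cells = img_size // stride          # grid is cells x cells at this scale
        total += cells * cells * anchors_per_scale
    return total


def yolov3_output_shape(img_size, num_classes, batch=1):
    """Raw (pre-NMS) output shape: 4 box values + 1 objectness + class scores per box."""
    return (batch, yolov3_num_boxes(img_size), 4 + 1 + num_classes)


print(yolov3_num_boxes(416))        # 10647 = 3 * (13*13 + 26*26 + 52*52)
print(yolov3_output_shape(416, 1))  # (1, 10647, 6)
```

The shape depends only on the input size and the class count, never on the image content, which is what makes the raw output suitable for tracing.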

3. Prediction in libtorch

Set up a libtorch project in VS2017 beforehand and make sure torch works in it. See:
https://blog.csdn.net/xiaohuolong1827/article/details/121428648

3.1 Loading the model

Load the model in the main program:

torch::jit::script::Module module = torch::jit::load("D:\\xxx_gpu.pt");

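3.2 Reading the image

Read the image in the main program and convert its format:

	Mat imgSrc = imread("D:\\60_1.265_89.74.tif", -1);// -1 = IMREAD_UNCHANGED, keeps the original bit depth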
	Mat img = imgSrc.clone();
	if (img.depth() == CV_16U)
	{
		img.convertTo(img, CV_32F, 1.0 / 65535);
	}
	else
	{
		img.convertTo(img, CV_32F, 1.0 / 255);
	}
	cv::resize(img, img, Size(416, 416));
	if (img.channels() == 1)
	{
		cv::cvtColor(img, img, cv::COLOR_GRAY2BGR);
	}

3.3 Running the network

Write the prediction function:

int torch_model_predict(void*pmodule, Mat img_input, Mat *img_output, int cuda_flag)
{
	Mat img = img_input.clone();
	if (pmodule == 0)
	{
		return 1;
	}
	try
	{
		torch::DeviceType device_type = (cuda_flag) ? at::kCUDA : at::kCPU;
		torch::jit::script::Module *module = (torch::jit::script::Module *)pmodule;
		module->to(device_type);
		module->eval();
		// Build the input tensor
		Mat data_src;
		img.convertTo(data_src, CV_32F, 1.0);// whatever the type of img, convert to float first
		torch::Tensor tensor_image = torch::from_blob(data_src.data, { 1,img.rows, img.cols,img.channels() }, torch::kFloat);
		tensor_image = tensor_image.permute({ 0,3,1,2 });// move the channel dimension forward: NHWC -> NCHW
		tensor_image = tensor_image.to(device_type);
		// Run the network
		at::Tensor outputs = module->forward({ tensor_image }).toTensor();
		// Extract the prediction
		int size_arr[10] = { 0 };// at most 10 dimensions are supported
		int n = outputs.dim();
		if (n > 10)
		{
			return 2;
		}
		for (int i = 0; i < n; i++)
		{
			size_arr[i] = (int)outputs.size(i);
		}
		// Copy to the CPU and make the memory contiguous before wrapping it in a Mat
		outputs = outputs.to(at::kCPU).contiguous();
		Mat outimg(n, size_arr, CV_32F, outputs.data_ptr());
		*img_output = outimg.clone();
	}
	catch (...)
	{
		return 3;
	}
	return 0;
}

Call it from the main program (module is an object, so pass its address):

	Mat img_predict2;
	int reu = 0;
	reu = torch_model_predict((void*)&module, img, &img_predict2, 1);

3.4 Post-processing the network output

From the Python side we already know the output shape of the network: 1*10647*n, where n = (5 + number of classes).

	int size_arr[10];
	for (int i = 0; i < img_predict2.dims; i++)
	{
		cout << img_predict2.size[i]<< " ";
		size_arr[i] = img_predict2.size[i];
	}	
	Mat boxes (Size(size_arr[2], size_arr[1]), CV_32F, img_predict2.data);

This gives the predicted boxes, one box per row.
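
The reshaping step is just a reinterpretation of one flat float buffer as num_boxes rows of n values each. Note that cv::Size takes (width, height), which is why the C++ code passes size_arr[2] (columns) before size_arr[1] (rows). A minimal pure-Python sketch of the same view follows; as_rows is a hypothetical helper, and unlike the Mat constructor it copies the data.

```python
def as_rows(flat, num_boxes, row_len):
    """View a flat 1 x num_boxes x row_len buffer as a list of rows, one per box."""
    if len(flat) != num_boxes * row_len:
        raise ValueError("buffer length does not match num_boxes * row_len")
    return [flat[r * row_len:(r + 1) * row_len] for r in range(num_boxes)]


# Two boxes with n = 6 values each: (cx, cy, w, h, s, c0)
flat = [208.0, 208.0, 104.0, 52.0, 0.9, 0.8,
        100.0, 120.0, 40.0, 30.0, 0.2, 0.1]
rows = as_rows(flat, 2, 6)
print(rows[1][:4])  # [100.0, 120.0, 40.0, 30.0]
```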

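3.5 Thresholding and non-maximum suppression

Apply thresholding and NMS to the boxes. The corresponding code from an OpenCV-based YOLOv3 implementation can be reused for this, but the box positions must be normalized first. Each box has the format (cx, cy, w, h, s, c0~cn):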

	for (int i = 0; i < boxes.rows; i++)
	{
		boxes.at<float>(i, 0) /= img_size;
		boxes.at<float>(i, 1) /= img_size;
		boxes.at<float>(i, 2) /= img_size;
		boxes.at<float>(i, 3) /= img_size;
	}

img_size is 416.
Then wrap boxes in a vector, and the postprocess function from the widely used OpenCV YOLOv3 sample project can be called on it.
(https://download.csdn.net/download/xiaohuolong1827/34664248)
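
The normalization matters because postprocess multiplies each coordinate by the frame size. The pure-Python sketch below chains the two steps for one box. decode_box is a hypothetical helper, and it assumes the frame was stretched directly to img_size x img_size (as the cv::resize call in section 3.2 does), with no letterbox padding.

```python
def decode_box(cx, cy, w, h, img_size, frame_w, frame_h):
    """Map a box from network pixels (0..img_size) to a (left, top, width, height) rect."""
    # Step 1: normalize to 0..1, as done on the boxes Mat
    ncx, ncy, nw, nh = cx / img_size, cy / img_size, w / img_size, h / img_size
    # Step 2: scale to the original frame, as done in postprocess
    center_x = int(ncx * frame_w)
    center_y = int(ncy * frame_h)
    width = int(nw * frame_w)
    height = int(nh * frame_h)
    left = center_x - width // 2
    top = center_y - height // 2
    return left, top, width, height


# A box centered in the 416 x 416 input, mapped onto an 832 x 624 frame
print(decode_box(208, 208, 104, 52, 416, 832, 624))  # (312, 273, 208, 78)
```
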
The postprocess code from that project is reproduced here:

void postprocess(Mat& frame, const vector<Mat>& outs)
{
	// class ids
	vector<int> classIds;
	// confidences
	vector<float> confidences;
	vector<Rect> boxes;

	// iterate over all output layers
	for (size_t i = 0; i < outs.size(); ++i)
	{
		// Scan through all the bounding boxes output from the network and keep only the
		// ones with high confidence scores. Assign the box's class label as the class
		// with the highest score for the box.
		// read the boxes
		float* data = (float*)outs[i].data;
		for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
		{
			Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
			Point classIdPoint;
			double confidence;
			// Get the value and location of the maximum score
			minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
			// keep the box only if it exceeds the confidence threshold
			if (confidence > confThreshold)
			{
				// convert normalized (cx, cy, w, h) to a pixel rectangle
				int centerX = (int)(data[0] * frame.cols);
				int centerY = (int)(data[1] * frame.rows);
				int width = (int)(data[2] * frame.cols);
				int height = (int)(data[3] * frame.rows);
				int left = centerX - width / 2;
				int top = centerY - height / 2;

				classIds.push_back(classIdPoint.x);
				confidences.push_back((float)confidence);
				boxes.push_back(Rect(left, top, width, height));
			}
		}
	}

	// Perform non maximum suppression to eliminate redundant overlapping boxes with
	// lower confidences
	// indices of the boxes kept by NMS, in descending order of confidence
	vector<int> indices;
	// non-maximum suppression
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	// draw the results
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		Rect box = boxes[idx];
		// class and confidence
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
	}
}
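
For readers unfamiliar with the suppression step, the sketch below shows the basic greedy idea in pure Python: drop boxes below the score threshold, visit the rest in descending score order, and keep a box only if it does not overlap an already kept box by more than the NMS threshold. This is an illustration with hypothetical helper names, not OpenCV's implementation; cv::dnn::NMSBoxes also accepts optional eta and top_k parameters that are ignored here.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_thr, nms_thr):
    """Return indices of kept boxes, in descending order of score."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep box i only if it overlaps no already kept box too strongly
        if all(iou(boxes[i], boxes[k]) <= nms_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5, 0.4))  # [0, 2]
```

The second box overlaps the first almost entirely, so it is suppressed; the third does not overlap anything and survives.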

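In my own use I made two small changes: first, how confidence is obtained; second, the box information is saved. As follows (think it through carefully and use with caution):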

typedef struct
{
	int c;                       // class id
	float s;                     // confidence
	int pt1x, pt1y, pt2x, pt2y;  // top-left and bottom-right corners
}STRUCT_YOLO_PREDICT_BOX;
void postprocess(Mat& frame, const vector<Mat>& outs, vector<STRUCT_YOLO_PREDICT_BOX>*reu)
{
	// class ids
	vector<int> classIds;
	// confidences
	vector<float> confidences;
	vector<Rect> boxes;

	// iterate over all output layers
	for (size_t i = 0; i < outs.size(); ++i)
	{
		// Scan through all the bounding boxes output from the network and keep only the
		// ones with high confidence scores. Assign the box's class label as the class
		// with the highest score for the box.
		// read the boxes
		float* data = (float*)outs[i].data;
		for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
		{
			Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
			Point classIdPoint;
			double confidence;
			// Get the value and location of the maximum score (used for the class id)
			minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
			// Change 1: use the objectness score (column 4) as the confidence
			confidence = outs[i].at<float>(j, 4);
			// keep the box only if it exceeds the confidence threshold
			if (confidence > confThreshold)
			{
				// convert normalized (cx, cy, w, h) to a pixel rectangle
				int centerX = (int)(data[0] * frame.cols);
				int centerY = (int)(data[1] * frame.rows);
				int width = (int)(data[2] * frame.cols);
				int height = (int)(data[3] * frame.rows);
				int left = centerX - width / 2;
				int top = centerY - height / 2;

				classIds.push_back(classIdPoint.x);
				confidences.push_back((float)confidence);
				boxes.push_back(Rect(left, top, width, height));
			}
		}
	}

	// Perform non maximum suppression to eliminate redundant overlapping boxes with
	// lower confidences
	// indices of the boxes kept by NMS, in descending order of confidence
	vector<int> indices;
	// non-maximum suppression
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	// Change 2: draw and also save the results
	reu->clear();
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		Rect box = boxes[idx];
		// class and confidence
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
		// save the result
		STRUCT_YOLO_PREDICT_BOX r;
		r.c = classIds[idx];
		r.s = confidences[idx];
		r.pt1x = box.x;
		r.pt1y = box.y;
		r.pt2x = box.x + box.width;
		r.pt2y = box.y + box.height;
		reu->push_back(r);
	}
}
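
The difference between the two confidence choices is easiest to see on a single row. In the pure-Python sketch below (hypothetical helper names, made-up values), both variants pick the class with the highest class score, but the original thresholds on that class score while the modified version thresholds on the objectness value in column 4.

```python
def class_score_confidence(row):
    """Original variant: confidence is the highest class score (columns 5 onward)."""
    scores = row[5:]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores[best]


def objectness_confidence(row):
    """Modified variant: same class id, but confidence is the objectness in column 4."""
    class_id, _ = class_score_confidence(row)
    return class_id, row[4]


# One row in (cx, cy, w, h, s, c0, c1) format, values invented for illustration
row = [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.6]
conf_threshold = 0.7
print(class_score_confidence(row))  # (1, 0.6) -> rejected at threshold 0.7
print(objectness_confidence(row))   # (1, 0.9) -> kept at threshold 0.7
```

The same box can therefore pass or fail the threshold depending on the variant, which is the reason for the caution above.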

3.6 Points to note

(1) Dimension order

When building the network's input tensor with torch::from_blob, mind the dimension order:
{ 1,img.rows, img.cols,img.channels() }, that is: batch, rows, columns, channels.
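
The order follows from how cv::Mat lays out pixels: the channel values of each pixel are stored next to each other (height, width, channels), whereas the network expects channels first. The pure-Python sketch below shows, on a tiny buffer, what the subsequent permute({0,3,1,2}) achieves logically. hwc_to_chw is a hypothetical helper, and it copies data, whereas permute only changes the view.

```python
def hwc_to_chw(flat, rows, cols, channels):
    """Rearrange an interleaved (rows, cols, channels) buffer into channel planes."""
    return [[[flat[(r * cols + c) * channels + ch] for c in range(cols)]
             for r in range(rows)]
            for ch in range(channels)]


# A 1 x 2 image with 3 channels: pixel 0 = (1, 2, 3), pixel 1 = (4, 5, 6)
flat = [1, 2, 3, 4, 5, 6]
print(hwc_to_chw(flat, 1, 2, 3))  # [[[1, 4]], [[2, 5]], [[3, 6]]]
```

Declaring the blob as { 1, channels, rows, cols } directly would not raise an error, but it would read the interleaved values as if they were planes and scramble the image.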

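(2) GPU and CPU must match strictly

When predicting with YOLOv3 in libtorch, the device must match the one used when the .pt file was saved: if the model was on the CPU when the .pt file was saved in Python, it must also be on the CPU for prediction in libtorch; if it was on the GPU when saved, it must be on the GPU in libtorch.
To place the model on the CPU or GPU in Python, use model.to('cpu') or model.to('cuda'); the image tensor must be moved the same way.
This differs from U-Net: when predicting with U-Net in libtorch, it is enough for the model and the image tensor to be on the same device (CPU or CUDA), regardless of which device the model was on when the .pt file was produced. I have not investigated the cause. One possible explanation, not verified here, is that tracing records tensors created inside forward (such as the grid built by _make_grid) on the device used at trace time.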

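4. Miscellaneous

This C++ project is built with VS2017. It can be set up as a DLL project, and the resulting DLL can then be called from VS2013 or VS2015. (Some clients, client Huang for example, still use VS2013, which cannot use libtorch directly; so wrap the libtorch calls in a DLL with VS2017 first, then call it from VS2013.)
And that is yet another filler post churned out.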