[人工智能] 基于PPYOLOE+的水下生物目标检测

开发: C++知识库 Java知识库 JavaScript Python PHP知识库人工智能区块链大数据移动开发嵌入式开发工具数据结构与算法开发测试游戏开发网络协议系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑笔记本显卡显示器固态硬盘硬盘耳机手机 iphone vivo oppo 小米华为单反装机图拉丁

-> 人工智能 -> 基于PPYOLOE+的水下生物目标检测 -> 正文阅读

[人工智能]基于PPYOLOE+的水下生物目标检测

基于PP-YOLOE+的水下生物目标检测+部署

项目链接【https://aistudio.baidu.com/aistudio/projectdetail/4647849?contributionType=1】

1 项目背景

水下目标检测旨在对水下场景中的物体进行定位和识别。这项研究由于在海洋学、水下导航等领域的广泛应用而引起了持续的关注。但是，由于复杂的水下环境和光照条件，这仍然是一项艰巨的任务。

基于深度学习的物体检测系统已在各种应用中表现出较好的性能，但在处理水下目标检测方面仍然感到不足，主要有原因是：可用的水下目标检测数据集稀少，实际应用中的水下场景的图像杂乱无章，并且水下环境中的目标物体通常很小，而当前基于深度学习的目标检测器通常无法有效地检测小物体，或者对小目标物体的检测性能较差。同时，在水下场景中，与波长有关的吸收和散射问题大大降低了水下图像的质量，从而导致了可见度损失，弱对比度和颜色变化等问题。

Al+水下勘探是一个新兴领域，目前专门用于水下研究工作的解决方案不多，高质量的数据集更是弥足珍贵。使用 PP-YOLOE+ 来推进水下目标检测的进步，从而使得水下机器人等设备能够更加智能化，提高海底资源勘探等方面的效率。而水下机器人又反哺出高质量的水下目标检测数据集，推动 Al+水下勘探的发展。

2 方案选择

2.1 问题与挑战

深度学习中，数据往往决定了性能的上限，算法只是不断地逼近上限。尽管基于深度学习的方法在标准的目标检测中取得了可喜的性能。水下目标检测仍具有以下几点挑战：

（1）水下场景的实际应用中目标通常很小，含有大量的小目标；

（2）水下数据集和实际应用中的图像通常是模糊的，图像中具有异构的噪声。

2.2 方案选择

在这里插入图片描述

因此，针对以上所述的背景和水中目标检测所遇到的挑战，本项目将选用 PP-YOLOE+ 这一基于飞桨云边一体高精度模型PP-YOLOE迭代优化升级的版本。针对性的解决以上问题。

2.3 模型特点介绍

PP-YOLOE+ 具有如下特点：

超强性能
训练收敛加速
下游任务泛化性显著提升
高性能部署能力

3 环境配置

3.1 环境准备

PaddlePaddle >= 2.3.2
Python == 3.7

3.2 环境安装

%cd /home/aistudio/work/

# gitee 国内下载比较快
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop

# github
# !git clone https://github.com/PaddlePaddle/PaddleDetection.git -b develop

# 环境安装
%cd /home/aistudio/work/

# gitee 国内下载比较快
# !git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop

%cd PaddleDetection/
!pip install -r requirements.txt > /dev/null
/home/aistudio/work
/home/aistudio/work/PaddleDetection

4 数据集预处理

4.1 数据集介绍

本项目选用数据集来源于水下目标检测算法赛（注：数据由鹏城实验室提供），训练集是5543张 jpg 格式的水下光学图像与对应标注结果构成，其中主要有海参、海胆、扇贝、海星四种目标。
在这里插入图片描述

4.1.1 标签类别

supercategory	id	name
component	1	echinus
component	2	holothurian
component	3	scallop
component	4	starfish
component	5	waterweeds

4.1.2 图像分辨率

长度	宽度	图片数量
704	576	38
1920	1080	596
3840	2160	1712
720	405	3153
586	Text	44

4.2 数据集处理

4.2.1 数据集解压

# 1.数据集解压
!unzip data/data172711/fish.zip > /dev/null
!mv ./fish ./work/PaddleDetection/dataset

4.2.2 voc2coco

%cd work/PaddleDetection/
/home/aistudio/work/PaddleDetection

# 2.voc2coco
import argparse
import glob
import json
import os
import os.path as osp
import sys
import shutil

import numpy as np
import PIL.ImageDraw
import xml.dom.minidom as xmldom
import cv2
label_to_num = {}
categories_list = []
labels_list = []


class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)


def getbbox(self, points):
    polygons = points
    mask = self.polygons_to_mask([self.height, self.width], polygons)
    return self.mask2box(mask)


def images_labelme(data, num):
    image = {}
    image['height'] = data['imageHeight']
    image['width'] = data['imageWidth']
    image['id'] = num + 1
    image['file_name'] = data['imagePath'].split('/')[-1]
    return image


def images_cityscape(num, img_file, w, h):
    image = {}
    image['height'] = h
    image['width'] = w
    image['id'] = num + 1
    image['file_name'] = img_file
    return image


def categories(label, labels_list):
    category = {}
    category['supercategory'] = 'component'
    category['id'] = len(labels_list) + 1
    category['name'] = label
    return category


def annotations_rectangle(points, label, image_num, object_num, label_to_num):
    annotation = {}
    seg_points = np.asarray(points).copy()
    annotation['segmentation'] = [list(seg_points.flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(
        map(float, [
            points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
                1] - points[0][1]
        ]))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def annotations_polygon(height, width, points, label, image_num, object_num,
                        label_to_num):
    annotation = {}
    annotation['segmentation'] = [list(np.asarray(points).flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def get_bbox(height, width, points):
    polygons = points
    mask = np.zeros([height, width], dtype=np.uint8)
    mask = PIL.Image.fromarray(mask)
    xy = list(map(tuple, polygons))
    PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
    mask = np.array(mask, dtype=bool)
    index = np.argwhere(mask == 1)
    rows = index[:, 0]
    clos = index[:, 1]
    left_top_r = np.min(rows)
    left_top_c = np.min(clos)
    right_bottom_r = np.max(rows)
    right_bottom_c = np.max(clos)
    return [
        left_top_c, left_top_r, right_bottom_c - left_top_c,
        right_bottom_r - left_top_r
    ]


def deal_json(output_dir, input_dir, img_list):
    data_coco = {}
    images_list = []
    annotations_list = []
    image_num = -1
    object_num = -1
    # labels_list =[]
    
    for img_file in img_list:
        img = cv2.imread(osp.join('dataset/fish/Images', img_file))
        w = img.shape[1]
        h = img.shape[0]
        img_label = img_file.split('.')[0]
        label_file = osp.join('dataset/fish/Annotations', img_label + '.xml')
        # print('Generating dataset from:', label_file)
        image_num = image_num + 1

        xml_file = xmldom.parse(label_file)
        eles = xml_file.documentElement
        images_list.append(images_cityscape(image_num, img_file, w, h))
            
        for i in range(len(eles.getElementsByTagName('name'))):
            label = eles.getElementsByTagName('name')[i].firstChild.data
            # if label == 'starfish':
            #     continue
            object_num = object_num + 1
            # print(label)
            if label not in labels_list:
                # print(label)
                categories_list.append(categories(label, labels_list))
                labels_list.append(label)
                label_to_num[label] = len(labels_list)
            points = []
            xmin = int(eles.getElementsByTagName('xmin')[i].firstChild.data)
            ymin = int(eles.getElementsByTagName('ymin')[i].firstChild.data)
            xmax = int(eles.getElementsByTagName('xmax')[i].firstChild.data)
            ymax = int(eles.getElementsByTagName('ymax')[i].firstChild.data)
            # print(xmin,ymin,xmax,ymax)
            # if xmin > 2000:
            #     print(label_file)
            points.append([xmin, ymin])
            points.append([xmax, ymax])
            annotations_list.append(
                annotations_rectangle(points, label, image_num,
                                    object_num, label_to_num))                
    
    data_coco['images'] = images_list
    data_coco['categories'] = categories_list
    data_coco['annotations'] = annotations_list
    # print(labels_list)
    return data_coco


import os.path as osp
import glob
import os
import shutil
train_img = 'dataset/fish/Images'
train_box = 'dataset/fish/Annotations'

# Allocate the dataset.
total_num = len(glob.glob(osp.join(train_img, '*.jpg')))
    
train_num = int(total_num * 0.8)
os.makedirs('data/cocome' + '/train')

val_num = total_num - train_num
os.makedirs('data/cocome' + '/val')

count = 1
train_list = []
val_list = []
for img_name in os.listdir(train_img):
    if count <= train_num:
        if osp.exists('data/cocome' + '/train/'):
            shutil.copyfile(
                osp.join(train_img, img_name),
                osp.join('data/cocome' + '/train/', img_name))
            train_list.append(img_name)
    else:
        if count <= train_num + val_num:
            if osp.exists('data/cocome' + '/val/'):
                shutil.copyfile(
                    osp.join(train_img, img_name),
                    osp.join('data/cocome' + '/val/', img_name))
            val_list.append(img_name)
    count = count + 1

if not os.path.exists('data/cocome' + '/annotations'):
    os.makedirs('data/cocome' + '/annotations')
train_data_coco = deal_json(
        'data/cocome' + '/train', train_img ,train_list)
train_json_path = osp.join('data/cocome' + '/annotations', 'instance_train.json')
json.dump(
        train_data_coco,
        open(train_json_path, 'w'),
        indent=4,
        cls=MyEncoder)
            
val_data_coco = deal_json('data/cocome' + '/val', train_img, val_list)
val_json_path = osp.join('data/cocome' + '/annotations', 'instance_val.json')
json.dump(val_data_coco, open(val_json_path, 'w'), indent=4, cls=MyEncoder)

4.2.3 将转换后的数据集移动至dataset/fish

# 3.将转换后的数据集移动至dataset/fish
!mv data/cocome/* dataset/fish/

4.2.4 将自己的数据集的路径进行修改和配置

metric: COCO
num_classes: 5

TrainDataset:
  !COCODataSet
    image_dir: train
    anno_path: annotations/instance_train.json
    dataset_dir: dataset/fish
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: val
    anno_path: annotations/instance_val.json
    dataset_dir: dataset/fish

TestDataset:
  !ImageFolder
    anno_path: label_list.txt # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/fish # if set, anno_path will be 'dataset_dir/anno_path'

5 模型训练

使用 PP-YOLOE+ 进行训练:

在./configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml中提供了基于PP-YOLOE+训练该场景的配置，训练脚本如下：

!python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o LearningRate.base_lr=0.000125 -o snapshot_epoch=1 -o worker_num=1 --eval

6 模型评估

在训练模型以后，我们可以通过运行评估命令来得到模型的精度，以确认训练的效果。评估可以参考以下命令执行。

这里使用了我们已经训练好的模型。如希望使用自己训练的模型，请对应将weights=后的值更改为对应模型.pdparams文件的存储路径。

# 模型评估
!python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams

Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
W1016 22:54:41.591436 22455 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1016 22:54:41.598824 22455 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[10/16 22:54:43] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams
[10/16 22:54:44] ppdet.engine INFO: Eval iter: 0
[10/16 22:54:55] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[10/16 22:54:55] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.50s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.09s).
Accumulating evaluation results...
DONE (t=0.43s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.267
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.275
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.160
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.572
[10/16 22:54:59] ppdet.engine INFO: Total sample number: 200, averge FPS: 19.422192298181745

7 模型预测

这里我们将训练好的模型对着整个数据集的验证集来一番批量预测，这些预测结果会记录在VisualDL中，可以很方便地与原图对比，观察预测效果。

!python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model --infer_dir ./dataset/fish/val --use_vdl=True --vdl_log_dir=./output/image --output_dir ./output/results

8 模型导出

.pdparams只包括了模型的参数数据，实际部署还需要执行导出步骤。导出步骤可以参考下面列举的步骤：

注意，这里使用了我们已经训练好的模型。如希望使用自己训练的模型，请对应将weights=后的值更改为对应模型.pdparams文件的存储路径。如果没有指定–output_dir，那么导出的模型将默认存储在 output_inference/ 路径下。

!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams

至此，我们就完成了水底生物目标检测模型的从训练到导出的过程。接下来，看看该模型使用Paddle Inference部署时的具体性能表现。

9 模型速度测试

!python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_s_80e_coco --image_file=dataset/fish/val/c000418.jpg --run_mode=paddle --device=gpu

-----------  Running Arguments -----------
action_file: None
batch_size: 1
camera_id: -1
combine_method: nms
cpu_threads: 1
device: gpu
enable_mkldnn: False
enable_mkldnn_bfloat16: False
image_dir: None
image_file: dataset/fish/val/c000418.jpg
match_metric: ios
match_threshold: 0.6
model_dir: output_inference/ppyoloe_plus_crn_s_80e_coco
output_dir: output
overlap_ratio: [0.25, 0.25]
random_pad: False
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: paddle
save_images: True
save_mot_txt_per_img: False
save_mot_txts: False
save_results: False
scaled: False
slice_infer: False
slice_size: [640, 640]
threshold: 0.5
tracker_config: None
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_coco_category: False
use_dark: True
use_gpu: False
video_file: None
window_size: 50
------------------------------------------
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order: 
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
class_id:0, confidence:0.7396, left_top:[299.31,0.93],right_bottom:[331.32,32.59]
class_id:0, confidence:0.7280, left_top:[293.39,100.21],right_bottom:[332.52,138.38]
class_id:0, confidence:0.6212, left_top:[528.09,-1.20],right_bottom:[566.73,20.30]
class_id:0, confidence:0.5986, left_top:[71.87,-0.63],right_bottom:[111.20,26.91]
class_id:0, confidence:0.5766, left_top:[120.02,101.65],right_bottom:[181.98,152.03]
class_id:0, confidence:0.5554, left_top:[405.12,215.59],right_bottom:[458.40,272.60]
class_id:0, confidence:0.5486, left_top:[476.20,49.54],right_bottom:[511.81,85.39]
class_id:1, confidence:0.6370, left_top:[446.48,281.86],right_bottom:[504.50,342.79]
class_id:1, confidence:0.5986, left_top:[450.84,89.30],right_bottom:[494.58,123.99]
class_id:1, confidence:0.5715, left_top:[510.54,89.15],right_bottom:[570.02,138.44]
class_id:1, confidence:0.5151, left_top:[512.79,94.07],right_bottom:[562.67,133.47]
save result to: output/c000418.jpg
Test iter 0
------------------ Inference Time Info ----------------------
total_time(ms): 1072.8, img_num: 1
average latency time(ms): 1072.80, QPS: 0.932140
preprocess_time(ms): 18.20, inference_time(ms): 1054.50, postprocess_time(ms): 0.10

在这里插入图片描述