一.姿态估计（使用的是Intel RealSense D435i深度相机）

该功能的实现代码在 aruco _mark目录下的detect_aruco_dynamic_realsense_pub.py
在这里插入图片描述

1.realsense 的前期驱动和库的安装配置

需要去安装官方的可视化界面和SDK,我基于的OS是unbuntu16.04，安装驱动和配置可能是漫长的解决报错的过程。注：必须要使用USB3.0以上的接口才能承受大数据流的传输。
安装可参考：https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_windows.md
如果在终端输入
realsense-viewer
SDK如果启动代表安装成功。如下图所示，是我打开了左侧按钮的所有输入流。
在这里插入图片描述

2.如果要调用realsense运行程序的话，python还需要安装pyrealsense2库。

2.1 方便快捷的方法可以直接pip安装
pip install pyrealsense2
2.2 intel官方还提供了source安装的步骤，如果之前安装过 librealsense库的话，在/librealsense/warpper/python/ 目录下。
更新&升级
sudo apt-get update && sudo apt-get upgrade
安装python3的开发文件
sudo apt-get install python3 python3-dev
运行cmake

mkdir build
cd build
cmake ../ -DBUILD_PYTHON_BINDINGS=bool:true

配置PYTHONPATH 环境路径
export PYTHONPATH=$PYTHONPATH:/usr/local/lib

2.3验证pyrealsense2是否成功

官网提供了样例可以测试，使用opencv显示深度和RGB图像。
网址：https://github.com/IntelRealSense/librealsense/blob/development/wrappers/python/examples/opencv_viewer_example.py

3. aruco_mask

姿态估计的基础是找到现实世界和图像投影之间的对应点。
使用aruco_mask来实现。给定一个可以看见ArUco marker的图像，检测程序应当返回检测到的marker的列表。每个检测到的marker包括：
3.1.图像四个角的位置（按照原始的顺序）2.marker的Id
aruco_mask可以使用c++代码生成，或者通过https://chev.me/arucogen/直接生成。
3.2.实现相机的Pose检测，首先需要知道相机的校准参数，即相机内参（mtx）和畸变系数(dist)，事先需要基于标定板做相机的标定，获取相机参数。我是用的相机参数在aruco_mark/detect_aruco_dynamic_realsense_pub.py 里的mtx和dist矩阵，需要利用这两个矩阵参数来实现pose检测。
3.3 其中使用aruco.detectMarkers()函数可以检测到marker，输入为：灰度图，aruco标定板类型，aruco参数；输出ID和标志板的4个角点坐标。分别是corners，ids, rejectedImgPoints。

corners, ids, rejectedImgPoints = aruco.detectMarkers(gray,aruco_dict,parameters=parameters)

因为realsense是放在机器人上的，所以还需要把相机坐标系转化为机器人坐标系。

data0[0] = pos_z_mean  
data0[1] = -pos_x_mean
data0[2] = -pos_y_mean

为了降低误差，采用均值滤波的方法，输出位姿x,y,z的均值。

def mean_filter(pos_x_list,pos_y_list,pos_z_list):
    num_pos = len(pos_x_list)
    pos_x_num = 0
    pos_y_num = 0
    pos_z_num = 0
    for i in range(num_pos):
        pos_x_num = pos_x_num + pos_x_list[i]
        pos_y_num = pos_y_num + pos_y_list[i]
        pos_z_num = pos_z_num + pos_z_list[i]
    pos_x_mean = pos_x_num/num_pos
    pos_y_mean = pos_y_num / num_pos
    pos_z_mean = pos_z_num / num_pos
    return pos_x_mean,pos_y_mean,pos_z_mean

主函数还包括 socket通信模块（其中涉及了与c++通信，是使用struct模块，通过pack将某些特定的结构体类型打包成二进制流的字符串然后再网络传输），显示模块等。使用realsense相机测试我笔记本上生成的aruco_mask效果如下，坐标系统处在marker重心，Z坐标指向mark外部，坐标的颜色为,X:红色，Y：绿色，Z：蓝色。id是在左上角显示。第二张是我手持打印的aruco_mask效果。
在这里插入图片描述

二、基于展品的目标检测(在TX2实现)

demo：Video link
由于检测最终要求在Xavier上运行的，之前用在实验室的TX2配置了tensorflow，所以使用是yolov3的tensorflow的复现版本。
reference:https://github.com/YunYang1994/tensorflow-yolov3
使用labelimg 制作数据集，包括训练集，测试集，验证集，可搜索学习怎么使用labelimg制作适用于yolo的数据集。

绘制中心线

网络训练（训练环境是cuda10.0）的到相应的权重，使用权重对场景进行预测。一帧的结果图如下。
图中的红线是我用opencv根据图片尺寸（1920*1080）画的中心点交线，为了直观的显示中间点。为后面计算展品据中心偏移量做准备。我是根据读代码，找到了原作者在utils/plot_utils.py下，绘制bounding_box的代码区，使用cv2.line()函数绘制了红线。

if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=float(tl) / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, float(tl) / 3, [0, 0, 0], thickness=tf, lineType=cv2.LINE_AA)
        ptStart = (960, 0)
        ptEnd = (960, 1080)
        point_color = (0, 0, 255) # BGR
        thickness = 1
        lineType = 8
        cv2.line(img, ptStart, ptEnd, point_color, thickness, lineType)
        cv2.line(img, (0,540), (1920,540), point_color, thickness, lineType)

在这里插入图片描述

视频检测

基于test_single_image.py，原作者是对单张图片的检测，我使用opencv的 cv2.VideoCapture读取视频，cv2.VideoCapture不仅可以读取本地视频还可以基于rtsp读取网络摄像头的输入流（功能强大）。
使用while 循环不断地读取图片帧并输入yolov3网络处理，网络输出的预测图片以视频流的形式显示。

计算中心点的偏移距离

根据相机偏离重心的距离，输出控制机器人的运动，主要目的是机器人识别出距离中心点最近的展品，并进行后续的操作（靠近，讲解等），把x,y方向的偏移距离显示在视频的左上方。
在这里插入图片描述

if object_center[0] > 960 and object_center[1]>540:
            x_offset = object_center[0] - 960
            y_offset = object_center[1] - 540
            print('横向偏差：', x_offset, '纵向偏差', y_offset, '镜头偏左上 请右下移动')
            img_ori = cv2.putText(img_ori, 'x_offset:' + str(x_offset), (50, 100), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
            img_ori = cv2.putText(img_ori, 'y_offset:' + str(y_offset), (50, 150), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
        elif object_center[0] < 960 and object_center[1] <540:
            x_offset = 960 - object_center[0]
            y_offset = 540 - object_center[1]
            print('横向偏差：', x_offset, '纵向偏差', y_offset, '镜头偏右下 请左上移动')
            img_ori = cv2.putText(img_ori, 'x_offset:' + str(x_offset), (50, 100), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
            img_ori = cv2.putText(img_ori, 'y_offset:' + str(y_offset), (50, 150), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
        elif object_center[0] >960 and object_center[1] < 540:
            x_offset = object_center[0] - 960
            y_offset = 540 - object_center[1]
            print('横向偏差：', x_offset, '纵向偏差', y_offset, '镜头偏左下 请右上移动')
            img_ori = cv2.putText(img_ori, 'x_offset:' + str(x_offset), (50, 100), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
            img_ori = cv2.putText(img_ori, 'y_offset:' + str(y_offset), (50, 150), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
        elif object_center[0] < 960 and object_center[1] > 540:
            x_offset = 960 - object_center[0]
            y_offset = object_center[1] - 540
            print('横向偏差：', x_offset, '纵向偏差', y_offset, '镜头偏右上 请左下移动')
            img_ori = cv2.putText(img_ori, 'x_offset:' + str(x_offset), (50, 100), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
            img_ori = cv2.putText(img_ori, 'y_offset:' + str(y_offset), (50, 150), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
        else:
            print('镜头已经正向')
            img_ori = cv2.putText(img_ori, 'Sucessful!', (50, 100), cv2.FONT_HERSHEY_COMPLEX, 1.2, (0, 0, 255), 3)
        print(position,classes)