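For a recent project I needed to crop the bounding boxes out of an object detection dataset. (This was really a shortcut: I should have been cropping by segmentation mask, and I later found that bounding boxes were not good enough and had to switch to segmentation after all, but that is a story for another time.) To my surprise, I could not find ready-to-use code online for such a widely used dataset, so I had to write it myself. Enough preamble; here is the code.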
import os
import cv2
import xml.etree.ElementTree as ET
from tqdm import tqdm
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
           "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
           "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
Ori_Path = './VOC2012/'
Save_Path = './Cut_image/'
def make_class_Dir():
    # Create one output folder per class; makedirs also creates Save_Path itself
    for cls in classes:
        path = Save_Path + cls
        os.makedirs(path, exist_ok=True)
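def image_cut(path, bbox, save_path):
    # bbox is (xmin, ymin, xmax, ymax) in pixels, not (x, y, width, height)
    img = cv2.imread(path, flags=cv2.IMREAD_COLOR)
    if img is None:  # cv2.imread returns None when the file cannot be read
        return
    # Images are indexed [row, column], i.e. [y, x]
    cut_image = img[bbox[1]:bbox[3], bbox[0]:bbox[2]]
    cv2.imwrite(save_path, cut_image)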
def Get_bbox_from_xml(xml_path):
    root = ET.parse(xml_path).getroot()
    img_name = root.find('filename').text
    img_path = Ori_Path + 'JPEGImages/' + img_name
    print(img_path)
    for idx, obj in enumerate(root.findall('object')):
        # find() only matches direct children, so the names and boxes of
        # nested <part> elements (present in some annotations) are ignored
        cls = obj.find('name').text
        bndbox = obj.find('bndbox')
        # Look each tag up by name; never rely on the order of the tags
        xmin = int(float(bndbox.find('xmin').text))
        ymin = int(float(bndbox.find('ymin').text))
        xmax = int(float(bndbox.find('xmax').text))
        ymax = int(float(bndbox.find('ymax').text))
        # Prefix the object index so that several objects of the same class
        # in one image do not overwrite each other
        save_path = Save_Path + cls + '/' + str(idx) + '_' + img_name
        image_cut(img_path, (xmin, ymin, xmax, ymax), save_path)
if __name__ == '__main__':
    make_class_Dir()
    # path is the directory that holds the XML annotation files of the VOC2012 dataset
    path = './VOC2012/Annotations/'
    xml_files = os.listdir(path)
    for xml in tqdm(xml_files):
        xml_path = path + xml
        Get_bbox_from_xml(xml_path)
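Just run the code above. One pitfall to watch for when reading xmin, ymin, xmax and ymax: in the VOC2012 dataset, annotations from different years list these four tags in different orders, so do not assume they always appear as xmin, ymin, xmax, ymax. (I noticed this problem while reading other people's code online.)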
This is how the XML files whose names start with 2007 are written, in the order xmin, ymin, xmax, ymax:
<object>
    <name>tvmonitor</name>
    <pose>Frontal</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>251</xmin>
        <ymin>28</ymin>
        <xmax>475</xmax>
        <ymax>267</ymax>
    </bndbox>
</object>
This is how the XML files whose names start with 2009 are written, in the order xmax, xmin, ymax, ymin:
<object>
    <name>car</name>
    <bndbox>
        <xmax>458</xmax>
        <xmin>260</xmin>
        <ymax>92</ymax>
        <ymin>29</ymin>
    </bndbox>
    <difficult>0</difficult>
    <occluded>0</occluded>
    <pose>Right</pose>
    <truncated>0</truncated>
</object>
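To make the pitfall concrete, here is a minimal sketch that reads the box from both layouts shown above by tag name, so the order of the tags does not matter. `read_bbox` and the two `SAMPLE_*` strings are names made up for this illustration (the samples are trimmed copies of the two snippets above), not part of the dataset or of any library.

```python
import xml.etree.ElementTree as ET


def read_bbox(obj):
    """Return (xmin, ymin, xmax, ymax) for one <object> element.

    Hypothetical helper: every tag is looked up by name, so the order in
    which the tags appear inside <bndbox> is irrelevant.
    """
    box = obj.find('bndbox')  # direct child of <object> only
    return tuple(int(float(box.find(tag).text))
                 for tag in ('xmin', 'ymin', 'xmax', 'ymax'))


# 2007-style layout: xmin, ymin, xmax, ymax
SAMPLE_2007 = """<object><name>tvmonitor</name><bndbox>
<xmin>251</xmin><ymin>28</ymin><xmax>475</xmax><ymax>267</ymax>
</bndbox></object>"""

# 2009-style layout: xmax, xmin, ymax, ymin
SAMPLE_2009 = """<object><name>car</name><bndbox>
<xmax>458</xmax><xmin>260</xmin><ymax>92</ymax><ymin>29</ymin>
</bndbox></object>"""

box_2007 = read_bbox(ET.fromstring(SAMPLE_2007))
box_2009 = read_bbox(ET.fromstring(SAMPLE_2009))
print(box_2007)  # (251, 28, 475, 267)
print(box_2009)  # (260, 29, 458, 92)
```

Code that instead reads the four children of `<bndbox>` by position would return (458, 260, 92, 29) for the second sample, which is not a valid box.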