Object detection is an important research direction in computer vision: it addresses the problem of detecting instances of specific classes of visual objects in digital images. As one of the fundamental problems of computer vision, object detection is the basis and prerequisite for many other tasks, such as image captioning, instance segmentation, and object tracking. When working on such problems, we usually build datasets with our own scripts or with annotation tools, so dataset formats vary widely. For better training compatibility, most object detection frameworks therefore support a few common annotation formats out of the box, the most common being COCO, Pascal VOC, and YOLO. This article introduces these formats along with the Python conversion scripts I wrote (you will usually need to adapt them to your own situation).
1. COCO
1.1 COCO Dataset Format
The COCO (Common Objects in Context) dataset is a large-scale dataset for object detection, image segmentation, and image captioning, and its annotation format is one of the most widely used. COCO2017 is currently the most commonly used release. The official site is COCO - Common Objects in Context (cocodataset.org).
The COCO dataset consists of images (jpg, png, etc.) and annotation files (json). Its layout is as follows (a trailing / marks a folder):
/
-coco/
|-train2017/
|-1.jpg
|-2.jpg
|-val2017/
|-3.jpg
|-4.jpg
|-test2017/
|-5.jpg
|-6.jpg
|-annotations/
|-instances_train2017.json
|-instances_val2017.json
|-*.json
The train2017 and val2017 folders store the training and validation images respectively, while test2017 stores the test set, which may contain only images or images plus annotations, and is usually used on its own.
The files in the annotations folder are the annotation files; if you have xml annotations, they usually need to be converted to json, whose structure is as follows (see the official site for more detail):
{
    "info": info,
    "images": [image],           // list
    "annotations": [annotation], // list
    "categories": [category],    // list
    "licenses": [license],       // list
}
The info field describes the dataset as a whole: year, version, description, and so on. If all you need is training, it is not that important:
// not that important for training
info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}
Each image entry holds the basic information of one image: its id, width and height, file name, and so on. The id must match the image_id that the entries in annotations refer to later:
image{
    "id": int,        // required
    "width": int,     // required
    "height": int,    // required
    "file_name": str, // required
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}
The annotation entries are the most important part: each one records an annotation id, the id of the image it belongs to, a category id, and so on:
annotation{
    "id": int,                        // annotation id
    "image_id": int,                  // id of the image it belongs to
    "category_id": int,               // category id
    "segmentation": RLE or [polygon], // segmentation annotation
    "area": float,                    // region area
    "bbox": [x,y,width,height],       // top-left corner of the box plus its width and height
    "iscrowd": 0 or 1,                // whether the region is a crowd
}
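Note that most labeling tools export corner coordinates [xmin, ymin, xmax, ymax], while COCO's bbox is [x, y, width, height]. A tiny helper (the name corners_to_coco is my own, not part of any library) makes the conversion explicit:

```python
def corners_to_coco(xmin, ymin, xmax, ymax):
    """Convert a corner box [xmin, ymin, xmax, ymax] into a COCO
    [x, y, width, height] bbox and its area."""
    w = xmax - xmin
    h = ymax - ymin
    return [xmin, ymin, w, h], w * h

bbox, area = corners_to_coco(246, 61, 374, 377)
print(bbox, area)  # [246, 61, 128, 316] 40448
```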
Each category entry records a category: its id, name, and supercategory (parent category):
category{
    "id": int,            // category id
    "name": str,          // category name
    "supercategory": str, // parent category
}
Each license entry records the dataset's licensing information: an id, the license name, and a URL:
// not important for training
license{
    "id": int,
    "name": str,
    "url": str,
}
Next, let's look at a simple example:
{
    "info": {...},
    "images": [
        {"id": 1, "file_name": "1.jpg", "height": 334, "width": 500},
        {"id": 2, "file_name": "2.jpg", "height": 445, "width": 556}
    ],
    "annotations": [
        {"id": 1, "area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 3, "segmentation": []},
        {"id": 2, "area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 2, "segmentation": []},
        {"id": 3, "area": 40448, "iscrowd": 0, "image_id": 2, "bbox": [246, 61, 128, 316], "category_id": 1, "segmentation": []}
    ],
    "categories": [
        {"supercategory": "none", "id": 1, "name": "liner"},
        {"supercategory": "none", "id": 2, "name": "containership"},
        {"supercategory": "none", "id": 3, "name": "bulkcarrier"}
    ],
    "licenses": [{...}]
}
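Before training, it can save debugging time to verify that every annotation points at an existing image id and category id. A minimal consistency check over a loaded COCO dict might look like this (the check_coco name is my own):

```python
def check_coco(data):
    """Return the ids of annotations whose image_id or category_id
    does not exist in the dataset."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    return [ann["id"] for ann in data["annotations"]
            if ann["image_id"] not in image_ids
            or ann["category_id"] not in category_ids]

data = {
    "images": [{"id": 1, "file_name": "1.jpg", "height": 334, "width": 500}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 3, "bbox": [246, 61, 128, 316]},
        {"id": 2, "image_id": 9, "category_id": 3, "bbox": [0, 0, 10, 10]},  # dangling image_id
    ],
    "categories": [{"id": 3, "name": "bulkcarrier", "supercategory": "none"}],
}
print(check_coco(data))  # [2]
```

In practice you would load the dict with json.load from instances_train2017.json before running the check.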
1.2 COCO Conversion Script
The Python conversion script is shown below; you need to prepare the images and the xml annotation files:
# -*- coding: utf-8 -*-
# @Author : justlovesmile
# @Date : 2021/9/8 15:36
import os, random, json
import shutil as sh
from tqdm.auto import tqdm
import xml.etree.ElementTree as xmlET

def mkdir(path):
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False

def readxml(file):
    tree = xmlET.parse(file)
    # image size fields
    size = tree.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    # object fields
    objs = tree.findall('object')
    bndbox = []
    for obj in objs:
        label = obj.find("name").text
        bnd = obj.find("bndbox")
        xmin = int(bnd.find("xmin").text)
        ymin = int(bnd.find("ymin").text)
        xmax = int(bnd.find("xmax").text)
        ymax = int(bnd.find("ymax").text)
        bndbox.append([xmin, ymin, xmax, ymax, label])
    return [[width, height], bndbox]

def tococo(xml_root, image_root, output_root, classes={}, errorId=[], train_percent=0.9):
    # assert
    assert train_percent <= 1 and len(classes) > 0
    # define the root paths
    train_root = os.path.join(output_root, "train2017")
    val_root = os.path.join(output_root, "val2017")
    ann_root = os.path.join(output_root, "annotations")
    # initialize the train and val dicts
    train_content = {
        "images": [],      # {"file_name": "09780.jpg", "height": 334, "width": 500, "id": 9780}
        "annotations": [], # {"area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 5, "id": 1, "segmentation": []}
        "categories": []   # {"supercategory": "none", "id": 1, "name": "liner"}
    }
    val_content = {
        "images": [],
        "annotations": [],
        "categories": []
    }
    train_json = 'instances_train2017.json'
    val_json = 'instances_val2017.json'
    # split into trainset and valset
    images = os.listdir(image_root)
    total_num = len(images)
    train_num = int(total_num * train_percent)
    train_file = sorted(random.sample(images, train_num))
    if mkdir(output_root):
        if mkdir(train_root) and mkdir(val_root) and mkdir(ann_root):
            idx1, idx2, dx1, dx2 = 0, 0, 0, 0
            for file in tqdm(images):
                name = os.path.splitext(os.path.basename(file))[0]
                if name not in errorId:
                    res = readxml(os.path.join(xml_root, name + '.xml'))
                    if file in train_file:
                        idx1 += 1
                        sh.copy(os.path.join(image_root, file), train_root)
                        train_content['images'].append(
                            {"file_name": file, "width": res[0][0], "height": res[0][1], "id": idx1})
                        for b in res[1]:
                            dx1 += 1
                            x = b[0]
                            y = b[1]
                            w = b[2] - b[0]
                            h = b[3] - b[1]
                            train_content['annotations'].append(
                                {"area": w * h, "iscrowd": 0, "image_id": idx1, "bbox": [x, y, w, h],
                                 "category_id": classes[b[4]], "id": dx1, "segmentation": []})
                    else:
                        idx2 += 1
                        sh.copy(os.path.join(image_root, file), val_root)
                        val_content['images'].append(
                            {"file_name": file, "width": res[0][0], "height": res[0][1], "id": idx2})
                        for b in res[1]:
                            dx2 += 1
                            x = b[0]
                            y = b[1]
                            w = b[2] - b[0]
                            h = b[3] - b[1]
                            val_content['annotations'].append(
                                {"area": w * h, "iscrowd": 0, "image_id": idx2, "bbox": [x, y, w, h],
                                 "category_id": classes[b[4]], "id": dx2, "segmentation": []})
            for i, j in classes.items():
                train_content['categories'].append({"supercategory": "none", "id": j, "name": i})
                val_content['categories'].append({"supercategory": "none", "id": j, "name": i})
            with open(os.path.join(ann_root, train_json), 'w') as f:
                json.dump(train_content, f)
            with open(os.path.join(ann_root, val_json), 'w') as f:
                json.dump(val_content, f)
            print("Number of Train Images:", len(os.listdir(train_root)))
            print("Number of Val Images:", len(os.listdir(val_root)))

def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations"  # folder with the xml files
    image_root = "E:/MyProject/Dataset/hwtest/images"     # folder with the images
    output_root = "E:/MyProject/Dataset/coco"             # output folder
    classes = {"liner": 0, "bulk carrier": 1, "warship": 2, "sailboat": 3, "canoe": 4,
               "container ship": 5, "fishing boat": 6}    # class dict
    errorId = []         # ids of bad data
    train_percent = 0.9  # train/val split ratio
    tococo(box_root, image_root, output_root, classes=classes, errorId=errorId, train_percent=train_percent)

if __name__ == "__main__":
    test()
2. VOC
2.1 VOC Dataset Format
The VOC (Visual Object Classes) dataset comes from the PASCAL VOC challenge, whose main tasks are Object Classification, Object Detection, Object Segmentation, Human Layout, and Action Classification. The official site is The PASCAL Visual Object Classes Homepage (ox.ac.uk). The main releases are VOC2007 and VOC2012.
The VOC dataset consists of images (jpg, png, etc.) and annotation files (xml). Its layout is as follows (a trailing / marks a folder):
/
-VOC/
|-JPEGImages/
|-1.jpg
|-2.jpg
|-Annotations/
|-1.xml
|-2.xml
|-ImageSets/
|-Layout/
|-*.txt
|-Main/
|-train.txt
|-val.txt
|-trainval.txt
|-test.txt
|-Segmentation/
|-*.txt
|-Action/
|-*.txt
|-SegmentationClass/
|-SegmentationObject/
For object detection, the most commonly used (and required) folders are JPEGImages, Annotations, and ImageSets/Main.
JPEGImages holds the images and Annotations holds the xml annotation files, whose content looks like this:
<annotation>
    <folder>VOC</folder>             # folder the image is in
    <filename>000032.jpg</filename>  # image file name
    <source>                         # image source
        <database>The VOC Database</database>
        <annotation>PASCAL VOC</annotation>
        <image>flickr</image>
    </source>
    <size>                           # image size info
        <width>500</width>           # image width
        <height>281</height>         # image height
        <depth>3</depth>             # number of channels
    </size>
    <segmented>0</segmented>  # whether the image is used for segmentation; 0 means no, irrelevant for detection
    <object>                      # information for one object
        <name>aeroplane</name>    # class name of the object
        <pose>Frontal</pose>      # shooting angle; usually Unspecified if unknown
        <truncated>0</truncated>  # whether the object is truncated; 0 means complete
        <difficult>0</difficult>  # whether the object is hard to recognize; 0 means not difficult
        <bndbox>                  # bounding box info
            <xmin>104</xmin>      # top-left x
            <ymin>78</ymin>       # top-left y
            <xmax>375</xmax>      # bottom-right x
            <ymax>183</ymax>      # bottom-right y
        </bndbox>
    </object>
    # information for the remaining objects, omitted here
    <object>
        ... other object info, omitted ...
    </object>
</annotation>
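Reading such a file back needs only the standard library. A minimal sketch, demonstrated here on an inline string instead of a file:

```python
import xml.etree.ElementTree as ET

VOC_XML = """<annotation>
  <filename>000032.jpg</filename>
  <size><width>500</width><height>281</height><depth>3</depth></size>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>104</xmin><ymin>78</ymin><xmax>375</xmax><ymax>183</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Extract (width, height) and a list of (label, xmin, ymin, xmax, ymax)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    wh = (int(size.find("width").text), int(size.find("height").text))
    objs = []
    for obj in root.findall("object"):
        b = obj.find("bndbox")
        objs.append((obj.find("name").text,
                     int(b.find("xmin").text), int(b.find("ymin").text),
                     int(b.find("xmax").text), int(b.find("ymax").text)))
    return wh, objs

print(parse_voc(VOC_XML))  # ((500, 281), [('aeroplane', 104, 78, 375, 183)])
```

For an actual file, replace ET.fromstring with ET.parse(path).getroot(), as the scripts in this article do.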
2.2 VOC Conversion Script
The following script only covers the case where you have images plus xml files; I may write a coco-to-voc converter later if the need arises:
# -*- coding: utf-8 -*-
# @Author : justlovesmile
# @Date : 2021/9/8 21:01
import os, random
from tqdm.auto import tqdm
import shutil as sh

def mkdir(path):
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False

def tovoc(xmlroot, imgroot, saveroot, errorId=[], classes={}, tvp=1.0, trp=0.9):
    '''
    Arguments:
        xmlroot: folder containing the xml annotations
        imgroot: folder containing the images
        saveroot: root folder the dataset is written to
    Purpose:
        Load the data and save it in VOC format:
        VOC/
            Annotations/
                - **.xml
            JPEGImages/
                - **.jpg
            ImageSets/
                Main/
                    - train.txt
                    - test.txt
                    - val.txt
                    - trainval.txt
    '''
    # assert
    assert len(classes) > 0
    # init paths
    VOC = saveroot
    ann_path = os.path.join(VOC, 'Annotations')
    img_path = os.path.join(VOC, 'JPEGImages')
    set_path = os.path.join(VOC, 'ImageSets')
    txt_path = os.path.join(set_path, 'Main')
    # mkdirs
    if mkdir(VOC):
        if mkdir(ann_path) and mkdir(img_path) and mkdir(set_path):
            mkdir(txt_path)
    images = os.listdir(imgroot)
    list_index = range(len(images))
    # test and trainval sets
    trainval_percent = tvp
    train_percent = trp
    val_percent = 1 - train_percent if train_percent < 1 else 0.1
    total_num = len(images)
    trainval_num = int(total_num * trainval_percent)
    train_num = int(trainval_num * train_percent)
    val_num = int(trainval_num * val_percent)
    trainval = random.sample(list_index, trainval_num)
    train = random.sample(trainval, train_num)
    val = random.sample(trainval, val_num)  # used only when train_percent == 1
    for i in tqdm(list_index):
        imgfile = images[i]
        img_id = os.path.splitext(os.path.basename(imgfile))[0]
        xmlfile = img_id + ".xml"
        sh.copy(os.path.join(imgroot, imgfile), os.path.join(img_path, imgfile))
        sh.copy(os.path.join(xmlroot, xmlfile), os.path.join(ann_path, xmlfile))
        if img_id not in errorId:
            if i in trainval:
                with open(os.path.join(txt_path, 'trainval.txt'), 'a') as f:
                    f.write(img_id + '\n')
                if i in train:
                    with open(os.path.join(txt_path, 'train.txt'), 'a') as f:
                        f.write(img_id + '\n')
                else:
                    with open(os.path.join(txt_path, 'val.txt'), 'a') as f:
                        f.write(img_id + '\n')
                if train_percent == 1 and i in val:
                    with open(os.path.join(txt_path, 'val.txt'), 'a') as f:
                        f.write(img_id + '\n')
            else:
                with open(os.path.join(txt_path, 'test.txt'), 'a') as f:
                    f.write(img_id + '\n')
    # end
    print("Dataset to VOC format finished!")

def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations"
    image_root = "E:/MyProject/Dataset/hwtest/images"
    output_root = "E:/MyProject/Dataset/voc"
    classes = {"liner": 0, "bulk carrier": 1, "warship": 2, "sailboat": 3, "canoe": 4,
               "container ship": 5, "fishing boat": 6}
    errorId = []
    train_percent = 0.9
    tovoc(box_root, image_root, output_root, errorId, classes, trp=train_percent)

if __name__ == "__main__":
    test()
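The split files the script writes under ImageSets/Main simply contain one image id per line, so reading one back is trivial. A sketch (the read_split helper is my own, demonstrated on a throwaway directory):

```python
import os, tempfile

def read_split(voc_root, split):
    """Read ImageSets/Main/<split>.txt and return the image ids it lists."""
    path = os.path.join(voc_root, "ImageSets", "Main", f"{split}.txt")
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Demo against a throwaway VOC-style tree:
root = tempfile.mkdtemp()
main = os.path.join(root, "ImageSets", "Main")
os.makedirs(main)
with open(os.path.join(main, "train.txt"), "w") as f:
    f.write("000032\n000033\n")
print(read_split(root, "train"))  # ['000032', '000033']
```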
3. YOLO
3.1 YOLO Dataset Format
The YOLO dataset format exists mainly for training YOLO models. There is no fixed requirement on the file layout, because data loading can be adjusted through the model's configuration file. The only thing to note is that in YOLO annotations the bounding-box position information is normalized (here, normalized means divided by the image width and height), one object per line:
{class id} {normalized box center x} {normalized box center y} {normalized box width w} {normalized box height h}
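For example, the aeroplane box (104, 78, 375, 183) in a 500×281 image becomes one such line. A small helper (the voc_to_yolo name is my own) that mirrors this formula:

```python
def voc_to_yolo(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Corner box in pixels -> 'class cx cy w h' with all coordinates
    normalized by the image width and height."""
    cx = (xmin + xmax) / (2 * img_w)   # normalized center x
    cy = (ymin + ymax) / (2 * img_h)   # normalized center y
    w = (xmax - xmin) / img_w          # normalized width
    h = (ymax - ymin) / img_h          # normalized height
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

print(voc_to_yolo(0, 104, 78, 375, 183, 500, 281))
# 0 0.479000 0.464413 0.542000 0.373665
```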
3.2 YOLO Conversion Script
The Python conversion script is shown below:
# -*- coding: utf-8 -*-
# @Author : justlovesmile
# @Date : 2021/9/8 20:28
import os
import random
from tqdm.auto import tqdm
import shutil as sh
try:
    import xml.etree.cElementTree as et
except ImportError:
    import xml.etree.ElementTree as et

def mkdir(path):
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False

def xml2yolo(xmlpath, savepath, classes={}):
    namemap = classes
    # try:
    #     with open('classes_yolo.json', 'r') as f:
    #         namemap = json.load(f)
    # except:
    #     pass
    rt = et.parse(xmlpath).getroot()
    w = int(rt.find("size").find("width").text)
    h = int(rt.find("size").find("height").text)
    with open(savepath, "w") as f:
        for obj in rt.findall("object"):
            name = obj.find("name").text
            xmin = int(obj.find("bndbox").find("xmin").text)
            ymin = int(obj.find("bndbox").find("ymin").text)
            xmax = int(obj.find("bndbox").find("xmax").text)
            ymax = int(obj.find("bndbox").find("ymax").text)
            f.write(
                f"{namemap[name]} {(xmin+xmax)/w/2.} {(ymin+ymax)/h/2.} {(xmax-xmin)/w} {(ymax-ymin)/h}"
                + "\n"
            )

def trainval(xmlroot, imgroot, saveroot, errorId=[], classes={}, tvp=1.0, trp=0.9):
    # assert
    assert tvp <= 1.0 and trp <= 1.0 and len(classes) > 0
    # create dirs
    imglabel = ['images', 'labels']
    trainvaltest = ['train', 'val', 'test']
    mkdir(saveroot)
    for r in imglabel:
        mkdir(os.path.join(saveroot, r))
        for s in trainvaltest:
            mkdir(os.path.join(saveroot, r, s))
    # train / val split
    trainval_percent = tvp
    train_percent = trp
    val_percent = 1 - train_percent if train_percent < 1.0 else 0.15
    total_img = os.listdir(imgroot)
    num = len(total_img)
    list_index = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    va = int(tv * val_percent)
    trainval = random.sample(list_index, tv)  # trainset and valset
    train = random.sample(trainval, tr)       # trainset
    val = random.sample(trainval, va)         # valset, used only when train_percent == 1
    print(f"trainval_percent:{trainval_percent},train_percent:{train_percent},val_percent:{val_percent}")
    for i in tqdm(list_index):
        name = total_img[i]
        op = os.path.join(imgroot, name)
        file_id = os.path.splitext(os.path.basename(name))[0]
        if file_id not in errorId:
            xmlp = os.path.join(xmlroot, file_id + '.xml')
            if i in trainval:
                # trainset and valset
                if i in train:
                    sp = os.path.join(saveroot, "images", "train", name)
                    xml2yolo(xmlp, os.path.join(saveroot, "labels", "train", file_id + '.txt'), classes)
                    sh.copy(op, sp)
                else:
                    sp = os.path.join(saveroot, "images", "val", name)
                    xml2yolo(xmlp, os.path.join(saveroot, "labels", "val", file_id + '.txt'), classes)
                    sh.copy(op, sp)
                if train_percent == 1.0 and i in val:
                    sp = os.path.join(saveroot, "images", "val", name)
                    xml2yolo(xmlp, os.path.join(saveroot, "labels", "val", file_id + '.txt'), classes)
                    sh.copy(op, sp)
            else:
                # testset
                sp = os.path.join(saveroot, "images", "test", name)
                xml2yolo(xmlp, os.path.join(saveroot, "labels", "test", file_id + '.txt'), classes)
                sh.copy(op, sp)

def maketxt(dir, saveroot, filename):
    savetxt = os.path.join(saveroot, filename)
    with open(savetxt, 'w') as f:
        for i in tqdm(os.listdir(dir)):
            f.write(os.path.join(dir, i) + '\n')

def toyolo(xmlroot, imgroot, saveroot, errorId=[], classes={}, tvp=1, train_percent=0.9):
    # toyolo main function
    trainval(xmlroot, imgroot, saveroot, errorId, classes, tvp, train_percent)
    maketxt(os.path.join(saveroot, "images", "train"), saveroot, "train.txt")
    maketxt(os.path.join(saveroot, "images", "val"), saveroot, "val.txt")
    maketxt(os.path.join(saveroot, "images", "test"), saveroot, "test.txt")
    print("Dataset to yolo format success.")

def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations"
    image_root = "E:/MyProject/Dataset/hwtest/images"
    output_root = "E:/MyProject/Dataset/yolo"
    classes = {"liner": 0, "bulk carrier": 1, "warship": 2, "sailboat": 3, "canoe": 4,
               "container ship": 5, "fishing boat": 6}
    errorId = []
    train_percent = 0.9
    toyolo(box_root, image_root, output_root, errorId, classes, train_percent=train_percent)

if __name__ == "__main__":
    test()
Following this script, the output folder will contain:
-yolo/
|-images/
|-train/
|-1.jpg
|-2.jpg
|-test/
|-3.jpg
|-4.jpg
|-val/
|-5.jpg
|-6.jpg
|-labels/
|-train/
|-1.txt
|-2.txt
|-test/
|-3.txt
|-4.txt
|-val/
|-5.txt
|-6.txt
|-train.txt
|-test.txt
|-val.txt
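To spot-check the generated labels visually, you often need to map a YOLO line back to pixel corner coordinates. A minimal round-trip sketch (the yolo_to_corners name is my own):

```python
def yolo_to_corners(line, img_w, img_h):
    """Parse a 'class cx cy w h' label line back into
    (class_id, xmin, ymin, xmax, ymax) in pixels."""
    cls_id, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h   # denormalize the center
    w, h = float(w) * img_w, float(h) * img_h       # denormalize width/height
    return (int(cls_id),
            round(cx - w / 2), round(cy - h / 2),
            round(cx + w / 2), round(cy + h / 2))

print(yolo_to_corners("0 0.479 0.464413 0.542 0.373665", 500, 281))
# (0, 104, 78, 375, 183)
```

The rounding means the round trip recovers the original integer corners only up to one pixel, which is fine for visual inspection.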