Originally published on the WeChat public account 「3D視覺工坊」: Training Mask R-CNN on your own dataset
Preface

I've recently become hooked on Mask R-CNN, partly because my work calls for it, so I dug into its source code and trained it on my own data.
This post draws on: https://blog.csdn.net/disiwei1012/article/details/79928679#commentsedit
Goal of the experiment

Sigh, the less said the better; who told me to be an engineering student? All I get to detect are workpieces... nothing fancy for me, haha.
Main references and tools

Based on the open-source Mask R-CNN project:
https://github.com/matterport/Mask_RCNN

Image annotation is done with the open-source labelme project: https://github.com/wkentaro/labelme

Training setup:
Win10 + GTX 1060 + CUDA 9.1 + cuDNN 7 + tensorflow-gpu 1.6.0 + keras 2.1.6; 140 images, 3 classes, roughly 1 hour of training.

For how to use labelme, see:
https://blog.csdn.net/shwan_ma/article/details/77823281

For the Mask R-CNN and Faster R-CNN algorithms, see:
https://blog.csdn.net/linolzhang/article/details/71774168
https://blog.csdn.net/lk123400/article/details/54343550
Preparing the training dataset

These are the four folders I created; let me go through them one by one.

1. pic: the training images, 700 in total.

2. json: the files generated by annotating the training images with labelme.

3. labelme_json: the data produced from the .json files, by running labelme_json_to_dataset <name>.json (this requires labelme to be properly installed and activated). Converting many images one at a time this way is tedious, so here is a tool that converts every json file under the .json directory in one go (a batch-conversion sketch also follows after this list): https://download.csdn.net/download/qq_29462849/10540381

4. cv2_mask: labelme stores the generated mask label, label.png, as a 16-bit image, while OpenCV reads 8-bit by default, so the 16-bit masks must be converted to 8-bit. This can be done with a C++ program; for the code see this post (a Python sketch of the same conversion also follows below): http://blog.csdn.net/l297969586/article/details/79154150

The masks look like a solid block of black, but don't panic, that's normal.
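If you would rather not rely on the download above, the batch conversion is easy to sketch yourself: loop over the .json files and shell out to labelme's labelme_json_to_dataset for each one. A minimal sketch, assuming labelme is installed and on your PATH, and the folder layout described above:

import os
import subprocess

json_dir = "train_data/json"  # the folder holding the labelme .json files

# Convert every .json file in the directory in one go instead of one by one
for name in os.listdir(json_dir):
    if name.endswith(".json"):
        subprocess.run(["labelme_json_to_dataset", os.path.join(json_dir, name)],
                       check=True)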
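The linked post does the 16-bit to 8-bit conversion in C++. For reference, here is my own Python sketch of the same idea (not the code from that post); it assumes the label values are small instance indices, fewer than 256 per image, which is what labelme produces:

import os
import numpy as np
from PIL import Image

src_dir = "train_data/labelme_json"  # one <name>_json folder per image
dst_dir = "train_data/cv2_mask"

os.makedirs(dst_dir, exist_ok=True)
for folder in os.listdir(src_dir):
    label_path = os.path.join(src_dir, folder, "label.png")
    if not os.path.exists(label_path):
        continue
    # label.png stores instance indices in a 16-bit image; casting to
    # uint8 is lossless as long as there are fewer than 256 instances
    label16 = np.asarray(Image.open(label_path))
    label8 = label16.astype(np.uint8)
    Image.fromarray(label8).save(os.path.join(dst_dir, folder.replace("_json", "") + ".png"))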
Running the source code requires pycocotools, which is a real pain to install on Windows: some people get it working effortlessly, others fail even after reinstalling the OS. For installing pycocotools on Windows, see:
https://blog.csdn.net/chixia1785/article/details/80040172
https://blog.csdn.net/gxiaoyaya/article/details/78363391
For what it's worth, one route that was widely reported to work at the time (make sure the Visual C++ build tools are installed first) is installing the Windows-friendly fork of the COCO API directly: pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
The test code

The code released on GitHub is based on ipynb notebooks; I converted it directly into a .py file. As a first test, run the model pre-trained on the COCO dataset; it can drive a webcam:
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(MODEL_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
print("cuiwei***********************")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")


class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


config = InferenceConfig()
config.display()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
               'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
               'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse',
               'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
               'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
               'scissors', 'teddy bear', 'hair drier', 'toothbrush']

# Load a random image from the images folder
#file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

cap = cv2.VideoCapture(0)
while(1):
    # get a frame
    ret, frame = cap.read()
    # show a frame
    start = time.clock()  # note: time.clock() was removed in Python 3.8; use time.perf_counter() there
    results = model.detect([frame], verbose=1)
    r = results[0]
    #cv2.imshow("capture", frame)
    visualize.display_instances(frame, r['rois'], r['masks'], r['class_ids'],
                                class_names, r['scores'])
    end = time.clock()
    print(end - start)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

#image = cv2.imread(r"C:\Users\18301\Desktop\Mask_RCNN-master\images\9.jpg")
## Run detection
#results = model.detect([image], verbose=1)
#print(end - start)
## Visualize results
#r = results[0]
#visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
#                            class_names, r['scores'])
A trained Mask R-CNN model can be downloaded from https://github.com/matterport/Mask_RCNN/releases; once downloaded, just configure the path.
The training code
# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
from mrcnn.config import Config
#import utils
from mrcnn import model as modellib, utils
from mrcnn import visualize
import yaml
from mrcnn.model import log
from PIL import Image

#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Root directory of the project
ROOT_DIR = os.getcwd()
#ROOT_DIR = os.path.abspath("../")

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

iter_num = 0

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 384

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50


config = ShapesConfig()
config.display()


class DrugDataset(utils.Dataset):
    # Get the number of instances (objects) in the image
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # Parse the yaml file produced by labelme to get the instance label
    # corresponding to each layer of the mask
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            temp = yaml.load(f.read())  # newer PyYAML needs yaml.load(f.read(), Loader=yaml.FullLoader)
            labels = temp['label_names']
            del labels[0]
        return labels

    # Rewrite draw_mask
    def draw_mask(self, num_obj, mask, image, image_id):
        #print("draw_mask-->", image_id)
        #print("self.image_info", self.image_info)
        info = self.image_info[image_id]
        #print("info-->", info)
        #print("info[width]----->", info['width'], "-info[height]--->", info['height'])
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    #print("image_id-->", image_id, "-i--->", i, "-j--->", j)
                    #print("info[width]----->", info['width'], "-info[height]--->", info['height'])
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask
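    # Note: the pixel-by-pixel getpixel loop in draw_mask above is very slow
    # on larger images. A vectorized NumPy equivalent (my own sketch, same
    # semantics: pixel value k in the label image marks instance k) would be:
    def draw_mask_fast(self, num_obj, mask, image, image_id=None):
        label = np.asarray(image)  # height x width array of instance indices
        for index in range(num_obj):
            mask[:, :, index] = (label == index + 1)
        return mask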
    # Rewrite load_shapes to register your own classes; add as many as you need.
    # It also records path, mask_path and yaml_path in self.image_info.
    # dataset_root_path = "/tongue_dateset/"
    # img_floder = dataset_root_path + "rgb"
    # mask_floder = dataset_root_path + "mask"
    def load_shapes(self, count, img_floder, mask_floder, imglist, dataset_root_path):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes; more objects can be registered the same way
        self.add_class("shapes", 1, "tank")  # melanoma
        self.add_class("shapes", 2, "triangle")
        self.add_class("shapes", 3, "white")
        for i in range(count):
            # Get the width and height of the image
            filestr = imglist[i].split(".")[0]
            #print(imglist[i], "-->", cv_img.shape[1], "--->", cv_img.shape[0])
            #print("id-->", i, " imglist[", i, "]-->", imglist[i], "filestr-->", filestr)
            #filestr = filestr.split("_")[1]
            mask_path = mask_floder + "/" + filestr + ".png"
            yaml_path = dataset_root_path + "labelme_json/" + filestr + "_json/info.yaml"
            print(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            cv_img = cv2.imread(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0],
                           mask_path=mask_path, yaml_path=yaml_path)

    # Rewrite load_mask
    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        count = 1  # number of object
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        # note: with count = 1 the occlusion loop below never runs
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = []
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("tank") != -1:
                labels_form.append("tank")
            elif labels[i].find("triangle") != -1:
                labels_form.append("triangle")
            elif labels[i].find("white") != -1:
                labels_form.append("white")
        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)


def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.

    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax


# Basic settings
dataset_root_path = "train_data/"
img_floder = dataset_root_path + "pic"
mask_floder = dataset_root_path + "cv2_mask"
#yaml_floder = dataset_root_path
imglist = os.listdir(img_floder)
count = len(imglist)

# Prepare the train and val datasets
dataset_train = DrugDataset()
dataset_train.load_shapes(count, img_floder, mask_floder, imglist, dataset_root_path)
dataset_train.prepare()
#print("dataset_train-->", dataset_train._image_ids)

dataset_val = DrugDataset()
dataset_val.load_shapes(7, img_floder, mask_floder, imglist, dataset_root_path)
dataset_val.prepare()
#print("dataset_val-->", dataset_val._image_ids)

# Load and display random samples
#image_ids = np.random.choice(dataset_train.image_ids, 4)
#for image_id in image_ids:
#    image = dataset_train.load_image(image_id)
#    mask, class_ids = dataset_train.load_mask(image_id)
#    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            layers='heads')

# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=40,
            layers="all")
The training parameters can be adjusted in config.py to suit your needs; the official repo also offers tuning advice: https://github.com/matterport/Mask_RCNN/wiki

The main ones to consider (a consolidated sketch follows this list):

BACKBONE = "resnet50": the backbone used for transfer learning, either resnet101 or resnet50. If your machine is not particularly powerful, pick resnet50; the network is smaller and trains faster.

model.train(…, layers='heads', …)  # Train heads branches (least memory)
model.train(…, layers='3+', …)  # Train resnet stage 3 and up
model.train(…, layers='4+', …)  # Train resnet stage 4 and up
model.train(…, layers='all', …)  # Train all layers (most memory)
These select how many layers to train; choose according to your needs.

IMAGE_MIN_DIM = 800
IMAGE_MAX_DIM = 1024
The image size used during training; IMAGE_MAX_DIM is what ultimately matters. If your machine is not very capable, reduce these.

GPU_COUNT = 1
IMAGES_PER_GPU = 2
GPU settings; if GPU memory is tight, drop IMAGES_PER_GPU from 2 to 1 (although a batch size of 1 does not help convergence).

TRAIN_ROIS_PER_IMAGE = 200: set according to the actual characteristics of your dataset.

MAX_GT_INSTANCES = 100: the maximum number of objects that can be detected in a single image.
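Pulled together, these tweaks are just attributes overridden on a Config subclass. A sketch of what a low-memory setup might look like; the values here are illustrative, not a recommendation:

from mrcnn.config import Config

class LowMemoryConfig(Config):
    NAME = "shapes"
    BACKBONE = "resnet50"       # smaller backbone, faster training
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1          # drop to 1 if GPU memory runs out
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512         # the effective training size; lower it on weak machines
    TRAIN_ROIS_PER_IMAGE = 200  # tune to how crowded your images are
    MAX_GT_INSTANCES = 100      # maximum objects detected per image
    NUM_CLASSES = 1 + 3         # background + your classes

config = LowMemoryConfig()
config.display()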
Build the dataset in the format described above and configure the paths, and training can begin. One problem on Windows: training may hang at epoch 1 indefinitely. This happens because older keras builds do not support multithreading properly on Windows; keras 2.1.6 is recommended, which I have personally verified to work.
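If pinning keras to 2.1.6 is not an option, a workaround that is often reported (I have not verified it against every version of the repo, so treat the exact argument list as an assumption) is to tone down the multiprocessing in mrcnn/model.py: near the end of MaskRCNN.train(), the keras fit_generator call accepts workers and use_multiprocessing, and setting them conservatively avoids the Windows hang:

# near the end of MaskRCNN.train() in mrcnn/model.py
# (surrounding arguments vary slightly between versions of the repo)
self.keras_model.fit_generator(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=1,                  # instead of multiprocessing.cpu_count()
    use_multiprocessing=False,  # instead of True; multiprocessing hangs on Windows
)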
Trained models are saved under the logs folder in .h5 format; once training finishes you can load them directly.
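To run inference with one of those .h5 files, point load_weights at it instead of the COCO weights. A small sketch; the run folder and file name below are hypothetical, so substitute your own:

# Load your own trained weights for inference (paths are hypothetical examples)
model_path = os.path.join(MODEL_DIR, "shapes20180713T1554", "mask_rcnn_shapes_0040.h5")
model.load_weights(model_path, by_name=True)
# or grab the most recent checkpoint that training logged:
#model.load_weights(model.find_last()[1], by_name=True)  # older repo versions return (dir_name, path)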
Code for testing the trained model
# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
from mrcnn.config import Config
from datetime import datetime

# Root directory of the project
ROOT_DIR = os.getcwd()

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
from samples.coco import coco

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(MODEL_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
print("cuiwei***********************")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 384

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50


#import train_tongue
#class InferenceConfig(coco.CocoConfig):
class InferenceConfig(ShapesConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


config = InferenceConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'tank', 'triangle', 'white']

# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

a = datetime.now()
# Run detection
results = model.detect([image], verbose=1)
b = datetime.now()
# Visualize results
print("shijian", (b - a).seconds)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
Of course, since the training data here is so limited, the results are not particularly good; industrial images are hard to come by.

So how do you get the bounding-box coordinates and the segmented pixel locations out? They are all in visualize.py, specifically in its display_instances function.

The final output:

The masks mark each pixel inside a box as True or False, iterating over the rows and columns of the box.
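If you only want the numbers rather than a plot, you do not even need display_instances; everything is already in the results dict. A minimal sketch, using the same r and class_names as in the test script above:

for i in range(r['rois'].shape[0]):
    y1, x1, y2, x2 = r['rois'][i]      # box corners in [y1, x1, y2, x2] order
    label = class_names[r['class_ids'][i]]
    score = r['scores'][i]
    print("instance %d: %s %.3f, box = (%d, %d) to (%d, %d)" % (i, label, score, x1, y1, x2, y2))
    mask = r['masks'][:, :, i]         # height x width boolean mask for this instance
    ys, xs = np.where(mask)            # pixel coordinates where the mask is True
    print("  segmented pixels: %d" % len(ys))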
Finally, the full source code for this project is available at:
https://download.csdn.net/download/qq_29462849/10540423,
where train_test is the training code and test_model is the testing code. Configure the paths and both run directly.
Postscript: this article was written by our special guest Oliver Cui, who is about to start as a deep learning algorithm engineer at Hikvision. Some good news: starting today he will join me in the 「3D視覺技術」 study circle to discuss 3D vision topics with everyone and help look after the community. If you run into deep learning questions, feel free to ask him at any time and he will answer promptly. We will also hold occasional offline events, and everyone is welcome to take part.
If anything above infringes a copyright, please contact the author and the article will be taken down.