
Training Mask R-CNN on your own dataset

Originally published on the WeChat public account "3D視覺工坊" (3D Vision Workshop): Training Mask R-CNN on your own dataset

Preface

I've recently become fascinated with Mask R-CNN, partly because my work requires it, so I dug into its source code and trained it on my own data.

This post references: https://blog.csdn.net/disiwei1012/article/details/79928679#commentsedit

Goal of the experiment


Ah, the less said the better. As an engineering student, all I get to detect are industrial workpieces, nothing fancy, haha.

Main references and tools

Based on the open-source Mask RCNN project:

https://github.com/matterport/Mask_RCNN

Image annotation is based on the open-source project: https://github.com/wkentaro/labelme

Training environment:

Windows 10 + GTX 1060 + CUDA 9.1 + cuDNN 7 + tensorflow-gpu 1.6.0 + keras 2.1.6; 140 images, 3 classes, about 1 hour of training

For how to use labelme, see:

https://blog.csdn.net/shwan_ma/article/details/77823281

For the Mask R-CNN and Faster R-CNN algorithms, see:

https://blog.csdn.net/linolzhang/article/details/71774168

https://blog.csdn.net/lk123400/article/details/54343550

Preparing the training dataset

These are the four folders I created; I'll go through them one by one.

1.pic

These are the training images, 700 in total.

2.json


These are the files generated by annotating the training images with labelme.

3.labelme_json

This is the data produced from the .json files. The usage is labelme_json_to_dataset + space + filename.json, which requires labelme to be properly installed and activated. Running it once per image is tedious when you have many images, so here is a tool that converts all the json files under a .json directory in one go: https://download.csdn.net/download/qq_29462849/10540381
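If you prefer not to download that tool, a minimal batch-conversion sketch works too. It assumes labelme is installed so that the labelme_json_to_dataset command is on the PATH, and it uses a hypothetical json/ folder name; adjust to your own layout:

import os
import subprocess

# Hypothetical folder holding the labelme .json annotations
json_dir = "train_data/json"

for name in os.listdir(json_dir):
    if name.endswith(".json"):
        # Calls the labelme CLI once per annotation file, producing a
        # <name>_json folder containing img.png, label.png, info.yaml, ...
        subprocess.call(["labelme_json_to_dataset", os.path.join(json_dir, name)])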

4. cv2_mask

The mask label label.png generated by labelme is stored as 16-bit, while OpenCV reads 8-bit by default, so the 16-bit images need to be converted to 8-bit. This can be done with a C++ program; for the code see this post: http://blog.csdn.net/l297969586/article/details/79154150
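If you would rather not build the C++ program, a rough Python equivalent is sketched below. It assumes the hypothetical folder layout used in this post (labelme_json/<name>_json/label.png as input, cv2_mask/<name>.png as output); the instance indices in label.png are small, so a plain cast to 8-bit is enough:

import os
import numpy as np
from PIL import Image

labelme_dir = "train_data/labelme_json"   # hypothetical paths, adjust to your layout
mask_dir = "train_data/cv2_mask"
os.makedirs(mask_dir, exist_ok=True)

for folder in os.listdir(labelme_dir):
    label_path = os.path.join(labelme_dir, folder, "label.png")
    if not os.path.exists(label_path):
        continue
    # label.png stores instance indices (0, 1, 2, ...) as 16-bit; cast them to 8-bit
    label = np.asarray(Image.open(label_path)).astype(np.uint8)
    name = folder.replace("_json", "")
    Image.fromarray(label).save(os.path.join(mask_dir, name + ".png"))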


The masks look like a solid black block, but don't worry, that's normal.
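They look black simply because the pixel values are tiny instance indices (0 for background, then 1, 2, 3, ...). A quick sanity check, assuming a converted mask at a hypothetical path:

import numpy as np
from PIL import Image

mask = np.asarray(Image.open("train_data/cv2_mask/1.png"))  # hypothetical file name
print(mask.dtype, mask.shape, np.unique(mask))  # expect uint8 and values like [0 1 2 3]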

Source code

Running this code requires pycocotools, which is a pain to install on Windows: some people install it easily, while others struggle even after reinstalling the system. For installing pycocotools on Windows, see: https://blog.csdn.net/chixia1785/article/details/80040172

https://blog.csdn.net/gxiaoyaya/article/details/78363391

Test source code

The code released on GitHub is an .ipynb notebook; I converted it directly into a .py file. As a first test, it runs the model pretrained on the COCO dataset and can read frames from the webcam.

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(MODEL_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
print("cuiwei***********************")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")


class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


config = InferenceConfig()
config.display()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

# Load a random image from the images folder
#file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

cap = cv2.VideoCapture(0)

while(1):
    # get a frame
    ret, frame = cap.read()
    # show a frame
    start = time.clock()
    results = model.detect([frame], verbose=1)
    r = results[0]
    #cv2.imshow("capture", frame)
    visualize.display_instances(frame, r['rois'], r['masks'], r['class_ids'],
                                class_names, r['scores'])
    end = time.clock()
    print(end - start)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

#image = cv2.imread("C:/Users/18301/Desktop/Mask_RCNN-master/images/9.jpg")
## Run detection
#results = model.detect([image], verbose=1)
#print(end - start)
## Visualize results
#r = results[0]
#visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
#                            class_names, r['scores'])

The pretrained Mask R-CNN model can be downloaded from:

https://github.com/matterport/Mask_RCNN/releases; after downloading, just configure the path.

Training source code

# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
from mrcnn.config import Config
#import utils
from mrcnn import model as modellib, utils
from mrcnn import visualize
import yaml
from mrcnn.model import log
from PIL import Image

#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Root directory of the project
ROOT_DIR = os.getcwd()
#ROOT_DIR = os.path.abspath("../")

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

iter_num = 0

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 384

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50


config = ShapesConfig()
config.display()


class DrugDataset(utils.Dataset):
    # Get the number of instances (objects) in the image
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # Parse the yaml file generated by labelme to get the instance label for each mask layer
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            temp = yaml.load(f.read())
            labels = temp['label_names']
            del labels[0]
        return labels

    # Rewrite draw_mask
    def draw_mask(self, num_obj, mask, image, image_id):
        #print("draw_mask-->", image_id)
        #print("self.image_info", self.image_info)
        info = self.image_info[image_id]
        #print("info-->", info)
        #print("info[width]----->", info['width'], "-info[height]--->", info['height'])
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    #print("image_id-->", image_id, "-i--->", i, "-j--->", j)
                    #print("info[width]----->", info['width'], "-info[height]--->", info['height'])
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask

    # Rewrite load_shapes to contain your own classes; add as many as you like
    # path, mask_path and yaml_path are added to self.image_info
    # yaml_path dataset_root_path = "/tongue_dateset/"
    # img_floder = dataset_root_path + "rgb"
    # mask_floder = dataset_root_path + "mask"
    # dataset_root_path = "/tongue_dateset/"
    def load_shapes(self, count, img_floder, mask_floder, imglist, dataset_root_path):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes; more object classes can be added the same way
        self.add_class("shapes", 1, "tank")  # melanoma
        self.add_class("shapes", 2, "triangle")
        self.add_class("shapes", 3, "white")
        for i in range(count):
            # Get the image width and height
            filestr = imglist[i].split(".")[0]
            #print(imglist[i], "-->", cv_img.shape[1], "--->", cv_img.shape[0])
            #print("id-->", i, " imglist[", i, "]-->", imglist[i], "filestr-->", filestr)
            #filestr = filestr.split("_")[1]
            mask_path = mask_floder + "/" + filestr + ".png"
            yaml_path = dataset_root_path + "labelme_json/" + filestr + "_json/info.yaml"
            print(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            cv_img = cv2.imread(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0],
                           mask_path=mask_path, yaml_path=yaml_path)

    # Rewrite load_mask
    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        count = 1  # number of object
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = []
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("tank") != -1:
                # print "box"
                labels_form.append("tank")
            elif labels[i].find("triangle") != -1:
                # print "column"
                labels_form.append("triangle")
            elif labels[i].find("white") != -1:
                # print "package"
                labels_form.append("white")
        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)


def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.

    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax


# Basic settings
dataset_root_path = "train_data/"
img_floder = dataset_root_path + "pic"
mask_floder = dataset_root_path + "cv2_mask"
#yaml_floder = dataset_root_path
imglist = os.listdir(img_floder)
count = len(imglist)

# Prepare the train and val datasets
dataset_train = DrugDataset()
dataset_train.load_shapes(count, img_floder, mask_floder, imglist, dataset_root_path)
dataset_train.prepare()

#print("dataset_train-->", dataset_train._image_ids)

dataset_val = DrugDataset()
dataset_val.load_shapes(7, img_floder, mask_floder, imglist, dataset_root_path)
dataset_val.prepare()

#print("dataset_val-->", dataset_val._image_ids)

# Load and display random samples
#image_ids = np.random.choice(dataset_train.image_ids, 4)
#for image_id in image_ids:
#    image = dataset_train.load_image(image_id)
#    mask, class_ids = dataset_train.load_mask(image_id)
#    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            layers='heads')

# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=40,
            layers="all")

The training parameters can be modified in config.py according to your needs; the official wiki also gives tuning suggestions: https://github.com/matterport/Mask_RCNN/wiki

The main options you may want to change are listed below (see the sketch after the list):

BACKBONE = "resnet50"  # the backbone used for transfer learning, either resnet101 or resnet50; if your machine is not particularly powerful, resnet50 is recommended, since the network is smaller and trains faster

model.train(…, layers='heads', …)  # Train heads branches (least memory)

model.train(…, layers='3+', …)  # Train resnet stage 3 and up

model.train(…, layers='4+', …)  # Train resnet stage 4 and up

model.train(…, layers='all', …)  # Train all layers (most memory); pick which layers to train according to your needs

IMAGE_MIN_DIM = 800

IMAGE_MAX_DIM = 1024  # the image size used during training; the final size is governed by IMAGE_MAX_DIM, so reduce it if your machine is not very powerful

GPU_COUNT = 1

IMAGES_PER_GPU = 2  # GPU settings; if GPU memory is insufficient, lower 2 to 1 (although a batch size of 1 is not great for convergence)

TRAIN_ROIS_PER_IMAGE = 200  # set according to the actual characteristics of your dataset

MAX_GT_INSTANCES = 100  # the maximum number of objects that can be detected in one image
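As a sketch, these options can also be overridden in a Config subclass inside your own training script rather than editing config.py directly; the values below are only illustrative and should be adjusted to your data and hardware:

from mrcnn.config import Config

class MyTrainConfig(Config):
    NAME = "shapes"
    # Smaller backbone for weaker machines (resnet101 is the default)
    BACKBONE = "resnet50"
    # Image resizing limits; the final size is governed by IMAGE_MAX_DIM
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    # Batch size = GPU_COUNT * IMAGES_PER_GPU; drop IMAGES_PER_GPU to 1 if memory is tight
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    # Background + your own classes
    NUM_CLASSES = 1 + 3
    TRAIN_ROIS_PER_IMAGE = 200
    MAX_GT_INSTANCES = 100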

Build the dataset in the format described above, configure the paths, and training can start. One catch on Windows: training may hang forever at epoch 1. This happens because older keras versions do not support multithreading on Windows; keras 2.1.6 is recommended and worked for me.

The trained models are saved in the logs folder in .h5 format; once training finishes you can load them directly.
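If you want the test script below to use the weights you just trained instead of the COCO .h5, one minimal sketch (reusing find_last() the same way the training script above does, and assuming model, modellib, MODEL_DIR and config already exist as in that script) is:

# Create the model in inference mode, then point it at the newest .h5 under logs/
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
model_path = model.find_last()[1]   # path of the most recently trained weights file
model.load_weights(model_path, by_name=True)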

Code for testing the model

# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
from mrcnn.config import Config
from datetime import datetime

# Root directory of the project
ROOT_DIR = os.getcwd()

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
from samples.coco import coco

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(MODEL_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
print("cuiwei***********************")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 384

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50


#import train_tongue
#class InferenceConfig(coco.CocoConfig):
class InferenceConfig(ShapesConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


config = InferenceConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'tank', 'triangle', 'white']

# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

a = datetime.now()
# Run detection
results = model.detect([image], verbose=1)
b = datetime.now()
# Visualize results
print("shijian", (b - a).seconds)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])

Of course, since there is very little training data here, the results are not particularly good; industrial images are not easy to obtain.

So how do you output the box coordinates and the segmented pixel locations? It is all in visualize.py, specifically in the display_instances function.


The final output:


The mask output records, for every pixel inside the box region, whether it belongs to the object (True) or not (False), iterating over the rows and columns of the box.
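For reference, here is a small sketch (reusing the results, r and class_names variables from the test script above) of how the box coordinates and mask pixels can be read out directly, without going through display_instances:

import numpy as np

r = results[0]
for i in range(r['rois'].shape[0]):
    y1, x1, y2, x2 = r['rois'][i]           # box corners in pixel coordinates
    mask = r['masks'][:, :, i]              # boolean mask for this instance, same H x W as the image
    ys, xs = np.where(mask[y1:y2, x1:x2])   # pixels inside the box that belong to the object
    print("instance", i,
          "class", class_names[r['class_ids'][i]],
          "score", r['scores'][i],
          "box", (y1, x1, y2, x2),
          "mask pixels", len(ys))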

Finally, the full source code for this project can be downloaded here:

https://download.csdn.net/download/qq_29462849/10540423,

where train_test is the training code and test_model is the test code; configure the paths and they run directly.

This article was written by our guest contributor Oliver Cui, who is about to join Hikvision as a deep-learning algorithm engineer. Good news: starting today he will join me in the "3D視覺技術" learning circle (pictured below) to discuss 3D-vision topics and help out. If you run into deep-learning problems, feel free to ask him and he will answer promptly; we will also hold offline events from time to time, and everyone is welcome to take part.

If any of the above infringes copyright, please contact the author and the article will be removed.

You are welcome to join our reader groups and exchange ideas with peers. We currently have WeChat groups for

3D vision, VSLAM, and deep learning.

Please scan the WeChat ID below to join, with a note in the format "nickname + school/company + research area", for example "Jingjing + Shanghai Jiao Tong University + 3D vision". Requests that do not follow this format will not be approved; once added, you will be invited into the relevant group based on your research area.


Learn the core techniques of 3D vision; scan to view the introduction, with an unconditional refund within 3 days.


The circle offers high-quality tutorials and materials, answers to your questions, and help solving problems efficiently.
