In a previous post I covered how mmdetection builds its models; this time, taking cascade rcnn as an example, let's look at how the network is actually constructed.
Before discussing the network, we need to look at the configuration file. I will mainly follow the officially provided
cascade_mask_rcnn_r50_fpn_1x.py
to examine the implementation; for the exact meaning of each option, see the detailed explanations of the parameters in mmdetection's configs.
Building the cascade rcnn network
First, find the definition file of cascade rcnn:
mmdet/models/detectors/cascade_rcnn.py
I will split the construction of the cascade rcnn network into five parts:
- backbone
- neck
- rpn_head
- bbox_head
- mask_head
backbone
The backbone chosen for cascade rcnn is res50. The backbone is built the same way as before: the supported models are registered into the registry, and then instantiated through the builder.
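As a refresher on that pattern, here is a minimal, self-contained sketch of the register-then-build idea (illustrative only; mmdetection's Registry and build functions are more elaborate):

import torch.nn as nn

# toy registry: maps the 'type' string in a config dict to a class
BACKBONES = {}

def register(cls):
    BACKBONES[cls.__name__] = cls
    return cls

@register
class ResNet(nn.Module):
    def __init__(self, depth, num_stages=4, out_indices=(0, 1, 2, 3), **kwargs):
        super().__init__()
        self.depth, self.num_stages, self.out_indices = depth, num_stages, out_indices

def build_backbone(cfg):
    cfg = dict(cfg)  # don't mutate the caller's config
    return BACKBONES[cfg.pop('type')](**cfg)

backbone = build_backbone(dict(type='ResNet', depth=50))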
resnet is defined in
mmdet/models/backbones/resnet.py
; its forward is:
def forward(self, x):
    # stem: conv -> norm -> relu -> maxpool
    x = self.conv1(x)
    x = self.norm1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    outs = []
    # run the residual stages, collecting the ones listed in out_indices
    for i, layer_name in enumerate(self.res_layers):
        res_layer = getattr(self, layer_name)
        x = res_layer(x)
        if i in self.out_indices:
            outs.append(x)
    if len(outs) == 1:
        return outs[0]
    else:
        return tuple(outs)
In forward, outs collects the outputs of multiple stages: they are first gathered into a list and then converted to a tuple. Which stages are taken is determined by out_indices in the config:
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
The backbone has 4 stages, and all of them are taken as outputs.
The backbone's main job is simply to extract image features.
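To see concretely what those four stage outputs look like, here is a standalone check using torchvision's resnet50 (not mmdetection's ResNet, but the stage layout and channel counts are the same):

import torch
import torchvision

# mimic out_indices=(0, 1, 2, 3): collect the output of every residual stage
resnet = torchvision.models.resnet50()
x = torch.randn(1, 3, 224, 224)
x = resnet.maxpool(resnet.relu(resnet.bn1(resnet.conv1(x))))
outs = []
for stage in [resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4]:
    x = stage(x)
    outs.append(x)
print([tuple(o.shape) for o in outs])
# [(1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)]

The channel counts (256, 512, 1024, 2048) are exactly the in_channels the FPN neck expects below.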
neck
This part mainly implements FPN. First, look at the FPN-related part of the config file:
neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5),
in_channels matches the backbone outputs above, and out_channels is the output channel dimension. num_outs=5 means FPN returns one more level than the backbone provides; the extra level is derived from the topmost output (by max pooling, in the default setting). FPN is defined in
mmdet/models/necks/fpn.py
; in its __init__ method:
for i in range(self.start_level, self.backbone_end_level):
    l_conv = ConvModule(
        in_channels[i],
        out_channels,
        1,
        normalize=normalize,
        bias=self.with_bias,
        activation=self.activation,
        inplace=False)
    fpn_conv = ConvModule(
        out_channels,
        out_channels,
        3,
        padding=1,
        normalize=normalize,
        bias=self.with_bias,
        activation=self.activation,
        inplace=False)
    self.lateral_convs.append(l_conv)
    self.fpn_convs.append(fpn_conv)
Here self.start_level is 0 and self.backbone_end_level is len(in_channels), so the lateral_convs and fpn_convs lists have the same length as the input list.
You can understand it like this: the backbone outputs feature maps at several levels; each level gets its own 1x1 lateral ConvModule, which unifies the channel counts; the top-down pathway then adds each upsampled higher-level map into the one below it, fusing high- and low-level features; finally a 3x3 fpn_conv smooths each fused map. It may sound convoluted, but it becomes clear together with the code.
Below is part of the forward function.
# build laterals
laterals = [
    lateral_conv(inputs[i + self.start_level])
    for i, lateral_conv in enumerate(self.lateral_convs)
]
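# between these two excerpts the full forward also builds the top-down
# path, which is where the actual fusion happens: each level is upsampled
# and added into the one below it (roughly as follows)
used_backbone_levels = len(laterals)
for i in range(used_backbone_levels - 1, 0, -1):
    laterals[i - 1] += F.interpolate(
        laterals[i], scale_factor=2, mode='nearest')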
# part 1: from original levels
outs = [
    self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
]
This part can still be seen as feature extraction; the RPN part below is where actual object detection begins.
RPN HEAD
At first glance, the rpn_head of cascade rcnn looks quite simple, because it boils down to two small conv branches. Two files are involved:
mmdet/models/anchor_heads/anchor_head.py
and
mmdet/models/anchor_heads/rpn_head.py
; the latter is a subclass of the former.
First, the related config entries:
rpn_head=dict(
    type='RPNHead',
    in_channels=256,
    feat_channels=256,
    anchor_scales=[8],
    anchor_ratios=[0.5, 1.0, 2.0],
    anchor_strides=[4, 8, 16, 32, 64],
    target_means=[.0, .0, .0, .0],
    target_stds=[1.0, 1.0, 1.0, 1.0],
    use_sigmoid_cls=True),
Note that anchor_strides has five entries, one per FPN output level (num_outs=5). The main implementation of rpn_head is as follows:
# define the layers
def _init_layers(self):
    self.rpn_conv = nn.Conv2d(
        self.in_channels, self.feat_channels, 3, padding=1)
    self.rpn_cls = nn.Conv2d(self.feat_channels,
                             self.num_anchors * self.cls_out_channels, 1)
    self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)

# forward
def forward_single(self, x):
    x = self.rpn_conv(x)
    x = F.relu(x, inplace=True)
    rpn_cls_score = self.rpn_cls(x)
    rpn_bbox_pred = self.rpn_reg(x)
    return rpn_cls_score, rpn_bbox_pred
Very simple: just two output layers, one judging whether an anchor is foreground (rpn_cls) and one predicting box refinement deltas (rpn_reg). Also note that
self.num_anchors = len(self.anchor_ratios) * len(self.anchor_scales)
, so with the config above num_anchors = 3 * 1 = 3.
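A minimal standalone sketch of these two branches (my own illustration, not mmdet code), using the config above so that num_anchors=3 and, since use_sigmoid_cls=True, one score channel per anchor:

import torch
import torch.nn as nn
import torch.nn.functional as F

in_channels, feat_channels, num_anchors = 256, 256, 3
rpn_conv = nn.Conv2d(in_channels, feat_channels, 3, padding=1)
rpn_cls = nn.Conv2d(feat_channels, num_anchors * 1, 1)  # 1 score per anchor
rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4, 1)  # 4 deltas per anchor

x = torch.randn(1, 256, 50, 50)        # one FPN level
x = F.relu(rpn_conv(x), inplace=True)
print(rpn_cls(x).shape)                 # torch.Size([1, 3, 50, 50])
print(rpn_reg(x).shape)                 # torch.Size([1, 12, 50, 50])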
But the goal of the RPN is to produce proposal boxes, so another function from
anchor_head.py
is needed:
get_bboxes()
def get_bboxes(self, cls_scores, bbox_preds, img_metas, cfg,
               rescale=False):
    assert len(cls_scores) == len(bbox_preds)
    num_levels = len(cls_scores)
    mlvl_anchors = [
        self.anchor_generators[i].grid_anchors(cls_scores[i].size()[-2:],
                                               self.anchor_strides[i])
        for i in range(num_levels)
    ]
    result_list = []
    for img_id in range(len(img_metas)):
        cls_score_list = [
            cls_scores[i][img_id].detach() for i in range(num_levels)
        ]
        bbox_pred_list = [
            bbox_preds[i][img_id].detach() for i in range(num_levels)
        ]
        img_shape = img_metas[img_id]['img_shape']
        scale_factor = img_metas[img_id]['scale_factor']
        proposals = self.get_bboxes_single(cls_score_list, bbox_pred_list,
                                           mlvl_anchors, img_shape,
                                           scale_factor, cfg, rescale)
        result_list.append(proposals)
    return result_list
Here,
self.anchor_generators[i].grid_anchors()
first generates all the anchor boxes, and then
self.get_bboxes_single()
uses the earlier rpn outputs to produce the proposal boxes.
Inside
self.get_bboxes_single()
, the top 2000 anchors are first taken on each level, concatenated together as the anchors for the image, and nms (thr=0.7) over these anchor boxes yields the desired proposals.
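That selection logic reads roughly like the following standalone sketch (my own re-implementation of the described flow using torchvision's nms, not mmdet's get_bboxes_single; the delta-decoding and score-sigmoid steps are omitted):

import torch
from torchvision.ops import nms

def select_proposals(scores_per_level, boxes_per_level,
                     nms_pre=2000, nms_thr=0.7, max_num=2000):
    kept_boxes, kept_scores = [], []
    for scores, boxes in zip(scores_per_level, boxes_per_level):
        # per level: keep only the top-scoring anchors
        if scores.numel() > nms_pre:
            topk = scores.topk(nms_pre).indices
            scores, boxes = scores[topk], boxes[topk]
        kept_boxes.append(boxes)
        kept_scores.append(scores)
    # concat across levels, then NMS over the whole image
    boxes, scores = torch.cat(kept_boxes), torch.cat(kept_scores)
    keep = nms(boxes, scores, nms_thr)[:max_num]
    return boxes[keep], scores[keep]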
This part also has its loss, which is fairly involved; I will leave it for a later post on losses.
assigners and samplers
The previous step, rpn, produced a pile of candidate boxes, but before these boxes can be used for training they still need to be divided into positive and negative samples. That is the job of the assigners. cascade_rcnn uses MaxIoUAssigner by default, defined in
mmdet/core/bbox/assigners/max_iou_assigner.py
. The main entry point is assign():
def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
    """Assign gt to bboxes.

    This method assign a gt bbox to every bbox (proposal/anchor), each bbox
    will be assigned with -1, 0, or a positive number. -1 means don't care,
    0 means negative sample, positive number is the index (1-based) of
    assigned gt.
    The assignment is done in following steps, the order matters.

    1. assign every bbox to -1
    2. assign proposals whose iou with all gts < neg_iou_thr to 0
    3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
       assign it to that bbox
    4. for each gt bbox, assign its nearest proposals (may be more than
       one) to itself

    Args:
        bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
        gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
        gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
            labelled as `ignored`, e.g., crowd boxes in COCO.
        gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

    Returns:
        :obj:`AssignResult`: The assign result.
    """
    if bboxes.shape[0] == 0 or gt_bboxes.shape[0] == 0:
        raise ValueError('No gt or bboxes')
    bboxes = bboxes[:, :4]
    overlaps = bbox_overlaps(gt_bboxes, bboxes)

    if (self.ignore_iof_thr > 0) and (gt_bboxes_ignore is not None) and (
            gt_bboxes_ignore.numel() > 0):
        if self.ignore_wrt_candidates:
            ignore_overlaps = bbox_overlaps(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
        else:
            ignore_overlaps = bbox_overlaps(
                gt_bboxes_ignore, bboxes, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=0)
        overlaps[:, ignore_max_overlaps > self.ignore_iof_thr] = -1

    assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
    return assign_result
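To make the four rules in the docstring concrete, here is a toy re-implementation of the core assignment logic (simplified: this version of rule 4 ignores the real assigner's min_pos_iou and gt_max_assign_all options):

import torch

def assign_toy(overlaps, pos_iou_thr=0.5, neg_iou_thr=0.5):
    # overlaps: IoU matrix of shape (num_gts, num_bboxes)
    num_gts, num_bboxes = overlaps.shape
    assigned = overlaps.new_full((num_bboxes, ), -1, dtype=torch.long)  # rule 1
    max_overlaps, argmax_overlaps = overlaps.max(dim=0)
    assigned[max_overlaps < neg_iou_thr] = 0                            # rule 2
    pos = max_overlaps >= pos_iou_thr
    assigned[pos] = argmax_overlaps[pos] + 1                            # rule 3
    for g in range(num_gts):                                            # rule 4
        assigned[overlaps[g].argmax()] = g + 1
    return assigned

overlaps = torch.tensor([[0.9, 0.3, 0.1],
                         [0.2, 0.6, 0.4]])
print(assign_toy(overlaps))  # tensor([1, 2, 0])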
将
proposal
分為正負樣本過後,通過
sampler
對這些
proposal
進行采樣得到
sampler_result
進行訓練。
cascade_rcnn
預設使用的是
RandomSampler
定義在
mmdet/core/bbox/sampler/random_sampler.py
@staticmethod
def random_choice(gallery, num):
    """Random select some elements from the gallery.

    It seems that Pytorch's implementation is slower than numpy so we use
    numpy to randperm the indices.
    """
    assert len(gallery) >= num
    if isinstance(gallery, list):
        gallery = np.array(gallery)
    cands = np.arange(len(gallery))
    np.random.shuffle(cands)
    rand_inds = cands[:num]
    if not isinstance(gallery, np.ndarray):
        rand_inds = torch.from_numpy(rand_inds).long().to(gallery.device)
    return gallery[rand_inds]

def _sample_pos(self, assign_result, num_expected, **kwargs):
    """Randomly sample some positive samples."""
    pos_inds = torch.nonzero(assign_result.gt_inds > 0)
    if pos_inds.numel() != 0:
        pos_inds = pos_inds.squeeze(1)
    if pos_inds.numel() <= num_expected:
        return pos_inds
    else:
        return self.random_choice(pos_inds, num_expected)

def _sample_neg(self, assign_result, num_expected, **kwargs):
    """Randomly sample some negative samples."""
    neg_inds = torch.nonzero(assign_result.gt_inds == 0)
    if neg_inds.numel() != 0:
        neg_inds = neg_inds.squeeze(1)
    if len(neg_inds) <= num_expected:
        return neg_inds
    else:
        return self.random_choice(neg_inds, num_expected)
These two _sample functions are overridden for the parent class to call. The main logic lives in sample(), defined in the parent class
mmdet/core/bbox/samplers/base_sampler.py
:
def sample(self,
           assign_result,
           bboxes,
           gt_bboxes,
           gt_labels=None,
           **kwargs):
    """Sample positive and negative bboxes.

    This is a simple implementation of bbox sampling given candidates,
    assigning results and ground truth bboxes.

    Args:
        assign_result (:obj:`AssignResult`): Bbox assigning results.
        bboxes (Tensor): Boxes to be sampled from.
        gt_bboxes (Tensor): Ground truth bboxes.
        gt_labels (Tensor, optional): Class labels of ground truth bboxes.

    Returns:
        :obj:`SamplingResult`: Sampling result.
    """
    bboxes = bboxes[:, :4]

    gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
    if self.add_gt_as_proposals:
        bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
        assign_result.add_gt_(gt_labels)
        gt_ones = bboxes.new_ones(gt_bboxes.shape[0], dtype=torch.uint8)
        gt_flags = torch.cat([gt_ones, gt_flags])

    num_expected_pos = int(self.num * self.pos_fraction)
    pos_inds = self.pos_sampler._sample_pos(
        assign_result, num_expected_pos, bboxes=bboxes, **kwargs)
    # We found that sampled indices have duplicated items occasionally.
    # (may be a bug of PyTorch)
    pos_inds = pos_inds.unique()
    num_sampled_pos = pos_inds.numel()
    num_expected_neg = self.num - num_sampled_pos
    if self.neg_pos_ub >= 0:
        _pos = max(1, num_sampled_pos)
        neg_upper_bound = int(self.neg_pos_ub * _pos)
        if num_expected_neg > neg_upper_bound:
            num_expected_neg = neg_upper_bound
    neg_inds = self.neg_sampler._sample_neg(
        assign_result, num_expected_neg, bboxes=bboxes, **kwargs)
    neg_inds = neg_inds.unique()

    return SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
                          assign_result, gt_flags)
For example, with num=512 and pos_fraction=0.25, at most 512 * 0.25 = 128 positives are drawn and the remaining slots are filled with negatives. At this point the boxes are ready; next they are sent to the bbox head and the mask head.
bbox head
Of course, the boxes obtained above cannot be sent to the bbox head directly: a RoI Pooling step is needed first to map boxes of different sizes onto fixed-size features. The RoI extractor is defined in
mmdet/models/roi_extractors/single_level.py
def forward(self, feats, rois):
    if len(feats) == 1:
        return self.roi_layers[0](feats[0], rois)

    out_size = self.roi_layers[0].out_size
    num_levels = len(feats)
    target_lvls = self.map_roi_levels(rois, num_levels)
    roi_feats = torch.cuda.FloatTensor(rois.size()[0], self.out_channels,
                                       out_size, out_size).fill_(0)
    for i in range(num_levels):
        inds = target_lvls == i
        if inds.any():
            rois_ = rois[inds, :]
            roi_feats_t = self.roi_layers[i](feats[i], rois_)
            roi_feats[inds] += roi_feats_t
    return roi_feats
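map_roi_levels decides which FPN level each RoI is pooled from, based on the RoI's scale. A sketch of that mapping, modeled on mmdetection's implementation (finest_scale=56 is its default there, and rois are (batch_ind, x1, y1, x2, y2)):

import torch

def map_roi_levels(rois, num_levels, finest_scale=56):
    # scale = sqrt(w * h); small RoIs map to fine levels, large ones to coarse
    scale = torch.sqrt((rois[:, 3] - rois[:, 1] + 1) *
                       (rois[:, 4] - rois[:, 2] + 1))
    target_lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
    return target_lvls.clamp(min=0, max=num_levels - 1).long()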
The roi_layers here are RoIAlign layers, and the RoI results can then be sent to the bbox head.
The bbox head part is similar to the earlier rpn part: it mainly classifies each box and refines its coordinates. Where rpn distinguished only foreground from background, here boxes are classified into N+1 classes (the actual classes plus background). Also note that since num_stages=3, cascade rcnn builds three such bbox heads, one per stage, each taking the refined boxes of the previous stage as input. The code is in
mmdet/models/bbox_heads/convfc_bbox_head.py
def forward(self, x):
    # shared part
    if self.num_shared_convs > 0:
        for conv in self.shared_convs:
            x = conv(x)

    if self.num_shared_fcs > 0:
        if self.with_avg_pool:
            x = self.avg_pool(x)
        x = x.view(x.size(0), -1)
        for fc in self.shared_fcs:
            x = self.relu(fc(x))

    # separate branches
    x_cls = x
    x_reg = x

    for conv in self.cls_convs:
        x_cls = conv(x_cls)
    if x_cls.dim() > 2:
        if self.with_avg_pool:
            x_cls = self.avg_pool(x_cls)
        x_cls = x_cls.view(x_cls.size(0), -1)
    for fc in self.cls_fcs:
        x_cls = self.relu(fc(x_cls))

    for conv in self.reg_convs:
        x_reg = conv(x_reg)
    if x_reg.dim() > 2:
        if self.with_avg_pool:
            x_reg = self.avg_pool(x_reg)
        x_reg = x_reg.view(x_reg.size(0), -1)
    for fc in self.reg_fcs:
        x_reg = self.relu(fc(x_reg))

    cls_score = self.fc_cls(x_cls) if self.with_cls else None
    bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
    return cls_score, bbox_pred
The forward outputs each box's classification score and coordinates. These two results are then used to compute the bbox_loss, which I will also leave for later. Next is the other branch, parallel to the bbox head: the mask head.
mask head
The mask pipeline follows the same flow as the bbox part: the candidate boxes first go through a RoI Pooling step, and the RoI extractor is the same network as the bbox one, only with some different parameters. The head is defined in
mmdet/models/mask_heads/fcn_mask_head.py
def forward(self, x):
    for conv in self.convs:
        x = conv(x)
    if self.upsample is not None:
        x = self.upsample(x)
        if self.upsample_method == 'deconv':
            x = self.relu(x)
    mask_pred = self.conv_logits(x)
    return mask_pred
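A standalone sketch of the same structure (my own illustration with typical values: four 3x3 convs, a deconv that upsamples the 14x14 RoI features to 28x28, and one output channel per class):

import torch
import torch.nn as nn

num_classes, conv_out = 81, 256          # e.g. COCO: 80 classes + background
convs = nn.ModuleList(
    [nn.Conv2d(256, conv_out, 3, padding=1) for _ in range(4)])
upsample = nn.ConvTranspose2d(conv_out, conv_out, 2, stride=2)  # 'deconv'
relu = nn.ReLU(inplace=True)
conv_logits = nn.Conv2d(conv_out, num_classes, 1)

x = torch.randn(8, 256, 14, 14)          # 8 RoIs after RoIAlign
for conv in convs:
    x = relu(conv(x))
x = relu(upsample(x))
mask_pred = conv_logits(x)
print(mask_pred.shape)                    # torch.Size([8, 81, 28, 28])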
The forward output is a per-pixel classification map, which is later used to compute the mask loss.
Note that the forward outputs of both the bbox head and this part are not the final results at test time; additional post-processing is needed to get the test results. I will cover that when writing about test.
Summary
This post covered how mmdetection builds the cascade_rcnn network. I originally planned to dig into every detail of each part, but it turned out to be too complex when actually reading the code, so I wrote up the overall flow first, essentially the skeleton. Once the loss and testing parts are written, I will come back and work through the details of each component.