
mmdetection Source Code Reading Notes (1): Building the Network

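The previous post covered how mmdetection builds a model. This time, using Cascade R-CNN as the example, let's look at how the network is actually constructed.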

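Before getting to the network, take a look at the config file. I mainly follow the official cascade_mask_rcnn_r50_fpn_1x.py here. For what each option means, see the detailed explanation of the parameters in mmdetection's configs.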

Building the Cascade R-CNN network

First, find the file that defines Cascade R-CNN, mmdet/models/detectors/cascade_rcnn.py.

I split the construction of the Cascade R-CNN network into five main parts.

  • backbone
  • neck
  • rpn_head
  • bbox_head
  • mask_head

backbone

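The backbone chosen for Cascade R-CNN is res50 (ResNet-50). It is built the same way as described before: supported models are registered in the registry and then instantiated through the builder.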
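The register-then-build pattern can be sketched in a few lines. This is a simplified illustration, not mmdetection's actual code: the class ToyResNet and the helper build_from_cfg are hypothetical names, and the real registry does more validation.

```python
class Registry:
    """Maps a class name (string) to the class object."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._module_dict[key]


BACKBONES = Registry('backbone')


@BACKBONES.register_module
class ToyResNet:
    """Stand-in for a real backbone; it only records its arguments."""

    def __init__(self, depth, out_indices=(0, 1, 2, 3)):
        self.depth = depth
        self.out_indices = out_indices


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config is not modified
    cls = registry.get(args.pop('type'))
    return cls(**args)


backbone = build_from_cfg(dict(type='ToyResNet', depth=50), BACKBONES)
print(type(backbone).__name__, backbone.depth, backbone.out_indices)
```

The point of the pattern is that the config file only needs a string such as type='ResNet'; the builder looks the class up and passes the remaining keys as constructor arguments.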

resnet is defined in mmdet/models/backbones/resnet.py:

    def forward(self, x):
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        if len(outs) == 1:
            return outs[0]
        else:
            return tuple(outs)
           

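In forward, outs holds the outputs of multiple stages: they are collected in a list and then converted to a tuple. Which stages are kept is decided by out_indices in the config: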

model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
           

The backbone has 4 stages, and all of them are used.

The backbone's main job is to extract image features.
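To make the out_indices setting concrete, here is a small sketch of what the four stage outputs look like for a given input size. The channel counts match the in_channels list used by the neck config; the strides (4, 8, 16, 32) are the standard ResNet values. The helper name stage_shapes and the 800x1216 input are only illustrative.

```python
import math

# Channels and strides of the four ResNet-50 stages (standard values).
STAGE_CHANNELS = (256, 512, 1024, 2048)
STAGE_STRIDES = (4, 8, 16, 32)


def stage_shapes(height, width, out_indices=(0, 1, 2, 3)):
    """Return (channels, H, W) for every stage selected by out_indices."""
    shapes = []
    for i in out_indices:
        stride = STAGE_STRIDES[i]
        # Each stride-2 layer with padding rounds the size up.
        shapes.append((STAGE_CHANNELS[i],
                       math.ceil(height / stride),
                       math.ceil(width / stride)))
    return tuple(shapes)


shapes = stage_shapes(800, 1216)
for channels, h, w in shapes:
    print(channels, h, w)
```

With out_indices=(0, 1, 2, 3) the neck receives all four maps; a shorter tuple would simply drop the other stages.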

neck

This part mainly implements FPN (see the FPN explainer for background).

First, the FPN-related part of the config file:

    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
           

in_channels matches the outputs of the backbone above, and out_channels is the output dimension (number of channels).

FPN is defined in mmdet/models/necks/fpn.py. In its __init__:

        for i in range(self.start_level, self.backbone_end_level):
            l_conv = ConvModule(
                in_channels[i],
                out_channels,
                1,
                normalize=normalize,
                bias=self.with_bias,
                activation=self.activation,
                inplace=False)
            fpn_conv = ConvModule(
                out_channels,
                out_channels,
                3,
                padding=1,
                normalize=normalize,
                bias=self.with_bias,
                activation=self.activation,
                inplace=False)

            self.lateral_convs.append(l_conv)
            self.fpn_convs.append(fpn_conv)
           

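Here self.start_level is 0 and self.backbone_end_level is len(in_channels), so lateral_convs and fpn_convs have the same length as the list of inputs.

One way to think about it: the backbone outputs feature maps at several levels, and each level gets its own ConvModule that brings it to a common channel count, so that high-level and low-level features can then be fused. It may sound convoluted, but it is easier to follow alongside the code.

Below is part of the forward function.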

        # build laterals
        laterals = [
            lateral_conv(inputs[i + self.start_level])
            for i, lateral_conv in enumerate(self.lateral_convs)
        ]
        # part 1: from original levels
        outs = [
            self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
        ]
           

This part can also be seen as feature extraction; the RPN part below is where object detection really starts.
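The two excerpts from forward leave out the step that sits between them in a typical FPN: the top-down pathway, where each coarser lateral map is upsampled 2x and added to the next finer one. The sketch below shows that step on plain nested lists. The helper names upsample2x and top_down are made up for illustration, and nearest-neighbour upsampling is an assumption here, not a quote from fpn.py.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def top_down(laterals):
    """Add every coarser level into the next finer one, coarsest first."""
    merged = [[row[:] for row in level] for level in laterals]
    for i in range(len(merged) - 1, 0, -1):
        up = upsample2x(merged[i])
        merged[i - 1] = [
            [a + b for a, b in zip(fine_row, up_row)]
            for fine_row, up_row in zip(merged[i - 1], up)
        ]
    return merged


# Three single-channel levels: 4x4, 2x2 and 1x1.
laterals = [
    [[1] * 4 for _ in range(4)],
    [[10] * 2 for _ in range(2)],
    [[100]],
]
merged = top_down(laterals)
print(merged[0][0], merged[1][0], merged[2][0])
```

After this step the finest level carries information from every coarser level, which is why each level is first brought to the same channel count.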

RPN HEAD

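At first glance the rpn_head of Cascade R-CNN looks quite simple, because it mainly consists of two small networks. Two files are involved:

mmdet/models/anchor_heads/anchor_head.py

mmdet/models/anchor_heads/rpn_head.py

The class in the latter is a subclass of the one in the former.

First, the relevant config entries: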

    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
           

The core of rpn_head is implemented as follows:

    # define the layers
    def _init_layers(self):
        self.rpn_conv = nn.Conv2d(
            self.in_channels, self.feat_channels, 3, padding=1)
        self.rpn_cls = nn.Conv2d(self.feat_channels,
                                 self.num_anchors * self.cls_out_channels, 1)
        self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)
    # forward pass on a single feature level
    def forward_single(self, x):
        x = self.rpn_conv(x)
        x = F.relu(x, inplace=True)
        rpn_cls_score = self.rpn_cls(x)
        rpn_bbox_pred = self.rpn_reg(x)
        return rpn_cls_score, rpn_bbox_pred
           

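It is quite simple: on top of a shared 3x3 conv there are only two branches, one that decides whether an anchor is foreground (rpn_cls) and one that predicts the box adjustment values (rpn_reg). Here,

self.num_anchors = len(self.anchor_ratios) * len(self.anchor_scales)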
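Plugging the config values into that formula gives a feel for the sizes involved. This is a small sketch, not mmdetection code: the function name base_anchor_sizes is hypothetical, cls_out_channels = 1 is an assumption based on use_sigmoid_cls=True, and the width/height convention (height scaled by sqrt(ratio), width by its inverse) is one common choice.

```python
import math

anchor_scales = [8]
anchor_ratios = [0.5, 1.0, 2.0]
anchor_strides = [4, 8, 16, 32, 64]

num_anchors = len(anchor_ratios) * len(anchor_scales)
cls_out_channels = 1  # assumed: one sigmoid score per anchor
rpn_cls_channels = num_anchors * cls_out_channels
rpn_reg_channels = num_anchors * 4  # four offsets per anchor


def base_anchor_sizes(stride):
    """(width, height) of each anchor at one level; ratio = height / width."""
    sizes = []
    for scale in anchor_scales:
        side = scale * stride
        for ratio in anchor_ratios:
            h = side * math.sqrt(ratio)
            w = side / math.sqrt(ratio)
            sizes.append((round(w, 1), round(h, 1)))
    return sizes


print(num_anchors, rpn_cls_channels, rpn_reg_channels)
for stride in anchor_strides:
    print(stride, base_anchor_sizes(stride))
```

With a single scale of 8, the square anchor at each level has side 8 * stride, so the five levels cover sides of 32, 64, 128, 256 and 512 pixels.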

But the goal of the RPN is to produce proposals, so another function from anchor_head.py is needed, get_bboxes():

    def get_bboxes(self, cls_scores, bbox_preds, img_metas, cfg,
                   rescale=False):
        assert len(cls_scores) == len(bbox_preds)
        num_levels = len(cls_scores)

        mlvl_anchors = [
            self.anchor_generators[i].grid_anchors(cls_scores[i].size()[-2:], self.anchor_strides[i])
            for i in range(num_levels)
        ]
        result_list = []
        for img_id in range(len(img_metas)):
            cls_score_list = [
                cls_scores[i][img_id].detach() for i in range(num_levels)
            ]
            bbox_pred_list = [
                bbox_preds[i][img_id].detach() for i in range(num_levels)
            ]
            img_shape = img_metas[img_id]['img_shape']
            scale_factor = img_metas[img_id]['scale_factor']
            proposals = self.get_bboxes_single(cls_score_list, bbox_pred_list,
                                               mlvl_anchors, img_shape,
                                               scale_factor, cfg, rescale)
            result_list.append(proposals)
        return result_list
           

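This function first calls self.anchor_generators[i].grid_anchors() to obtain all anchor boxes, then calls self.get_bboxes_single() to turn the RPN outputs into proposal boxes.

Inside self.get_bboxes_single(), up to 2000 anchors (the highest-scoring ones) are first picked at each scale and concatenated to form the candidates for the image. Running nms(thr=0.7) over these anchor boxes then yields the proposals we need.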
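The NMS step can be illustrated with a few lines of plain Python. This is a minimal greedy NMS sketch under simple assumptions (boxes as (x1, y1, x2, y2) tuples with continuous coordinates); mmdetection uses its own optimized nms op, and the names iou and nms below are local to this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thr=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0, 1, 10, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with IoU 90/110, which is above 0.7, so it is suppressed; the third box does not overlap anything and survives.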

The loss of this part is fairly involved, so it is left for a later post on losses.

assigners and samplers

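The previous step, the rpn, outputs a pile of proposals, but before they can be used for training they have to be split into positive and negative samples. That is the job of the assigners.

By default cascade_rcnn uses MaxIoUAssigner, defined in mmdet/core/bbox/assigners/max_iou_assigner.py. The main entry point is assign():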

    def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
        """Assign gt to bboxes.

        This method assign a gt bbox to every bbox (proposal/anchor), each bbox
        will be assigned with -1, 0, or a positive number. -1 means don't care,
        0 means negative sample, positive number is the index (1-based) of
        assigned gt.
        The assignment is done in following steps, the order matters.

        1. assign every bbox to -1
        2. assign proposals whose iou with all gts < neg_iou_thr to 0
        3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
           assign it to that bbox
        4. for each gt bbox, assign its nearest proposals (may be more than
           one) to itself

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        if bboxes.shape[0] == 0 or gt_bboxes.shape[0] == 0:
            raise ValueError('No gt or bboxes')
        bboxes = bboxes[:, :4]
        overlaps = bbox_overlaps(gt_bboxes, bboxes)

        if (self.ignore_iof_thr > 0) and (gt_bboxes_ignore is not None) and (
                gt_bboxes_ignore.numel() > 0):
            if self.ignore_wrt_candidates:
                ignore_overlaps = bbox_overlaps(
                    bboxes, gt_bboxes_ignore, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            else:
                ignore_overlaps = bbox_overlaps(
                    gt_bboxes_ignore, bboxes, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=0)
            overlaps[:, ignore_max_overlaps > self.ignore_iof_thr] = -1

        assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
        return assign_result
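
The four steps in the docstring can be sketched in plain Python. This is a simplified illustration rather than the library's assign_wrt_overlaps: the function name assign_by_iou is hypothetical, and the thresholds (0.5, 0.4, 0.3) are example values chosen so that all three outcomes appear.

```python
def assign_by_iou(overlaps, pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.3):
    """overlaps[k][n] is the IoU between gt k and bbox n.

    Returns one entry per bbox: -1 (ignore), 0 (negative),
    or the 1-based index of the assigned gt.
    """
    num_gts, num_bboxes = len(overlaps), len(overlaps[0])
    assigned = [-1] * num_bboxes                       # step 1
    for n in range(num_bboxes):
        column = [overlaps[k][n] for k in range(num_gts)]
        max_iou = max(column)
        if 0 <= max_iou < neg_iou_thr:                 # step 2
            assigned[n] = 0
        if max_iou >= pos_iou_thr:                     # step 3
            assigned[n] = column.index(max_iou) + 1
    for k in range(num_gts):                           # step 4
        best = max(overlaps[k])
        if best >= min_pos_iou:
            for n in range(num_bboxes):
                if overlaps[k][n] == best:
                    assigned[n] = k + 1
    return assigned


overlaps = [
    [0.8, 0.2, 0.1, 0.0, 0.45],   # gt 1 vs. five bboxes
    [0.1, 0.45, 0.3, 0.0, 0.2],   # gt 2 vs. five bboxes
]
result = assign_by_iou(overlaps)
print(result)
```

In this example the second bbox falls between the two thresholds and would be ignored, but step 4 rescues it because it is the best match for gt 2. The last bbox also falls between the thresholds and stays at -1.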
           

After the proposals are split into positives and negatives, a sampler draws from them to produce the sampling result used for training.

By default cascade_rcnn uses RandomSampler, defined in mmdet/core/bbox/samplers/random_sampler.py:

    @staticmethod
    def random_choice(gallery, num):
        """Random select some elements from the gallery.

        It seems that Pytorch's implementation is slower than numpy so we use
        numpy to randperm the indices.
        """
        assert len(gallery) >= num
        if isinstance(gallery, list):
            gallery = np.array(gallery)
        cands = np.arange(len(gallery))
        np.random.shuffle(cands)
        rand_inds = cands[:num]
        if not isinstance(gallery, np.ndarray):
            rand_inds = torch.from_numpy(rand_inds).long().to(gallery.device)
        return gallery[rand_inds]

    def _sample_pos(self, assign_result, num_expected, **kwargs):
        """Randomly sample some positive samples."""
        pos_inds = torch.nonzero(assign_result.gt_inds > 0)
        if pos_inds.numel() != 0:
            pos_inds = pos_inds.squeeze(1)
        if pos_inds.numel() <= num_expected:
            return pos_inds
        else:
            return self.random_choice(pos_inds, num_expected)

    def _sample_neg(self, assign_result, num_expected, **kwargs):
        """Randomly sample some negative samples."""
        neg_inds = torch.nonzero(assign_result.gt_inds == 0)
        if neg_inds.numel() != 0:
            neg_inds = neg_inds.squeeze(1)
        if len(neg_inds) <= num_expected:
            return neg_inds
        else:
            return self.random_choice(neg_inds, num_expected)
           

These override the two sampling functions that the parent class calls.

The main logic is the sample method defined in the parent class, mmdet/core/bbox/samplers/base_sampler.py:

    def sample(self,
               assign_result,
               bboxes,
               gt_bboxes,
               gt_labels=None,
               **kwargs):
        """Sample positive and negative bboxes.

        This is a simple implementation of bbox sampling given candidates,
        assigning results and ground truth bboxes.

        Args:
            assign_result (:obj:`AssignResult`): Bbox assigning results.
            bboxes (Tensor): Boxes to be sampled from.
            gt_bboxes (Tensor): Ground truth bboxes.
            gt_labels (Tensor, optional): Class labels of ground truth bboxes.

        Returns:
            :obj:`SamplingResult`: Sampling result.
        """
        bboxes = bboxes[:, :4]

        gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
        if self.add_gt_as_proposals:
            bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
            assign_result.add_gt_(gt_labels)
            gt_ones = bboxes.new_ones(gt_bboxes.shape[0], dtype=torch.uint8)
            gt_flags = torch.cat([gt_ones, gt_flags])

        num_expected_pos = int(self.num * self.pos_fraction)
        pos_inds = self.pos_sampler._sample_pos(
            assign_result, num_expected_pos, bboxes=bboxes, **kwargs)
        # We found that sampled indices have duplicated items occasionally.
        # (may be a bug of PyTorch)
        pos_inds = pos_inds.unique()
        num_sampled_pos = pos_inds.numel()
        num_expected_neg = self.num - num_sampled_pos
        if self.neg_pos_ub >= 0:
            _pos = max(1, num_sampled_pos)
            neg_upper_bound = int(self.neg_pos_ub * _pos)
            if num_expected_neg > neg_upper_bound:
                num_expected_neg = neg_upper_bound
        neg_inds = self.neg_sampler._sample_neg(
            assign_result, num_expected_neg, bboxes=bboxes, **kwargs)
        neg_inds = neg_inds.unique()

        return SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
                              assign_result, gt_flags)
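
The budget arithmetic in sample is easier to see in isolation. The sketch below reproduces only the counting logic on a plain list of assignment values. The name sample_indices is hypothetical, and num=512 with pos_fraction=0.25 are example settings, assumed here rather than taken from the text above.

```python
import random


def sample_indices(gt_inds, num=512, pos_fraction=0.25, neg_pos_ub=-1, seed=0):
    """Pick positive (gt_inds > 0) and negative (gt_inds == 0) indices."""
    rng = random.Random(seed)
    pos = [i for i, g in enumerate(gt_inds) if g > 0]
    neg = [i for i, g in enumerate(gt_inds) if g == 0]

    num_expected_pos = int(num * pos_fraction)
    if len(pos) > num_expected_pos:
        pos = rng.sample(pos, num_expected_pos)

    # Negatives fill whatever is left of the budget.
    num_expected_neg = num - len(pos)
    if neg_pos_ub >= 0:
        upper = int(neg_pos_ub * max(1, len(pos)))
        num_expected_neg = min(num_expected_neg, upper)
    if len(neg) > num_expected_neg:
        neg = rng.sample(neg, num_expected_neg)
    return sorted(pos), sorted(neg)


# 200 positives, 1700 negatives and 100 ignored boxes.
gt_inds = [1] * 200 + [0] * 1700 + [-1] * 100
pos, neg = sample_indices(gt_inds)
print(len(pos), len(neg))
```

When there are fewer positives than the expected number, all of them are kept and the negatives take up the remaining slots, unless neg_pos_ub caps the negative-to-positive ratio.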
           

Now the bboxes are ready, and the next step is to feed them to the bbox head and the mask head.

bbox head

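Of course, the boxes obtained so far cannot be fed to the bbox head directly. First they go through RoI Pooling, which maps boxes of different sizes to a fixed size.

This is defined in mmdet/models/roi_extractors/single_level.py: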

    def forward(self, feats, rois):
        if len(feats) == 1:
            return self.roi_layers[0](feats[0], rois)

        out_size = self.roi_layers[0].out_size
        num_levels = len(feats)
        target_lvls = self.map_roi_levels(rois, num_levels)
        roi_feats = torch.cuda.FloatTensor(rois.size()[0], self.out_channels,
                                           out_size, out_size).fill_(0)
        for i in range(num_levels):
            inds = target_lvls == i
            if inds.any():
                rois_ = rois[inds, :]
                roi_feats_t = self.roi_layers[i](feats[i], rois_)
                roi_feats[inds] += roi_feats_t
        return roi_feats
           

The roi_layers here are RoIAlign layers. The resulting RoI features can then be passed to the bbox head.
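The forward in single_level.py relies on map_roi_levels to decide which pyramid level each RoI is pooled from. The sketch below shows the usual scale-based rule. The value finest_scale=56 and the exact formula are stated here as assumptions about that helper, and roi_level is a name made up for this example.

```python
import math


def roi_level(roi, num_levels=4, finest_scale=56):
    """Map one (x1, y1, x2, y2) RoI to a pyramid level by its scale."""
    x1, y1, x2, y2 = roi
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    # Each doubling of the scale moves the RoI one level coarser.
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)


for roi in [(0, 0, 50, 50), (0, 0, 150, 150),
            (0, 0, 300, 300), (0, 0, 1000, 1000)]:
    print(roi, roi_level(roi))
```

Small RoIs are pooled from the finest feature map and large ones from the coarsest, so each RoI is read at a resolution that roughly matches its size.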

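The bbox head works much like the rpn part: it classifies each box and refines its coordinates. The rpn distinguishes only foreground from background, whereas here there are N+1 classes (the actual classes plus background). The code is in mmdet/models/bbox_heads/convfc_bbox_head.py: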

    def forward(self, x):
        # shared part
        if self.num_shared_convs > 0:
            for conv in self.shared_convs:
                x = conv(x)

        if self.num_shared_fcs > 0:
            if self.with_avg_pool:
                x = self.avg_pool(x)
            x = x.view(x.size(0), -1)
            for fc in self.shared_fcs:
                x = self.relu(fc(x))
        # separate branches
        x_cls = x
        x_reg = x

        for conv in self.cls_convs:
            x_cls = conv(x_cls)
        if x_cls.dim() > 2:
            if self.with_avg_pool:
                x_cls = self.avg_pool(x_cls)
            x_cls = x_cls.view(x_cls.size(0), -1)
        for fc in self.cls_fcs:
            x_cls = self.relu(fc(x_cls))

        for conv in self.reg_convs:
            x_reg = conv(x_reg)
        if x_reg.dim() > 2:
            if self.with_avg_pool:
                x_reg = self.avg_pool(x_reg)
            x_reg = x_reg.view(x_reg.size(0), -1)
        for fc in self.reg_fcs:
            x_reg = self.relu(fc(x_reg))

        cls_score = self.fc_cls(x_cls) if self.with_cls else None
        bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
        return cls_score, bbox_pred
           

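The output of forward is each box's classification scores and box regression values.

These two results are then used to compute bbox_loss, which is also left for a later post.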
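Since bbox_pred holds regression values rather than final corner coordinates, a decoding step is needed to turn it back into a box. Below is a minimal sketch of the common (dx, dy, dw, dh) parameterization. The function name decode_delta is hypothetical, and the means/stds arguments mirror the target_means/target_stds style of option seen in the rpn_head config.

```python
import math


def decode_delta(roi, delta, means=(0.0, 0.0, 0.0, 0.0),
                 stds=(1.0, 1.0, 1.0, 1.0)):
    """Apply (dx, dy, dw, dh) to an (x1, y1, x2, y2) box."""
    # Undo the normalization applied to the regression targets.
    dx, dy, dw, dh = (d * s + m for d, s, m in zip(delta, stds, means))
    x1, y1, x2, y2 = roi
    pw, ph = x2 - x1, y2 - y1
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph
    # Shift the center relative to the box size, scale the size in log space.
    gx, gy = px + pw * dx, py + ph * dy
    gw, gh = pw * math.exp(dw), ph * math.exp(dh)
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)


roi = (0.0, 0.0, 10.0, 10.0)
print(decode_delta(roi, (0.0, 0.0, 0.0, 0.0)))
print(decode_delta(roi, (0.1, 0.0, 0.0, 0.0)))
print(decode_delta(roi, (0.0, 0.0, math.log(2.0), 0.0)))
```

A zero delta leaves the box unchanged, dx shifts it by a fraction of its width, and dw rescales the width around the center.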

Next comes the other branch that runs in parallel with the bbox head: the mask head.

mask head

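The mask part follows the same flow as the bbox part: the proposals first go through RoI Pooling. The RoI extractor here is the same network as in the bbox branch; only some parameters differ.

It is defined in mmdet/models/mask_heads/fcn_mask_head.py: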

    def forward(self, x):
        for conv in self.convs:
            x = conv(x)
        if self.upsample is not None:
            x = self.upsample(x)
            if self.upsample_method == 'deconv':
                x = self.relu(x)
        mask_pred = self.conv_logits(x)
        return mask_pred
           

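The output of forward is a classification value for every pixel, which is then used to compute the mask loss.

Neither the bbox head output nor this forward output is the final result at test time; further processing is needed to obtain test results. That will be covered later in the post on testing.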

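Summary

This post walked through how mmdetection builds the cascade_rcnn network. My original plan was to dig into details step by step and cover every part thoroughly, but in practice it turned out to be too complex, so I first wrote up the overall flow, which amounts to the skeleton. The plan is to finish the loss and test parts next and then come back to the details of each part.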
