
Source Code Reading: Details of Faster RCNN (Part 3)


All code in this article comes from maskrcnn-benchmark (PyTorch 1.0) and covers its low-level implementation details, with the goal of understanding the ideas more deeply. These are essentially my reading notes, so some places are not explained in full; if anything is unclear, I suggest discussing it in the comments or reading the source yourself!

https://github.com/facebookresearch/maskrcnn-benchmark​github.com

Constructing the RPN Loss

Since RPN classification is a binary problem, the two-class cross-entropy loss is equivalent to BCE loss, so this project simply uses BCE loss in its implementation; box regression still uses the Smooth L1 loss.
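As a rough sketch of how these two terms combine (a simplified version of my own, assuming labels of 1/0/-1 for positive/negative/ignored anchors; the project's actual RPNLossComputation also handles anchor subsampling, and uses beta = 1/9 for the Smooth L1 part):

```python
import torch
import torch.nn.functional as F

def rpn_loss(objectness, box_regression, labels, regression_targets, beta=1.0 / 9):
    # labels: 1 = positive anchor, 0 = negative anchor, -1 = ignored
    sampled_inds = torch.nonzero(labels >= 0).squeeze(1)
    pos_inds = torch.nonzero(labels > 0).squeeze(1)

    # binary classification: BCE with logits over the sampled anchors
    objectness_loss = F.binary_cross_entropy_with_logits(
        objectness[sampled_inds], labels[sampled_inds]
    )

    # Smooth L1 box regression on positive anchors only,
    # normalized by the number of sampled anchors
    diff = torch.abs(box_regression[pos_inds] - regression_targets[pos_inds])
    box_loss = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    box_loss = box_loss.sum() / sampled_inds.numel()

    return objectness_loss, box_loss
```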


With that, the RPN and its matching process are complete. Next, the trained proposals are fed to the ROI Head for classification and regression.

ROI Head

In the original Faster RCNN, the Head structure is as follows:

[Figure: head structure of the original Faster RCNN]

The Region Proposals produced by the RPN go through ROI Pooling to obtain fixed-size feature maps (7x7 in the paper), which are then passed through fully connected layers for classification and regression.

In Mask RCNN, however, the authors made some adjustments to Faster RCNN, and the maskrcnn-benchmark implementation follows this later Mask RCNN design. The network structure is as follows

(the structure on the left is used by default)

[Figure: ROI Head structures used in Mask RCNN]
  • First, ROI Align is applied to the output of the backbone's Conv4, producing 14x14 feature maps (Mask RCNN replaces ROI Pooling with ROI Align to improve mask accuracy)
  • These then pass through Conv5 to obtain 7x7 feature maps, which are average-pooled and fed directly into the classification and regression branches; this also differs from the original Faster RCNN
A caveat: although the overall structure is the same, the actual implementation still differs somewhat from the structure in the paper!

The code is as follows:

class ResNet50Conv5ROIFeatureExtractor(nn.Module):    # RoI feature extractor
    def __init__(self, config, in_channels):
        super(ResNet50Conv5ROIFeatureExtractor, self).__init__()

        resolution = config.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = config.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = config.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        pooler = Pooler(
            output_size=(resolution, resolution),      # the default box pooler resolution here is 14, unlike the 7x7 in the paper
            scales=scales,
            sampling_ratio=sampling_ratio,
        )

        stage = resnet.StageSpec(index=4, block_count=3, return_features=False)
        # build the last ResNet stage, Conv5
        head = resnet.ResNetHead(
            block_module=config.MODEL.RESNETS.TRANS_FUNC,
            stages=(stage,),
            num_groups=config.MODEL.RESNETS.NUM_GROUPS,
            width_per_group=config.MODEL.RESNETS.WIDTH_PER_GROUP,
            stride_in_1x1=config.MODEL.RESNETS.STRIDE_IN_1X1,
            stride_init=None,
            res2_out_channels=config.MODEL.RESNETS.RES2_OUT_CHANNELS,
            dilation=config.MODEL.RESNETS.RES5_DILATION
        )

        self.pooler = pooler
        self.head = head
        self.out_channels = head.out_channels

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)     # pool each proposal to a fixed size
        x = self.head(x)
        return x

# box detection branch (classification + box regression)
class FastRCNNPredictor(nn.Module):
    def __init__(self, config, in_channels):
        super(FastRCNNPredictor, self).__init__()
        assert in_channels is not None

        num_inputs = in_channels

        num_classes = config.MODEL.ROI_BOX_HEAD.NUM_CLASSES
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.cls_score = nn.Linear(num_inputs, num_classes)
        num_bbox_reg_classes = 2 if config.MODEL.CLS_AGNOSTIC_BBOX_REG else num_classes
        self.bbox_pred = nn.Linear(num_inputs, num_bbox_reg_classes * 4)

        nn.init.normal_(self.cls_score.weight, mean=0, std=0.01)
        nn.init.constant_(self.cls_score.bias, 0)

        nn.init.normal_(self.bbox_pred.weight, mean=0, std=0.001)
        nn.init.constant_(self.bbox_pred.bias, 0)

    def forward(self, x):
        x = self.avgpool(x)             # average pooling first
        x = x.view(x.size(0), -1)
        cls_logit = self.cls_score(x)   # then the two fully connected branches
        bbox_pred = self.bbox_pred(x)
        return cls_logit, bbox_pred
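A quick standalone check of the predictor's tensor shapes (with hypothetical sizes I am assuming for illustration: 2048 channels out of Conv5, and 81 classes as in COCO plus background):

```python
import torch
import torch.nn as nn

in_channels, num_classes = 2048, 81   # assumed: Conv5 output channels, COCO classes + background

avgpool = nn.AdaptiveAvgPool2d(1)
cls_score = nn.Linear(in_channels, num_classes)
bbox_pred = nn.Linear(in_channels, num_classes * 4)

x = torch.randn(8, in_channels, 7, 7)      # 8 RoIs, each a 7x7 feature map from Conv5
x = avgpool(x).view(x.size(0), -1)         # -> (8, 2048)
print(cls_score(x).shape)                  # torch.Size([8, 81])
print(bbox_pred(x).shape)                  # torch.Size([8, 324])
```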

With that, the ROI Head is complete. In this project, the ROI Align and ROI Pooling operators are implemented in C++, which speeds up processing. The difference between ROI Align and ROI Pooling will be covered later!

Constructing the Faster RCNN Loss

Once we have the predictions and the corresponding targets, we compute the loss between them, split into a classification loss and a box regression loss. For classification we use the cross-entropy loss, and for box regression the Smooth L1 loss, computed as follows:

smooth_l1(x) = 0.5 * x^2 / beta,   if |x| < beta
             = |x| - 0.5 * beta,   otherwise
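In code, this piecewise loss looks roughly like the following (my own sketch matching the formula above; the project ships its own smooth_l1_loss helper with this signature, which the loss code below calls):

```python
import torch

def smooth_l1_loss(input, target, beta=1.0, size_average=True):
    # quadratic near zero, linear in the tails; the switch happens at |x| = beta
    n = torch.abs(input - target)
    loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    return loss.mean() if size_average else loss.sum()
```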
    def __call__(self, class_logits, box_regression):   # FastRCNNLossComputation.__call__
        """
        Computes the loss for Faster R-CNN.
        This requires that the subsample method has been called beforehand.

        Arguments:
            class_logits (list[Tensor])
            box_regression (list[Tensor])

        Returns:
            classification_loss (Tensor)
            box_loss (Tensor)
        """

        class_logits = cat(class_logits, dim=0)
        box_regression = cat(box_regression, dim=0)
        device = class_logits.device

        if not hasattr(self, "_proposals"):
            raise RuntimeError("subsample needs to be called before")

        proposals = self._proposals

        labels = cat([proposal.get_field("labels") for proposal in proposals], dim=0)
        regression_targets = cat(
            [proposal.get_field("regression_targets") for proposal in proposals], dim=0
        )

        classification_loss = F.cross_entropy(class_logits, labels)

        # get indices that correspond to the regression targets for
        # the corresponding ground truth labels, to be used with
        # advanced indexing
        sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)
        labels_pos = labels[sampled_pos_inds_subset]
        if self.cls_agnostic_bbox_reg:
            map_inds = torch.tensor([4, 5, 6, 7], device=device)
        else:
            map_inds = 4 * labels_pos[:, None] + torch.tensor(
                [0, 1, 2, 3], device=device)

        box_loss = smooth_l1_loss(
            box_regression[sampled_pos_inds_subset[:, None], map_inds],
            regression_targets[sampled_pos_inds_subset],
            size_average=False,
            beta=1,
        )
        box_loss = box_loss / labels.numel()

        return classification_loss, box_loss
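The advanced-indexing trick above (4 * labels_pos[:, None] + [0, 1, 2, 3]) is worth unpacking: the class-specific regressor emits 4 values per class laid out contiguously, so class k's box deltas live in columns 4k..4k+3. A toy example:

```python
import torch

# two positive RoIs whose ground-truth classes are 1 and 3
labels_pos = torch.tensor([1, 3])
map_inds = 4 * labels_pos[:, None] + torch.tensor([0, 1, 2, 3])
print(map_inds)
# tensor([[ 4,  5,  6,  7],
#         [12, 13, 14, 15]])

# selecting those columns row-by-row picks each RoI's own-class box deltas
box_regression = torch.arange(2 * 16).view(2, 16).float()   # 2 RoIs, 4 classes x 4 deltas
rows = torch.tensor([0, 1])
print(box_regression[rows[:, None], map_inds])
# tensor([[ 4.,  5.,  6.,  7.],
#         [28., 29., 30., 31.]])
```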