All of the code in this article comes from maskrcnn-benchmark (a PyTorch 1.0 implementation) and walks through its low-level implementation details to build a deeper understanding of the underlying ideas. These are essentially my reading notes, so some parts are not covered in full detail; if anything is unclear, feel free to discuss in the comments or read the source code yourself!
https://github.com/facebookresearch/maskrcnn-benchmark
Building the RPN Loss
Since RPN classification is a binary problem, a two-class cross-entropy loss is equivalent to BCE loss, so this project simply uses BCE loss in its implementation; bounding-box regression still uses the Smooth L1 loss.
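As a quick sanity check on that equivalence (this snippet is standalone, not from the repo): a two-class cross-entropy over logits [z0, z1] reduces to a binary cross-entropy over the single logit z1 - z0, because a two-way softmax is a sigmoid of the logit difference.

```python
import torch
import torch.nn.functional as F

# Two samples, two classes; made-up logits and labels for illustration.
logits = torch.tensor([[0.3, 1.2], [2.0, -0.5]])
targets = torch.tensor([1, 0])

# Standard two-class cross-entropy over both logits.
ce = F.cross_entropy(logits, targets)

# BCE over the logit difference gives the same value.
bce = F.binary_cross_entropy_with_logits(
    logits[:, 1] - logits[:, 0], targets.float()
)
print(torch.allclose(ce, bce))  # True
```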
With this, the RPN and its matching process are complete. Next, the proposals produced during training are fed to the ROI Head for classification and regression.
ROI Head
In the original Faster R-CNN, the head structure is as follows:
The region proposals produced by the RPN pass through ROI Pooling to obtain fixed-size feature maps (7x7 in the paper), which then go through fully connected layers for classification and regression.
In Mask R-CNN, however, the authors adjusted Faster R-CNN, and maskrcnn-benchmark follows this later Mask R-CNN design. The network structure is as follows (the left variant is used by default):
- First, ROI Align is applied to the output of the backbone's Conv4 stage, producing 14x14 feature maps (Mask R-CNN replaces ROI Pooling with ROI Align to improve mask accuracy)
- These then pass through Conv5 to obtain 7x7 feature maps, which are average-pooled and fed directly into the classification and regression branches; this also differs from the original Faster R-CNN
The code is as follows:
from torch import nn

from maskrcnn_benchmark.modeling.backbone import resnet
from maskrcnn_benchmark.modeling.poolers import Pooler


class ResNet50Conv5ROIFeatureExtractor(nn.Module):  # feature extraction
    def __init__(self, config, in_channels):
        super(ResNet50Conv5ROIFeatureExtractor, self).__init__()

        resolution = config.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = config.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = config.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        pooler = Pooler(
            output_size=(resolution, resolution),  # the default box pooler resolution is also 14, unlike the paper
            scales=scales,
            sampling_ratio=sampling_ratio,
        )

        stage = resnet.StageSpec(index=4, block_count=3, return_features=False)
        # build the last convolutional stage of ResNet, Conv5
        head = resnet.ResNetHead(
            block_module=config.MODEL.RESNETS.TRANS_FUNC,
            stages=(stage,),
            num_groups=config.MODEL.RESNETS.NUM_GROUPS,
            width_per_group=config.MODEL.RESNETS.WIDTH_PER_GROUP,
            stride_in_1x1=config.MODEL.RESNETS.STRIDE_IN_1X1,
            stride_init=None,
            res2_out_channels=config.MODEL.RESNETS.RES2_OUT_CHANNELS,
            dilation=config.MODEL.RESNETS.RES5_DILATION,
        )

        self.pooler = pooler
        self.head = head
        self.out_channels = head.out_channels

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)  # ROI Align to a fixed resolution
        x = self.head(x)               # Conv5 stage
        return x
# Box detection branch
class FastRCNNPredictor(nn.Module):
    def __init__(self, config, in_channels):
        super(FastRCNNPredictor, self).__init__()
        assert in_channels is not None

        num_inputs = in_channels
        num_classes = config.MODEL.ROI_BOX_HEAD.NUM_CLASSES
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.cls_score = nn.Linear(num_inputs, num_classes)
        num_bbox_reg_classes = 2 if config.MODEL.CLS_AGNOSTIC_BBOX_REG else num_classes
        self.bbox_pred = nn.Linear(num_inputs, num_bbox_reg_classes * 4)

        nn.init.normal_(self.cls_score.weight, mean=0, std=0.01)
        nn.init.constant_(self.cls_score.bias, 0)
        nn.init.normal_(self.bbox_pred.weight, mean=0, std=0.001)
        nn.init.constant_(self.bbox_pred.bias, 0)

    def forward(self, x):
        x = self.avgpool(x)            # average pooling first
        x = x.view(x.size(0), -1)
        cls_logit = self.cls_score(x)  # then the two fully connected branches
        bbox_pred = self.bbox_pred(x)
        return cls_logit, bbox_pred
This completes the ROI Head. In this project, both the ROI Align and ROI Pooling operators are written in C++, which speeds up processing. The exact difference between ROI Align and ROI Pooling is a topic for another time!
Building the Faster RCNN Loss
Once we have the predictions and the corresponding ground truth, we compute the loss between them, split into a classification loss and a bounding-box regression loss. For classification we use the cross-entropy loss, and for box regression the Smooth L1 loss. The computation looks like this:
def __call__(self, class_logits, box_regression):
    """
    Computes the loss for Faster R-CNN.
    This requires that the subsample method has been called beforehand.

    Arguments:
        class_logits (list[Tensor])
        box_regression (list[Tensor])

    Returns:
        classification_loss (Tensor)
        box_loss (Tensor)
    """
    class_logits = cat(class_logits, dim=0)
    box_regression = cat(box_regression, dim=0)
    device = class_logits.device

    if not hasattr(self, "_proposals"):
        raise RuntimeError("subsample needs to be called before")

    proposals = self._proposals
    labels = cat([proposal.get_field("labels") for proposal in proposals], dim=0)
    regression_targets = cat(
        [proposal.get_field("regression_targets") for proposal in proposals], dim=0
    )

    classification_loss = F.cross_entropy(class_logits, labels)

    # get indices that correspond to the regression targets for
    # the corresponding ground truth labels, to be used with
    # advanced indexing
    sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)
    labels_pos = labels[sampled_pos_inds_subset]
    if self.cls_agnostic_bbox_reg:
        map_inds = torch.tensor([4, 5, 6, 7], device=device)
    else:
        map_inds = 4 * labels_pos[:, None] + torch.tensor(
            [0, 1, 2, 3], device=device)

    box_loss = smooth_l1_loss(
        box_regression[sampled_pos_inds_subset[:, None], map_inds],
        regression_targets[sampled_pos_inds_subset],
        size_average=False,
        beta=1,
    )
    box_loss = box_loss / labels.numel()

    return classification_loss, box_loss
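The `smooth_l1_loss` helper called above lives in the repo's layers module; here is a minimal sketch consistent with the `beta` and `size_average` keywords used in the call (the default `beta` in the repo may differ, so treat the signature details as an approximation):

```python
import torch

def smooth_l1_loss(input, target, beta=1.0, size_average=True):
    # Quadratic for residuals smaller than beta, linear beyond it;
    # the two pieces meet smoothly at |x| = beta.
    n = torch.abs(input - target)
    loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    return loss.mean() if size_average else loss.sum()
```

Note also the `map_inds` trick in the `__call__` above: with class-specific regression, `bbox_pred` outputs 4 values per class, so the columns belonging to each positive sample's ground-truth class are picked out via `4 * label + [0, 1, 2, 3]` with advanced indexing, and the box loss is only computed over positive (foreground) samples.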