【CenterMask】《CenterMask:Single Shot Instance segmentation with Point Representation》1 Background and Motivation2 Related Work3 Advantages / Contributions4 Method5 Experiments6 Conclusion(own)
The proposed Global Saliency Map module naturally achieves pixel-wise feature alignment.
4 Method
Once the center points have been predicted, the mask of each instance is obtained as:
Local Shape representation + Global Saliency Map = Mask
4.1 Local Shape Prediction
$P$ is the feature map extracted by the backbone.
$F_{shape} \in \mathbb{R}^{H \times W \times S^2}$, Shape head: each pixel $F_{shape}(x, y)$ is a candidate center point. The shape of the instance it is responsible for is represented by a $1 \times 1 \times S^2$ vector. This vector is reshaped to $S \times S$ and finally resized to $h \times w$ using the $h$ and $w$ predicted by $F_{size}$.
$F_{size} \in \mathbb{R}^{H \times W \times 2}$, Size head: each pixel $F_{size}(x, y)$ predicts the size $h$ and $w$ of the instance it is responsible for.
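The Shape/Size decoding described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The following are all assumptions:

- the array layout `(H, W, S*S)` for $F_{shape}$;
- the `(h, w)` channel order of $F_{size}$;
- indexing $F(x, y)$ as `array[y, x]`;
- bilinear interpolation for the resize step;
- the helper names `resize_bilinear` and `local_shape`, which are hypothetical.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a 2-D array (half-pixel-center sampling)."""
    in_h, in_w = img.shape
    # Map the center of each output pixel back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def local_shape(f_shape: np.ndarray, f_size: np.ndarray, x: int, y: int, S: int) -> np.ndarray:
    """Decode the local shape for the center at (x, y).

    Assumed layouts (hypothetical): f_shape is (H, W, S*S), f_size is (H, W, 2) holding (h, w).
    """
    shape_ss = f_shape[y, x].reshape(S, S)  # 1 x 1 x S^2 vector -> S x S
    h, w = f_size[y, x]
    h, w = max(int(round(h)), 1), max(int(round(w)), 1)  # sizes must be positive integers
    return resize_bilinear(shape_ss, h, w)  # S x S -> h x w
```

The resize step is where the fixed-length $S^2$ vector turns into a mask of arbitrary size. This is why a single $1 \times 1$ pixel can describe instances of very different scales.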
Mapped onto the whole image, it looks as follows.
4.2 Global Saliency Generation
The Global Saliency Generation module predicts a saliency map over the whole image, which achieves pixel-wise alignment with the input image.
4.3 Mask Assembly
The output of the Local Shape Prediction module is $L_k \in \mathbb{R}^{h \times w}$. The output of the Global Saliency Generation module, after the target region is cropped out, is $G_k \in \mathbb{R}^{h \times w}$. Both are passed through a sigmoid and then combined as follows to form the final mask:
$$M_k = \sigma(L_k) \odot \sigma(G_k)$$
The loss on the mask predicted by combining the Local Shape Prediction and Global Saliency Generation modules is:
$$L_{mask} = \frac{1}{N}\sum_{k=1}^{N} Bce(M_k, T_k)$$
where $T_k$ is the corresponding GT and Bce stands for Binary Cross Entropy (see Binary_Cross_Entropy; it is the standard loss for logistic regression).
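Mask assembly and the mask loss can be sketched in NumPy as follows. This is a minimal sketch under several assumptions:

- the crop origin `(top, left)` is a hypothetical convention, and the crop is assumed to lie fully inside the saliency map;
- Bce is averaged over pixels, which is also an assumption;
- `assemble_mask`, `bce` and `mask_loss` are illustrative names, not from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def assemble_mask(local_logits, global_logits, top, left):
    """M_k = sigmoid(L_k) * sigmoid(G_k), with G_k cropped from the global saliency map.

    (top, left) is a hypothetical crop origin; the h x w crop must lie inside the map.
    """
    h, w = local_logits.shape
    g_k = global_logits[top:top + h, left:left + w]  # crop G_k to the instance region
    return sigmoid(local_logits) * sigmoid(g_k)  # element-wise (Hadamard) product


def bce(pred, target, eps=1e-7):
    """Pixel-averaged binary cross entropy between probabilities and a 0/1 mask."""
    p = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))


def mask_loss(masks, targets):
    """L_mask = (1/N) * sum_k Bce(M_k, T_k)."""
    return sum(bce(m, t) for m, t in zip(masks, targets)) / len(masks)
```

With all-zero logits, both sigmoids are 0.5, so every pixel of $M_k$ is 0.25. Against an all-ones target, the loss is $-\log 0.25 = \log 4$.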
4.4 Overall pipeline of CenterMask
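Five heads in total. (There is a Chinese saying: "In the sky, the nine-headed bird; on earth, the people of Hubei." Strange: this nine-headed bird has only 5 heads. Hasn't it grown up yet?)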
After the backbone, the first head is the Global Saliency Generation module. The second and third heads form the Local Shape Prediction module.
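The fourth head is the heatmap branch. Its $C$ channels correspond to the number of classes, and it predicts the center point and class of each instance! Center points are found by searching for the local maximum within each window of the heatmap. A point whose response is the highest in its 8-neighborhood is a center point, and in practice a $3 \times 3$ max-pooling operation is all that is needed.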
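The 3 × 3 max-pooling trick for center extraction can be sketched in NumPy as follows. A point survives only if pooling leaves its value unchanged, i.e. it is the maximum of its 8-neighborhood. Some details are assumptions:

- the `(C, H, W)` heatmap layout;
- zeroing out non-peaks before the top-k selection;
- the function names, which are hypothetical.

```python
import numpy as np


def max_pool_3x3(heat: np.ndarray) -> np.ndarray:
    """3x3 max pooling, stride 1, -inf padding, applied per channel of a (C, H, W) map."""
    C, H, W = heat.shape
    padded = np.full((C, H + 2, W + 2), -np.inf)
    padded[:, 1:-1, 1:-1] = heat
    pooled = np.full((C, H, W), -np.inf)
    for dy in range(3):
        for dx in range(3):
            pooled = np.maximum(pooled, padded[:, dy:dy + H, dx:dx + W])
    return pooled


def extract_centers(heat: np.ndarray, k: int = 100):
    """Return (scores, classes, ys, xs) of the top-k local maxima of a (C, H, W) heatmap."""
    C, H, W = heat.shape
    is_peak = max_pool_3x3(heat) == heat  # highest response in its 8-neighborhood
    scores = np.where(is_peak, heat, 0.0).ravel()  # zero out non-peaks
    k = min(k, scores.size)
    order = np.argsort(-scores, kind="stable")[:k]
    classes, rest = np.divmod(order, H * W)  # flat index -> (class, pixel)
    ys, xs = np.divmod(rest, W)  # pixel index -> (row, column)
    return scores[order], classes, ys, xs
```

The `k=100` default mirrors the top-100 setting used at test time. Because max-pooling already suppresses neighboring duplicates, this acts as a cheap replacement for NMS.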
The fifth head refines the center-point coordinates (to recover the discretization error caused by the output stride).
The loss function consists of the following four parts.
1) center point loss
This is the loss of the fourth head, i.e. the center point prediction loss (same as CenterNet). It is a modified focal loss (a pixel-wise logistic regression modified by the focal loss):
$$L_p = -\frac{1}{N}\sum_{ijc}\begin{cases}(1-\hat{Y}_{ijc})^{\alpha}\log(\hat{Y}_{ijc}) & \text{if } Y_{ijc}=1\\ (1-Y_{ijc})^{\beta}(\hat{Y}_{ijc})^{\alpha}\log(1-\hat{Y}_{ijc}) & \text{otherwise}\end{cases}$$
where
$\hat{Y}_{ijc}$ is the predicted score at position $(i, j)$ in the heatmap of class $c$
$Y_{ijc}$ is the corresponding GT
$N$ is the number of center points in the image
$\alpha$ and $\beta$ are hyperparameters
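Working it through, this simply replaces the cross entropy in the logistic regression loss with focal loss, adding just one extra hyperparameter $\beta$! Let $y'$ denote the prediction. When $y = 1$, plug $y$ and $y'$ into the focal loss; when $y \neq 1$, plug in $1 - y$ and $1 - y'$.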
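The penalty-reduced focal loss $L_p$ above can be written in a few lines of NumPy. This is a sketch, not the authors' code. The defaults $\alpha = 2$ and $\beta = 4$ are the values reported in CenterNet; the values used by CenterMask are not stated in these notes. The function name and the clipping `eps` are also assumptions.

```python
import numpy as np


def center_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss L_p over a (C, H, W) heatmap.

    alpha=2, beta=4 are CenterNet's reported defaults (an assumption for CenterMask).
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    pos = gt == 1  # exact center locations
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    # (1 - Y)^beta down-weights negatives that lie close to a GT center.
    neg_loss = ((1 - gt) ** beta * pred ** alpha * np.log(1 - pred))[~pos].sum()
    n = max(pos.sum(), 1)  # N = number of center points
    return float(-(pos_loss + neg_loss) / n)
```

With one positive and one negative pixel both predicted at 0.5, each term contributes $0.25 \log 2$, giving $0.5 \log 2$ in total.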
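$Y_{ijc}$ in the formula is defined as in Hourglass Network (see 【Stacked Hourglass】《Stacked Hourglass Networks for Human Pose Estimation》). That is, the label is a Gaussian distribution around the center point rather than a single pixel. Hourglass Network uses an MSE loss, whereas the authors here use the modified Focal Loss.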
The Gaussian GT is expressed as follows:
$$Y_{ijc} = \exp\left(-\frac{(i-\tilde{p}_i)^2 + (j-\tilde{p}_j)^2}{2\sigma_p^2}\right)$$
where $\tilde{p} = (\tilde{p}_i, \tilde{p}_j)$ is the low-resolution center of an object of class $c$ and $\sigma_p$ is an object size-adaptive standard deviation.
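Rendering such a Gaussian label into a one-class heatmap can be sketched as follows. This is illustrative only. Here `sigma` is passed in directly; CenterNet derives it from the object size, and that step is not reproduced. Taking the element-wise maximum where Gaussians of the same class overlap follows CenterNet's convention. The function name is hypothetical.

```python
import numpy as np


def draw_gaussian(heat: np.ndarray, cx: int, cy: int, sigma: float) -> np.ndarray:
    """Splat exp(-((x-cx)^2 + (y-cy)^2) / (2 sigma^2)) into an (H, W) heatmap in place."""
    H, W = heat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heat, g, out=heat)  # overlapping objects: keep the element-wise maximum
    return heat
```

The peak value is exactly 1 at the center, which is the only location treated as a positive by $L_p$. Every other pixel is a down-weighted negative.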
Focal Loss is defined as follows:
$$FL(p_t) = -\alpha_t (1-p_t)^{\gamma}\log(p_t), \qquad p_t = \begin{cases} p & \text{if } y = 1\\ 1-p & \text{otherwise}\end{cases}$$
For an analysis of Focal Loss, see 【Focal Loss】《Focal Loss for Dense Object Detection》.
2) offset loss
This is the loss of the fifth head. As in CenterNet, it is an L1 loss used to recover the discretization error caused by the output stride:
$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$
where
$\hat{O}$ is the predicted offset
$p$ is the GT center point
$R$ is the output stride, i.e. the downsampling ratio between the original image and the heatmap
A pixel $p$ in the original image maps to the feature-map pixel
$$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$$
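As a worked example of the mapping and the offset target, take $R = 4$ and a GT center at $p = (103, 57)$. Then $\tilde{p} = (25, 14)$, and the fifth head should regress the fractional remainder $(0.75, 0.25)$. The sketch below is illustrative. The helper names are hypothetical, and it assumes the predicted offsets have already been gathered at each $\tilde{p}$.

```python
import numpy as np


def offset_target(p, R=4):
    """Return the low-resolution center p~ = floor(p / R) and the offset target p / R - p~."""
    p = np.asarray(p, dtype=float)
    p_tilde = np.floor(p / R).astype(int)
    return p_tilde, p / R - p_tilde


def offset_l1_loss(pred_offsets, gt_points, R=4):
    """L_off = (1/N) * sum_p |O^_{p~} - (p / R - p~)|; pred_offsets are assumed pre-gathered at p~."""
    targets = np.stack([offset_target(p, R)[1] for p in gt_points])
    return float(np.abs(np.asarray(pred_offsets, dtype=float) - targets).sum() / len(gt_points))
```

Without this head, the decoded center could be off by up to $R - 1$ pixels in each direction in the original image.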
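As the figure below shows, the $H \times W$ map (white area) is still smaller than the original image (the Global Saliency Map is presumably upsampled to the original image size). In CenterNet and Hourglass Network the ratio is 4. If this work follows Hourglass Network, the ratio here should also be 4.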
3) size loss
$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_k - S_k\right|$$
where
$\hat{S}_k = (\hat{h}, \hat{w})$ is the predicted bounding-box size of the instance
$S_k = (h, w)$ is the GT object size
4) mask loss
As introduced earlier, this is the combined loss of heads one to three:
$$L_{mask} = \frac{1}{N}\sum_{k=1}^{N} Bce(M_k, T_k)$$
where
$M_k$ is the predicted mask
$T_k$ is the corresponding GT
Bce is Binary Cross Entropy
The overall loss is:
$$L = \lambda_p L_p + \lambda_{off} L_{off} + \lambda_{size} L_{size} + \lambda_{mask} L_{mask}$$
where $\lambda_p, \lambda_{off}, \lambda_{size}, \lambda_{mask}$ are the corresponding weights, set to 1, 1, 0.1 and 1 in the experiments.
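The size loss and the weighted total can be sketched as follows. The defaults are the weights 1, 1, 0.1 and 1 reported above. The helper names are hypothetical, and `l1_loss` assumes one `(h, w)` row per object.

```python
import numpy as np


def l1_loss(pred, target):
    """(1/N) * sum_k |pred_k - target_k| over N rows, e.g. L_size with S_k = (h, w)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.abs(pred - target).sum() / len(pred))


def total_loss(l_p, l_off, l_size, l_mask, lambdas=(1.0, 1.0, 0.1, 1.0)):
    """L = lam_p*L_p + lam_off*L_off + lam_size*L_size + lam_mask*L_mask."""
    lam_p, lam_off, lam_size, lam_mask = lambdas
    return lam_p * l_p + lam_off * l_off + lam_size * l_size + lam_mask * l_mask
```

For example, a size error of 2 pixels in both $h$ and $w$ gives $L_{size} = 4$. After the 0.1 weight, this contributes only 0.4 to the total.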
5 Experiments
The input size is fixed at $512 \times 512$, and all models are trained from scratch.
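At test time, any heatmap point with the highest response in its 8-neighborhood is taken as a center point. The top-100 center points are output, and the mask binarization threshold is set to 0.4.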
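Turning a kept center into an image-space box and a binary mask can be sketched as follows. The conventions here are hypothetical: `offset = (oy, ox)` and `size = (h, w)` are assumed to be in heatmap units, the box is returned as `(y1, x1, y2, x2)`, and "above 0.4" is read as a strict `>`. None of these details are stated in the notes.

```python
import numpy as np


def decode_box(y, x, offset, size, R=4):
    """Map a heatmap peak (y, x) back to an input-image box (y1, x1, y2, x2).

    Hypothetical conventions: offset = (oy, ox) and size = (h, w), both in heatmap units.
    """
    cy, cx = (y + offset[0]) * R, (x + offset[1]) * R  # refined center in input pixels
    h, w = size[0] * R, size[1] * R
    return (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2)


def binarize(soft_mask, threshold=0.4):
    """Turn the soft mask M_k into a binary mask using the 0.4 threshold."""
    return np.asarray(soft_mask) > threshold
```

For example, a peak at heatmap cell $(14, 25)$ with offset $(0.25, 0.75)$ and size $(10, 6)$ maps to the center $(57, 103)$ in the input. Its box is $(37, 91, 77, 115)$.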
5.1 Datasets
MS COCO instance segmentation
trained on the 115k trainval35k
tested on the 5k minival (ablation studies)
Final results are evaluated on the 20k test-dev (comparison with SOTA)
LVIS
5.2 Ablation Study
1) Shape size Selection
This concerns the second head.
Once $S$ reaches 32, there is no noticeable further improvement. The backbone here is DLA-34 (also used in CenterNet)!
2) Backbone Architecture
The larger Hourglass network is more accurate but correspondingly slower.
3) Local Shape branch
With only the Local Shape branch, the result is 26.5. Combined with the Global Saliency branch, it reaches 31.5.
This presumably means the first head was removed.
Results with only the Local Shape branch are shown below.
The results are still rather coarse at the boundaries, but different instances are clearly separated.
4) Global Saliency branch
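With only the Global Saliency branch, the result is 21.7. Combined with the Local Shape branch, it reaches 31.5.
This shows the Local Shape branch is very well designed.
With only the Global Saliency branch, presumably just the second head is removed, not both the second and third heads.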
Results with only the Global Saliency branch are shown below.
As can be seen, the results are quite good when objects do not overlap.
The table below compares class-agnostic and class-specific settings of the Global Saliency branch.
As can be seen, the class-specific setting is more beneficial for instance segmentation.
Once the Global Saliency branch adopts the class-specific setting, a binary cross-entropy loss is added to supervise the branch.
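In the paper's design, the Local module has the size loss and the Global module has no supervision loss of its own. The combined Local + Global output has the mask loss. So this presumably means that, for the class-specific Global module, a spatial binary cross-entropy is applied to each channel (i.e. each class). This amounts to adding a supervision signal to the Global module as well!
Adding this supervision signal turns out to work better!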
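Under the reading above (the blogger's interpretation, not a confirmed detail of the paper), the extra supervision could look like this: a per-channel, spatially averaged BCE on the class-specific saliency logits. The following are all assumptions:

- the `(C, H, W)` layout;
- the per-class target being the union of that class's GT instance masks;
- averaging over classes;
- the function names, which are hypothetical.

```python
import numpy as np


def bce_with_logits(logits, target):
    """Numerically stable element-wise binary cross entropy on raw logits."""
    return np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))


def saliency_supervision_loss(sal_logits, class_masks):
    """Per-class spatial BCE on a class-specific global saliency map.

    sal_logits, class_masks: (C, H, W). class_masks[c] is a hypothetical 0/1 map that is
    1 wherever any GT instance of class c lies.
    """
    per_class = bce_with_logits(sal_logits, class_masks).mean(axis=(1, 2))  # spatial BCE per channel
    return float(per_class.mean())  # average over classes
```

All-zero logits give $\log 2$ per pixel, the BCE of an uninformed 0.5 prediction. Confident correct logits drive the loss toward 0.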
5) Combination of Local Shape and Global Saliency
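The first column uses only the Local Shape branch: it separates different instances well, but the masks are coarse.
The second column uses only the Global Saliency branch: precise segmentation, but it fails on overlapping objects.
The third column combines the two branches, which together outshine the rest.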
5.3 Comparison with state-of-the-art
Comparison on the test-dev set
without pre-trained weights
inference without any NMS
The authors attribute TensorMask's slowness to its complicated and time-consuming feature align operations.
Note the head in column (a), Mask R-CNN's result. The authors suggest this may be caused by feature pooling.
In column (d), PolarMask's rider seems to be riding a bear, hahaha.
5.4 CenterMask on FCOS Detector