I. My quick understanding of the paper
1. SiamMask combines two tasks in one network: visual object tracking (VOT) and video object segmentation (VOS). On VOT benchmarks it wins on accuracy; on VOS benchmarks it wins on speed. Earlier video segmentation networks typically run below 1 fps, while SiamMask reaches 55 fps. Impressive!
2. Most earlier VOT methods learn a classifier online, updating the template from later frames before re-classifying, i.e. tracking-by-detection (e.g. KCF). Siamese trackers instead learn the similarity between the first-frame template and a search region, producing a response map: the template's feature map is used as a convolution kernel over the search region's feature map. Here a depth-wise correlation produces a multi-channel response of a candidate window (RoW), which encodes richer information than a single-channel score map.
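The depth-wise correlation above can be sketched in PyTorch as follows; the tensor shapes in the test (256 channels, 7x7 template, 31x31 search region) are illustrative assumptions, not values taken from the paper:

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
    """Slide each channel of the template over the matching channel of the
    search features (via grouped convolution), so the response keeps one
    channel per feature channel -- the multi-channel RoW -- instead of
    collapsing everything into a single score map."""
    b, c, h, w = search_feat.shape
    # One 1-channel kernel per (batch, channel) pair
    kernel = template_feat.reshape(b * c, 1, template_feat.size(2), template_feat.size(3))
    x = search_feat.reshape(1, b * c, h, w)
    out = F.conv2d(x, kernel, groups=b * c)  # each kernel sees only its own channel
    return out.reshape(b, c, out.size(2), out.size(3))
```

The `groups=b*c` trick is the standard way to express per-channel correlation with a single batched `conv2d` call.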
3. Each RoW predicts a single mask, unlike Mask R-CNN, which predicts k masks (one per class). Note also that the box branch predicts k boxes per position, but this k is the preset number of anchors with different scales and aspect ratios.
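A hedged sketch of such a mask branch: every spatial position of the RoW map emits one flattened 63x63 mask (the 63x63 output size matches the paper; the 1x1-conv head shape and the channel count here are my assumptions):

```python
import torch
import torch.nn as nn

class MaskBranch(nn.Module):
    """Predicts ONE flattened mask per RoW position via 1x1 convolutions,
    in the spirit of SiamMask's mask branch -- unlike Mask R-CNN, there is
    no per-class mask dimension."""
    def __init__(self, in_channels: int = 256, mask_size: int = 63):
        super().__init__()
        self.mask_size = mask_size
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, mask_size * mask_size, kernel_size=1),
        )

    def forward(self, row: torch.Tensor) -> torch.Tensor:
        b, _, h, w = row.shape
        flat = self.head(row)  # (b, 63*63, h, w): one flattened mask per position
        return flat.permute(0, 2, 3, 1).reshape(b, h, w, self.mask_size, self.mask_size)
```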
4. How the generated mask is turned into a box for VOT evaluation also affects the results; balancing accuracy and speed, the paper chooses MBR (the rotated minimum bounding rectangle).
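For context, here is a sketch of the simplest box-from-mask strategy, the axis-aligned Min-max box, which the paper compares MBR against (pure NumPy; MBR itself is the rotated minimum-area rectangle, obtainable with e.g. `cv2.minAreaRect` on the mask contour):

```python
import numpy as np

def minmax_box(mask: np.ndarray) -> tuple:
    """Axis-aligned bounding box (x, y, w, h) of a binary mask:
    simply the min/max of the foreground pixel coordinates.
    This is the "Min-max" baseline, not the paper's preferred MBR."""
    ys, xs = np.nonzero(mask)
    x1, x2 = xs.min(), xs.max()
    y1, y2 = ys.min(), ys.max()
    return (int(x1), int(y1), int(x2 - x1 + 1), int(y2 - y1 + 1))
```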
5. Training datasets: COCO [31], ImageNet-VID [47] and YouTube-VOS [58]
6. On VOT it surpasses DaSiamRPN and KCF, with a smaller accuracy decay over time, making it better suited to long videos.
II. Performance comparison (numbers reported in the paper)
1. Network architecture diagram. In practice it is not as simple as Figure 2 in the paper: there are also a refine module and an adjust layer, shown in detail in the appendix. The figure is reproduced here:
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsICM38FdsYkRGZkRG9lcvx2bjxiNx8VZ6l2cs0TRE5UMrRlT4tmeORzbywEMW1mY1RzRapnTtxkb5ckYplTeMZTTINGMShUYfRHelRHLwEzX39GZhh2css2RkBnVHFmb1clWvB3MaVnRtp1XlBXe0xyayFWbyVGdhd3LcV2Zh1Wa9M3clN2byBXLzN3btg3Pn5GcuATN0AjNwMjMwEDOwkTMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
2. Comparison with SOTA VOT methods
3. Comparison with SOTA VOS methods
4. Ablation studies
III. Useful quotes from the paper
1.
It finds use in a wide range of scenarios such as automatic surveillance, vehicle navigation, video labelling, human-computer interaction and activity recognition.
This refers to video tracking.
2.
In this paper, we aim at narrowing the gap between arbitrary object tracking and VOS by proposing SiamMask, a simple multi-task learning approach that can be used to address both problems.
3.
To achieve this goal, we simultaneously train a Siamese network on three tasks, each corresponding to a different strategy to establish correspondances between the target object and candidate regions in the new frames.
4.
Performance of Correlation Filter-based trackers has then been notably improved with the adoption of multi-channel formulations [24, 20], spatial constraints [25, 13, 33, 29] and deep features (e.g. [12, 51])
5. I don't fully understand this part yet and need to study it further:
In order to exploit consistency between video frames, several methods propagate the supervisory segmentation mask of the first frame to the temporally adjacent ones via graph labeling approaches (e.g. [55, 41, 50, 36, 1]). In particular, Bao et al. [1] recently proposed a very accurate method that makes use of a spatio-temporal MRF in which temporal dependencies are modelled by optical flow, while spatial dependencies are expressed by a CNN
6.
The loss function Lmask (Eq. 3) for the mask prediction task is a binary logistic regression loss over all RoWs:
7.
In contrast to semantic segmentation methods in the style of FCN [32] and Mask RCNN [17], which maintain explicit spatial information throughout the network, our approach follows the spirit of [43, 44] and generates masks starting from a flattened representation of the object.
8. I don't understand this one either:
Similarly to most VOS methods, in case of multiple objects in the same video (DAVIS-2017) we simply perform multiple inferences
9.
Interestingly, the refinement approach of Pinheiro et al. [44] is very important for the contour accuracy FM, but less so for the other metrics.