Paper Reading: SiamMask

2023-03-01 04:23:27

I. A brief overview of the paper

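1. SiamMask combines the tasks of two kinds of networks: object tracking and object segmentation. On VOT metrics SiamMask wins on accuracy; on VOS metrics it wins on speed. Earlier video segmentation networks mostly ran at under 1 fps, whereas this network reaches 55 fps. Impressive!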

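2. Most earlier VOT methods learn a classifier online and then, in later frames, update the template as needed and classify again. This is tracking-by-detection, with KCF as a typical example. Siamese trackers instead learn the similarity between the first-frame template and the search region, which yields a response map; each spatial position of that map is a response of a candidate window (RoW). The template feature map is used as a convolution kernel that slides over the search-region feature map. SiamMask uses depth-wise convolution here, so each RoW has multiple channels and can encode richer information.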
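The depth-wise correlation in point 2 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `depthwise_xcorr`, the tensor shapes, and the random inputs are all made up here, and a real implementation would typically use a grouped convolution in a deep learning framework rather than Python loops.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Correlate each channel of `search` with the same channel of `template`.

    search:   (C, Hs, Ws) feature map of the search region
    template: (C, Ht, Wt) feature map of the first-frame template
    returns:  (C, Hs-Ht+1, Ws-Wt+1) multi-channel response map
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct, "channel counts must match"
    out = np.zeros((c, hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + ht, j:j + wt]
            # Sum over the spatial dimensions only, so channels stay separate.
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out

# Illustrative shapes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
search = rng.standard_normal((4, 8, 8))
template = rng.standard_normal((4, 3, 3))
response = depthwise_xcorr(search, template)
print(response.shape)  # (4, 6, 6)
```

Each vector `response[:, i, j]` plays the role of one RoW. Summing `response` over the channel axis would give back the single-channel response map of an ordinary cross-correlation, which is the information that the depth-wise variant keeps separated.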

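3. Each RoW predicts only one mask. This differs from Mask R-CNN, which predicts k masks, where k is the number of classes. Note also that the box branch predicts k boxes per RoW, but that k is the preset number of anchor boxes with different sizes and aspect ratios.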
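To make the second meaning of k in point 3 concrete, here is a small sketch of how a preset anchor set can be built. The helper `make_anchors` and the specific base size, scale, and ratio values are assumptions chosen for illustration, not values confirmed by these notes.

```python
import numpy as np

def make_anchors(base_size, scales, ratios):
    """Return a (k, 2) array of (width, height), with k = len(scales) * len(ratios)."""
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:  # r is height / width
            w = np.sqrt(area / r)
            h = w * r
            anchors.append((w, h))
    return np.array(anchors)

# Assumed example values: one scale and five aspect ratios, so k = 5.
ratios = [0.33, 0.5, 1, 2, 3]
anchors = make_anchors(base_size=8, scales=[8], ratios=ratios)
print(anchors.shape)  # (5, 2)
```

Every anchor in this set covers the same area and differs only in aspect ratio, so the box branch outputs k box regressions per RoW while the mask branch still outputs a single mask.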

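4. How a box is derived from the predicted mask for VOT evaluation also affects the scores. Weighing accuracy against speed, the paper chooses the MBR (minimum bounding rectangle).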
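The difference between an axis-aligned box and a rotated minimum bounding rectangle, as in point 4, can be seen with a toy mask. The sketch below is an approximation written for illustration: it treats foreground pixel centres as points and scans rotation angles in fixed steps, whereas a library routine such as OpenCV's `cv2.minAreaRect` computes the rectangle exactly. The function names and the diagonal-bar mask are assumptions.

```python
import numpy as np

def min_max_box(mask):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around foreground pixels."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def rotated_mbr(mask, step_deg=1.0):
    """Approximate the minimum-area rotated rectangle by scanning angles.

    Returns (area, angle_deg) for the best angle found in [0, 90).
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    best_area, best_deg = np.inf, 0.0
    for deg in np.arange(0.0, 90.0, step_deg):
        t = np.deg2rad(deg)
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t), np.cos(t)]])
        p = pts @ rot.T  # rotate all points, then take an axis-aligned extent
        w, h = p.max(axis=0) - p.min(axis=0)
        if w * h < best_area:
            best_area, best_deg = w * h, deg
    return best_area, best_deg

# Toy mask: a thin diagonal bar, where an axis-aligned box is very loose.
mask = np.zeros((40, 40), dtype=np.uint8)
for i in range(30):
    mask[5 + i, 5 + i] = 1
    mask[5 + i, 6 + i] = 1

x0, y0, x1, y1 = min_max_box(mask)
axis_area = (x1 - x0) * (y1 - y0)
mbr_area, mbr_angle = rotated_mbr(mask)
```

For an elongated, tilted object like this bar, the rotated rectangle is far tighter than the min-max box, which is why the choice of box-generation strategy changes the overlap-based VOT scores.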

5. Training datasets: COCO [31], ImageNet-VID [47], and YouTube-VOS [58].

6. On the VOT side, it outperforms DaSiamRPN and KCF. Its decay is small, so it is better suited to long videos.

II. Performance comparison: results reported in the paper

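1. Network architecture. The actual network is not as simple as Figure 2 in the paper suggests: there are also a refine module and adjust layers, which are shown in detail in the appendix and reproduced here:

[Image not preserved: detailed network architecture from the paper's appendix]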

2. Comparison with state-of-the-art VOT work

[Image not preserved: VOT comparison from the paper]

3. Comparison with state-of-the-art VOS work

[Image not preserved: VOS comparison from the paper]

4. Ablation studies

[Image not preserved: ablation results from the paper]

III. Sentences from the paper worth keeping

1.

It finds use in a wide range of scenarios such as automatic surveillance, vehicle navigation, video labelling, human-computer interaction and activity recognition.

("It" here refers to video tracking.)

2.

In this paper, we aim at narrowing the gap between arbitrary object tracking and VOS by proposing SiamMask, a simple multi-task learning approach that can be used to address both problems.

3.

To achieve this goal, we simultaneously train a Siamese network on three tasks, each corresponding to a different strategy to establish correspondances between the target object and candidate regions in the new frames.

4.

Performance of Correlation Filter-based trackers has then been notably improved with the adoption of multi-channel formulations [24, 20], spatial constraints [25, 13, 33, 29] and deep features (e.g. [12, 51]).

5. I do not fully understand this one yet and need to study it further.

In order to exploit consistency between video frames, several methods propagate the supervisory segmentation mask of the first frame to the temporally adjacent ones via graph labeling approaches (e.g. [55, 41, 50, 36, 1]). In particular, Bao et al. [1] recently proposed a very accurate method that makes use of a spatio-temporal MRF in which temporal dependencies are modelled by optical flow, while spatial dependencies are expressed by a CNN.

6.

The loss function Lmask (Eq. 3) for the mask prediction task is a binary logistic regression loss over all RoWs:
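As a rough sketch of what such a loss computes, assume each RoW n has a label y_n in {-1, +1}, a ground-truth mask c_n of size w x h with entries in {-1, +1}, and a predicted mask score map m_n, and that the loss sums (1 + y_n) / (2wh) times the per-pixel term log(1 + exp(-c_n m_n)) over all RoWs. That form is my reading of Eq. 3 and should be checked against the paper; the function name and array layout below are my own.

```python
import numpy as np

def mask_loss(logits, targets, labels):
    """Binary logistic loss over RoWs (sketch under the assumptions above).

    logits:  (N, h, w) raw mask scores m_n, one map per RoW
    targets: (N, h, w) ground-truth masks c_n with entries in {-1, +1}
    labels:  (N,) RoW labels y_n in {-1, +1}
    """
    n, h, w = logits.shape
    # log(1 + exp(-c * m)), computed in a numerically stable way.
    per_pixel = np.logaddexp(0.0, -targets * logits)
    per_row = per_pixel.sum(axis=(1, 2)) / (w * h)
    # (1 + y_n) / 2 is 1 for positive RoWs and 0 for negative ones.
    weight = (1 + labels) / 2
    return float((weight * per_row).sum())

# Toy inputs: two RoWs, one positive and one negative, with all-zero logits.
logits = np.zeros((2, 3, 3))
targets = np.ones((2, 3, 3))
targets[:, 0, :] = -1
labels = np.array([1, -1])
loss = mask_loss(logits, targets, labels)  # equals log(2) for all-zero logits
```

The weight term means only positive RoWs contribute to the mask loss, so negative candidate windows are never asked to predict a mask.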

7.

In contrast to semantic segmentation methods in the style of FCN [32] and Mask RCNN [17], which maintain explicit spatial information throughout the network, our approach follows the spirit of [43, 44] and generates masks starting from a flattened representation of the object.

8. I do not understand this one either.

Similarly to most VOS methods, in case of multiple objects in the same video (DAVIS-2017) we simply perform multiple inferences
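One plausible reading of "multiple inferences" is that the single-object tracker is run once per object, independently, and the per-object masks are then combined into one label map. The sketch below illustrates only that control flow. `ToyTracker` is a hypothetical stub and not the SiamMask API, and the merge rule (later objects overwrite earlier ones) is an assumption.

```python
import numpy as np

class ToyTracker:
    """Hypothetical stand-in for a single-object tracker; not the SiamMask API."""

    def __init__(self, init_mask):
        self.mask = init_mask.copy()

    def track(self, frame):
        # A real tracker would update the mask from the frame; this stub repeats it.
        return self.mask

def track_multiple(frames, init_masks):
    """Run one independent tracker per object and merge the masks per frame."""
    trackers = [ToyTracker(m) for m in init_masks]
    outputs = []
    for frame in frames:
        label_map = np.zeros(frame.shape[:2], dtype=int)
        for obj_id, tracker in enumerate(trackers, start=1):
            mask = tracker.track(frame)
            # Assumed merge rule: later objects overwrite earlier ones.
            label_map[mask > 0] = obj_id
        outputs.append(label_map)
    return outputs

# Two toy objects in a two-frame "video".
frames = [np.zeros((4, 4, 3)) for _ in range(2)]
m1 = np.zeros((4, 4)); m1[0:2, 0:2] = 1
m2 = np.zeros((4, 4)); m2[2:4, 2:4] = 1
label_maps = track_multiple(frames, [m1, m2])
```

Under this reading, the cost grows linearly with the number of objects, since each object needs its own forward pass per frame.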

9.

Interestingly, the refinement approach of Pinheiro et al. [44] is very important for the contour accuracy FM, but less so for the other metrics.
