2016 CVPR-Person Re-Identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss

Paper link

Motivation

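Most current Re-ID work treats feature learning and metric learning as separate steps. Can the two be learned jointly to improve performance?
Extracting features directly from the whole image often fails to capture the fine details of the person in it. How can a model be designed to make better use of local features?
The earlier triplet loss only requires the intra-class distance to be smaller than the inter-class distance, so the learned clusters can be relatively large. Can more compact clusters be learned to improve discriminative power?
This paper's answer: multi-channel + improved triplet loss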

Contribution

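Proposes a multi-channel CNN model that learns global full-body features and local body-part detail features at the same time, and combines the two as the representation of the input pedestrian image
An improved triplet loss function: it requires not only that the intra-class distance be smaller than the inter-class distance, but also that the intra-class distance be smaller than a predefined margin; this improved loss further raises the model's accuracy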

1.Introduction

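Re-ID definition: recognizing the same pedestrian across cameras or across time
Applications:
- video surveillance
- human-computer interaction
- robotics
- content-based video retrieval
Challenges:
- dramatic variations in visual appearance and surrounding environment across camera viewpoints
- large changes in pedestrian pose across time and space
- background clutter and occlusion
- different pedestrians may have similar appearances

[Figure: motivation and contributions of this paper]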

2.Related Work

Re-ID work falls into two main areas:
- Feature extraction:
  - color histograms and their variants
  - local binary pattern
  - Gabor features
  - color name
  - other visual appearance or contextual cues
- Distance metric learning:
  - Mahalanobis metric learning(KISSME)
  - Local Fisher Discriminant Analysis(LFDA)
  - Marginal Fisher Analysis(MFA)
  - large margin nearest neighbour(LMNN)
  - Locally Adaptive Decision Functions(LADF)
  - attribute consistent matching
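Deep learning methods:
- Work applying triplet loss:
  - fine grained image similarity metrics
  - FaceNet
  - Deep feature learning with relative distance comparison for person re-identification
- Other work:
  - FPNN
  - DeepM
  - mFilter: local patch matching method
  - 2015 CVPR An Improved Deep Learning Architecture for Person Re-Identification
How this paper differs from the work above:
- Network architecture: a single network made up of multiple branches learns both global and local features
- Loss function: the improved triplet loss pulls same-class samples closer and pushes different-class samples farther apart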

3.The Proposed Person Re-Id Method

3.1. The Overall Framework

The figure below shows triplet training: the three branches share parameters, and each branch is the multi-channel CNN model proposed in the paper

[Figure: overall triplet training framework]
The model maps an input triplet $I_i = \langle I_i^o, I_i^+, I_i^- \rangle$ to $\phi_w(I_i) = \langle \phi_w(I_i^o), \phi_w(I_i^+), \phi_w(I_i^-) \rangle$, such that the distance between $\phi_w(I_i^o)$ and $\phi_w(I_i^+)$ is smaller than a margin, while $\phi_w(I_i^o)$ and $\phi_w(I_i^-)$ stay far apart

3.2. Multi-Channel Parts-based CNN Model

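Main layers, as shown in the figure below:
- one global convolutional layer: 7x7 filter size, stride=3
- full-body convolutional layers
- convolutional layers for the 4 body parts
- 5 fully connected layers, one per channel

[Figure: architecture of the multi-channel parts-based CNN]
Full-body channel: conv:5x5 s=1 –> max-pooling:3x3 –> conv:3x3 s=1 –> max-pooling:3x3 –> fc:400
4 body-part channels: conv:3x3 s=1 –> conv:3x3 s=1 –> fc:100
For the larger dataset CUHK01, one more convolutional layer is added to each of the five channels, so the paper uses two network configurations
Finally, the output vectors of all channels are concatenated, so the feature vector carries both global and local features, which brings a significant performance gain.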
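
To make the concatenation step concrete, here is a minimal sketch that joins one 400-dimensional full-body vector with four 100-dimensional body-part vectors. Only the per-channel sizes come from the layer list above; the function name `concat_channels` and the dummy values are assumptions made for illustration, and the sketch says nothing about any layers the paper may place after the concatenation.

```python
from typing import List, Sequence

FULL_BODY_DIM = 400  # fc size of the full-body channel (from the notes)
BODY_PART_DIM = 100  # fc size of each body-part channel (from the notes)
NUM_PARTS = 4


def concat_channels(full_body: Sequence[float],
                    parts: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the full-body vector with the body-part vectors, in order."""
    if len(full_body) != FULL_BODY_DIM:
        raise ValueError("full-body vector must have 400 entries")
    if len(parts) != NUM_PARTS or any(len(p) != BODY_PART_DIM for p in parts):
        raise ValueError("expected 4 body-part vectors with 100 entries each")
    descriptor = list(full_body)
    for part in parts:
        descriptor.extend(part)
    return descriptor


# Dummy channel outputs (hypothetical values), only to show the resulting size.
full = [0.0] * FULL_BODY_DIM
parts = [[float(k)] * BODY_PART_DIM for k in range(1, NUM_PARTS + 1)]
descriptor = concat_channels(full, parts)
print(len(descriptor))  # 400 + 4 * 100 = 800
```

With these sizes the concatenated vector has 800 entries: the first 400 describe the whole body and each following block of 100 describes one body part.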

3.3. Improved Triplet Loss Function

Original triplet loss: only requires the intra-class distance to be smaller than the inter-class distance; the clusters can be relatively large, which hurts re-id performance

$$d^n(I_i^o, I_i^+, I_i^-, w) = d(\phi_w(I_i^o), \phi_w(I_i^+)) - d(\phi_w(I_i^o), \phi_w(I_i^-)) \leqslant \tau_1$$
Improved triplet loss: additionally requires the intra-class distance to be smaller than a margin

$$d^p(I_i^o, I_i^+, w) = d(\phi_w(I_i^o), \phi_w(I_i^+)) \leqslant \tau_2$$
The final formula is:

$$L(I, w) = \frac{1}{N}\sum_{i=1}^{N}\Big(\underbrace{\max\{d^n(I_i^o, I_i^+, I_i^-, w), \tau_1\}}_{\text{inter-class constraint}} + \beta\,\underbrace{\max\{d^p(I_i^o, I_i^+, w), \tau_2\}}_{\text{intra-class constraint}}\Big)$$

$$d(\phi_w(I_i^o), \phi_w(I_i^+)) = \|\phi_w(I_i^o) - \phi_w(I_i^+)\|^2$$
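
The loss above can be sketched in a few lines of plain Python. This is not the authors' implementation: it assumes `d` is the squared Euclidean distance, works on already-computed embeddings instead of a network, and uses the τ1, τ2, β values from the Setup section as defaults. The names `sq_dist` and `improved_triplet_loss` are made up for this example.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative) embeddings


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (the assumed form of d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def improved_triplet_loss(triplets: Sequence[Triplet], tau1: float = -1.0,
                          tau2: float = 0.01, beta: float = 0.002) -> float:
    """Mean over triplets of max(d_n, tau1) + beta * max(d_p, tau2)."""
    if not triplets:
        raise ValueError("need at least one triplet")
    total = 0.0
    for anchor, positive, negative in triplets:
        d_pos = sq_dist(anchor, positive)       # d_p: intra-class distance
        d_neg = sq_dist(anchor, negative)
        inter = max(d_pos - d_neg, tau1)        # inter-class constraint
        intra = max(d_pos, tau2)                # intra-class constraint
        total += inter + beta * intra
    return total / len(triplets)


# Toy 2-D embeddings (hypothetical): one easy triplet and one hard triplet.
easy = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])  # positive closer than negative
hard = ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])  # negative closer than positive
loss = improved_triplet_loss([easy, hard])
print(loss)
```

For the easy triplet the inter-class term is clipped at τ1, so only the intra-class term still pushes the positive pair together; for the hard triplet both terms are active.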

3.4. The Training Algorithm

The detailed procedure is as follows:

[Figure: the training algorithm]

4. Experiment

4.1. Setup

Data augmentation: resize each image to 100x250, then randomly crop an 80x230 region and add a small random perturbation
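
The random-crop step might look like the sketch below. It assumes the sizes are given as width x height, represents the already-resized image as a plain list of pixel rows, and leaves out both the resize and the random perturbation, since the notes do not say how those are done. `random_crop` is a name invented for this example.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image array


def random_crop(img: Image, crop_w: int = 80, crop_h: int = 230,
                rng: Optional[random.Random] = None) -> Image:
    """Cut a crop_w x crop_h window at a uniformly random position."""
    rng = rng or random.Random()
    height, width = len(img), len(img[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop window is larger than the image")
    top = rng.randint(0, height - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


# A fake 100x250 (width x height) image standing in for the resized input.
resized = [[(r * 100 + c) % 256 for c in range(100)] for r in range(250)]
patch = random_crop(resized, rng=random.Random(0))
print(len(patch[0]), len(patch))  # width and height of the crop: 80 230
```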
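Setting training parameters:
- Weight initialization: two zero-mean Gaussian distributions, one with variance 0.01 and the other with variance 0.001
- Generating triplets: batch size=100; randomly pick five persons and randomly generate 20 triplets for each; the matched pair is drawn from that person's class, and the mismatched image is drawn from the remaining classes.
- $\tau_1, \tau_2, \beta$ are set to -1, 0.01 and 0.002 respectively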
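
The triplet generation rule can be sketched as follows. The batch layout (5 persons x 20 triplets = 100) comes from the notes; the function name `sample_triplets`, the dictionary format, and the toy file names are assumptions for illustration, and the paper may pick negatives differently from the uniform choice used here.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_triplets(images_by_id: Dict[str, List[str]], num_ids: int = 5,
                    per_id: int = 20,
                    rng: Optional[random.Random] = None) -> List[Tuple[str, str, str]]:
    """Build num_ids * per_id (anchor, positive, negative) triplets."""
    rng = rng or random.Random()
    # An identity needs at least two images to provide a matched pair.
    eligible = [pid for pid, imgs in images_by_id.items() if len(imgs) >= 2]
    if len(eligible) < num_ids:
        raise ValueError("not enough identities with two or more images")
    triplets = []
    for pid in rng.sample(eligible, num_ids):
        others = [p for p, imgs in images_by_id.items() if p != pid and imgs]
        if not others:
            raise ValueError("need at least one other identity for negatives")
        for _ in range(per_id):
            anchor, positive = rng.sample(images_by_id[pid], 2)  # same person
            negative = rng.choice(images_by_id[rng.choice(others)])  # other person
            triplets.append((anchor, positive, negative))
    return triplets


# Toy dataset (hypothetical names): 8 persons with 4 images each.
data = {f"p{i}": [f"p{i}_img{j}" for j in range(4)] for i in range(8)}
batch = sample_triplets(data, rng=random.Random(0))
print(len(batch))  # 5 persons * 20 triplets = 100
```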
Datasets:
- i-LIDS
- PRID2011
- VIPeR
- CUHK01
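Evaluation protocol: cumulative match curve (CMC) metric. The Rank-1 rate on the CMC curve is the ratio of test samples whose correct label is found at the very first match, after ranking by some similarity measure, to the total number of test samples. The Rank-5 rate is the share of samples with a correct match among the top five items (sorted from the highest to the lowest matching score). If a sample's correct label shows up only at the very last item of the sorted list, it counts as a hit only at the final rank.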
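
As a small worked example of the CMC definition, the sketch below computes Rank-k rates from a query-by-gallery distance matrix, where a smaller distance means a better match. The function name `cmc_curve` and the toy numbers are assumptions; real evaluation protocols also fix how the gallery is built and how many trials are averaged, which is not covered here.

```python
from typing import List, Sequence


def cmc_curve(dist: Sequence[Sequence[float]], query_ids: Sequence[str],
              gallery_ids: Sequence[str], max_rank: int) -> List[float]:
    """Return [Rank-1, ..., Rank-max_rank] matching rates."""
    if not dist:
        raise ValueError("need at least one query")
    hits = [0] * max_rank
    for q, row in enumerate(dist):
        # Gallery indices sorted from best match (smallest distance) to worst.
        order = sorted(range(len(row)), key=lambda g: row[g])
        # 1-based position of the first correct match, if there is one.
        first = next((rank for rank, g in enumerate(order, 1)
                      if gallery_ids[g] == query_ids[q]), None)
        if first is not None:
            for k in range(first, max_rank + 1):
                hits[k - 1] += 1  # a hit at rank r also counts for every k >= r
    return [h / len(dist) for h in hits]


# Toy example: 3 queries against a 3-image gallery.
gallery = ["a", "b", "c"]
queries = ["a", "b", "c"]
dist = [
    [0.1, 0.5, 0.9],  # query "a": correct match ranked 1st
    [0.2, 0.6, 0.4],  # query "b": correct match ranked 3rd (last)
    [0.3, 0.7, 0.5],  # query "c": correct match ranked 2nd
]
cmc = cmc_curve(dist, queries, gallery, max_rank=3)
print(cmc)  # Rank-1 = 1/3, Rank-2 = 2/3, Rank-3 = 1.0
```

Query "b" is the case described at the end of the paragraph above: its correct match sits at the last position, so it contributes only to the final rank.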

4.2.Experiment Evaluations

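Four variants are used to show the effectiveness of the proposed method:
- Variant1(T): removes the 4 body-part channels and uses the original triplet loss
- Variant2(TC): same as T, but with the improved triplet loss
- Variant3(TP): uses all five channels with the original triplet loss
- Variant4(TPC): same as TP, but with the improved triplet loss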
The detailed results are as follows:

[Figures: results of the four variants]
For the larger CUHK01, one extra convolutional layer is added to each channel of the four variants above

[Figure: results on CUHK01]
$\beta$ was chosen by cross-validation, with the following results:

[Figure: cross-validation results for $\beta$]

4.3.Analysis of different body parts

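Comparing the contribution of different body parts: 4 networks were trained, each consisting of the full-body channel plus one of the 4 body-part channels; the results are shown in the figure below:

[Figure: results for the full-body channel combined with each body-part channel]
The feature maps learned by the convolutional layers were visualized; the full-body channel captures global information, while the body-part channels capture local details

[Figure: visualization of the learned feature maps]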

5.Conclusion

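Summarizes how the proposed framework jointly learns global and local detail features, and how the improved triplet loss pulls same-class samples closer while pushing different-class samples farther apart
The method achieves SOTA performance on most of the datasets
Future work: apply the method to image and video retrieval problems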
