Part-aligned paper series: 1711. AlignedReID - Surpassing Human-Level Performance in Person Re-ID (paper reading notes)
https://blog.csdn.net/xuluohongshang/article/details/79036440 by xuluohongshang
<span class="tags-box artic-tag-box">
<span class="label">标簽:</span>
<a data-track-click="{"mod":"popu_626","con":"partAlign"}" class="tag-link" href="http://so.csdn.net/so/search/s.do?q=partAlign&t=blog" target="_blank" rel="external nofollow" target="_blank">partAlign </a><a data-track-click="{"mod":"popu_626","con":"AlignReID"}" class="tag-link" href="http://so.csdn.net/so/search/s.do?q=AlignReID&t=blog" target="_blank" rel="external nofollow" target="_blank">AlignReID </a><a data-track-click="{"mod":"popu_626","con":"行人重識别"}" class="tag-link" href="http://so.csdn.net/so/search/s.do?q=行人重識别&t=blog" target="_blank" rel="external nofollow" target="_blank">行人重識别 </a><a data-track-click="{"mod":"popu_626","con":"論文筆記"}" class="tag-link" href="http://so.csdn.net/so/search/s.do?q=論文筆記&t=blog" target="_blank" rel="external nofollow" target="_blank">論文筆記 </a>
<span class="article_info_click">更多</span></span>
<div class="tags-box space">
<span class="label">個人分類:</span>
<a class="tag-link" href="https://blog.csdn.net/xuluohongshang/article/category/7001410" target="_blank" rel="external nofollow" target="_blank">行人重識别 </a>
</div>
</div>
<div class="operating">
</div>
</div>
</div>
</div>
<article class="baidu_pl">
<div id="article_content" class="article_content clearfix csdn-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post">
<div class="article-copyright">
版權聲明:本文為部落客原創文章,未經部落客允許不得轉載。轉載請保留出處 https://blog.csdn.net/xuluohongshang/article/details/79036440 </div>
<div id="content_views" class="markdown_views prism-atom-one-dark">
<!-- flowchart 箭頭圖示 勿删 -->
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><path stroke-linecap="round" d="M5,0 0,2.5 5,5z" id="raphael-marker-block" style="-webkit-tap-highlight-color: rgba(0, 0, 0, 0);"></path></svg>
**AlignedReID: Surpassing Human-Level Performance in Person Re-ID**
Paper information:
In this paper, we propose a new method called AlignedReID, which extracts a global feature that is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs alignment/matching by computing the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we keep only the global feature to compute the similarity between images. Our method achieves 94.0% rank-1 accuracy on Market1501 and 96.1% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass human performance on Market1501 and CUHK03, two widely used Person ReID datasets.
The method delivers impressive performance on four datasets: Market1501, CUHK03, MARS, and CUHK-SYSU.
ResNet50 and ResNet50-Xception are used as the base models.
A community re-implementation is available at:
https://github.com/huanghoujing/AlignedReID-Re-Production-Pytorch
Paper analysis:
The authors propose the AlignedReID network, as follows:
Training input: a batch of N images (the authors use N = 128, with four images per ID), from which two N×N distance matrices are built.
The figure shows the feature extraction process for the N input images. The lower (global) branch extracts a global feature per image and uses the L2 distance to build an N×N global distance matrix. The upper (local) branch splits each of the N images into 7 horizontal stripes, each represented by a 128-dimensional feature vector; dynamic programming matches the stripes from top to bottom to find the minimal total distance under part alignment. Computing this for every sample pair yields another N×N matrix of minimal total distances. This minimal total distance is defined as the shortest path from (1,1) to (7,7), as shown in the figure below; intuitively, a pair of similar images has a shorter minimal total distance. A minimal sketch of this shortest-path computation follows.
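This is my own illustration, not the authors' code: `stripe_dist` is assumed to be the precomputed element-wise distance matrix between the stripes of two images (its exact form is discussed a bit further down).

```python
import numpy as np

def shortest_path_distance(stripe_dist):
    """Local (aligned) distance between two images.

    stripe_dist: (H, H) array where stripe_dist[i, j] is the distance
    between stripe i of image A and stripe j of image B (7x7 here).
    Returns the total distance of the shortest path from (0, 0) to
    (H-1, H-1), moving only right or down, found by dynamic programming.
    """
    H, W = stripe_dist.shape
    dp = np.zeros((H, W))
    dp[0, 0] = stripe_dist[0, 0]
    for j in range(1, W):                 # first row: can only come from the left
        dp[0, j] = dp[0, j - 1] + stripe_dist[0, j]
    for i in range(1, H):                 # first column: can only come from above
        dp[i, 0] = dp[i - 1, 0] + stripe_dist[i, 0]
    for i in range(1, H):
        for j in range(1, W):
            dp[i, j] = min(dp[i - 1, j], dp[i, j - 1]) + stripe_dist[i, j]
    return dp[-1, -1]
```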
Selecting suitable samples for the training model through hard mining has been shown to be effective.
Both matrices feed into the triplet hard (TriHard) loss (adopted from the paper "In Defense of the Triplet Loss for Person Re-Identification"), so that the global and local features are trained jointly. The global distance matrix is also used to select hard triplets, i.e., hard sample mining according to global distances. Only the global distance is used for mining because it is efficient, and the authors report that mining with both distances makes no important difference. A batch-hard sketch is given below.
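Here is a rough sketch of that mining under my reading of the paper; the margin value and the option of applying the hinge to the local distances of the mined pairs are assumptions.

```python
import torch

def batch_hard_triplet_loss(global_dist, labels, local_dist=None, margin=0.3):
    """TriHard loss: for each anchor, pick the hardest positive (farthest
    same-ID sample) and hardest negative (closest other-ID sample) using
    the *global* distance matrix, then apply the hinge either to the global
    distances (local_dist=None) or to the local distances of the same pairs.

    global_dist, local_dist: (N, N) distance matrices; labels: (N,) person IDs.
    margin=0.3 is an assumed hyper-parameter, not taken from the paper.
    """
    N = labels.size(0)
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same_id & ~torch.eye(N, dtype=torch.bool, device=labels.device)
    neg_mask = ~same_id

    _, ap_idx = global_dist.masked_fill(~pos_mask, float('-inf')).max(dim=1)
    _, an_idx = global_dist.masked_fill(~neg_mask, float('inf')).min(dim=1)

    dist = global_dist if local_dist is None else local_dist
    idx = torch.arange(N, device=labels.device)
    return torch.relu(dist[idx, ap_idx] - dist[idx, an_idx] + margin).mean()
```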
For the shortest total distance between two pedestrians with the same ID, the part-to-part distance is an exponentially normalized form of the L2 distance, so that a non-corresponding alignment has a large distance and its gradient is close to zero (a nice design choice); i.e., the local distance between two images is mostly determined by the corresponding alignments.
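Concretely, this is my transcription of the element-wise stripe distance that feeds the shortest-path computation; please check the squashing form against the paper, as it is how I recall it rather than a quoted equation.

```python
import torch

def stripe_distance(f, g):
    """Normalized element-wise distance between stripe features.

    f: (H, C) stripes of image A, g: (H, C) stripes of image B.
    The raw L2 distance d is mapped to (e^d - 1) / (e^d + 1), which lies
    in [0, 1); for large non-corresponding distances the mapping saturates,
    so their gradients are close to zero, as described above.
    (The mapping equals tanh(d / 2), which can be used for numerical stability.)
    """
    d = torch.cdist(f, g, p=2)                        # (H, H) raw L2 distances
    return (torch.exp(d) - 1.0) / (torch.exp(d) + 1.0)
```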
Metric learning: besides the triplet hard loss, the authors also adopt a mutual learning loss to boost performance (see the paper "Deep Mutual Learning"). Combining a softmax (classification) loss with the metric learning loss to speed up convergence is also a popular practice, so the authors combine the classification loss with the metric loss. The overall metric-learning framework is shown in the figure.
The overall loss consists of the metric loss, the metric mutual loss, the classification loss, and the classification mutual loss. As in the sample-mining framework above, the metric loss is computed from both the global and local distances, while the metric mutual loss depends only on the global distance. Note that the classification mutual loss is a KL-divergence loss, both in the referenced mutual-learning paper and in this work. A sketch of how the terms could be combined is given below.
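This is a minimal sketch for one of the two peer networks; the weight `w_mutual` and the exact weighting scheme are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def total_loss(metric_loss, metric_mutual_loss,
               logits, peer_logits, labels, w_mutual=0.01):
    """Overall objective for one peer network: metric loss (global + local
    TriHard), metric mutual loss (computed from the two peers' global
    distance matrices), classification loss, and classification mutual loss
    (KL divergence between the peers' predictions, as in Deep Mutual Learning).
    w_mutual is an assumed hyper-parameter.
    """
    cls_loss = F.cross_entropy(logits, labels)
    # KL(peer || self) with the peer detached, so each network treats the
    # other's prediction as a fixed target.
    cls_mutual = F.kl_div(F.log_softmax(logits, dim=1),
                          F.softmax(peer_logits.detach(), dim=1),
                          reduction='batchmean')
    return metric_loss + cls_loss + w_mutual * (metric_mutual_loss + cls_mutual)
```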
Mutual learning: the referenced work presents a deep mutual learning strategy in which an ensemble of students learns collaboratively and teaches each other throughout the training process.
A good model is usually obtained via transfer learning: pre-train a model and then fine-tune it. This paper instead trains multiple models simultaneously and lets them learn from each other.
Alignment learning: at the ResNet pool5 7×7 feature map, the network splits into two branches. The upper branch performs alignment learning: it uniformly partitions the feature map into horizontal parts and uses dynamic programming to form the N×N local distance matrix. The local feature distance is the shortest path found by dynamic programming, and the aligned local features are the ones matched along that shortest path.
During training, feature learning thus proceeds in two branches: the lower branch learns the global feature, while the upper branch extracts a feature from each part of the uniform horizontal partition of the feature map and learns the alignment; this branch needs no extra annotation. A sketch of the two branches is given below.
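The backbone feature map is assumed to be (N, 2048, 7, 7) from ResNet50; the pooling type and the 1×1 convolution down to 128 channels follow my reading of the paper and the public re-implementation, so treat the details as assumptions.

```python
import torch
import torch.nn as nn

class AlignedHead(nn.Module):
    """Global branch: global average pooling over the feature map.
    Local branch: horizontal pooling into H stripes, then a 1x1 conv
    to reduce each stripe to a 128-dimensional local feature.
    """
    def __init__(self, in_channels=2048, local_channels=128):
        super().__init__()
        self.local_conv = nn.Conv2d(in_channels, local_channels, kernel_size=1)

    def forward(self, feat_map):                             # (N, C, H, W)
        global_feat = feat_map.mean(dim=(2, 3))              # (N, C) global pooling
        stripes = feat_map.mean(dim=3, keepdim=True)         # (N, C, H, 1) horizontal pooling
        local_feat = self.local_conv(stripes)                # (N, 128, H, 1)
        local_feat = local_feat.squeeze(3).permute(0, 2, 1)  # (N, H, 128), one vector per stripe
        return global_feat, local_feat
```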
At test time, the upper branch is removed and only the global feature is extracted. Because it was learned jointly with the aligned local features, this global feature implicitly handles part alignment and has better ID discrimination and matching performance. It is more robust to part misalignment at the same spatial position of two images caused by occlusion, under-detection (e.g., a bounding box missing the lower body), and over-detection (e.g., one probe in which a large-scale pedestrian fills nearly the whole crop while the other detection is at a small scale and contains a lot of background); it also gives a boost in distinguishing different IDs with similar appearance. A minimal retrieval sketch is shown below.
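L2-normalizing the global features before computing distances is a common choice that I am assuming here; the hypothetical `rank_gallery` helper just orders gallery images by distance to the query.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery images by L2 distance between global features.

    query_feat: (C,) global feature of the query image.
    gallery_feats: (M, C) global features of the gallery images.
    Returns gallery indices sorted from most to least similar.
    """
    q = F.normalize(query_feat.unsqueeze(0), dim=1)   # (1, C)
    g = F.normalize(gallery_feats, dim=1)             # (M, C)
    dist = torch.cdist(q, g, p=2).squeeze(0)          # (M,)
    return torch.argsort(dist)
```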
Studying part alignment is therefore well worth the effort!
Experiments: the authors also apply re-ranking to further boost performance.
The experiments use ResNet50 and ResNet50-Xception (ResNet-X) pre-trained on ImageNet [28] as the base models.
With dynamic alignment, mutual learning, and then re-ranking, the rank-1 accuracy reaches 94.0% on Market1501 and 96.1% on CUHK03, the two most widely used ReID benchmarks. To the authors' knowledge, this is the first time a machine has surpassed human expert performance on person re-identification, setting a new record.
Key points (approximate rank-1 gains):
1) alignment (~8%)
2) mutual learning (~3%)
3) classification loss combined with the hard triplet loss
4) re-ranking (~5-6%)
(1). Some typical results of the alignment.
(2). Comparison of AlignedReID with a baseline without the local feature branch.
The local feature branch helps the network focus on useful image regions and discriminate similar person images with subtle differences.
However, the authors report one finding that is not easy to explain:
If the local distance is applied together with the global distance at the inference stage, rank-1 accuracy further improves by approximately 0.3% ∼ 0.5%. However, it is time-consuming and not practical when searching a large gallery, so they recommend using the global feature only.
(3). Analysis of Mutual Learning
(4). Comparison with other methods on each dataset.
Here, RK indicates that re-ranking was applied.
(5). Comparison with human performance:
(6). Conjectures about why human accuracy is lower than AlignedReID's:
First, the annotator usually summarizes some attributes, such as gender, age, etc., to decide whether the images contain the same person. However, the summarized attributes might be incorrect.
Second, color bias exists between cameras, and it can make the same person look different in the query and ground-truth images, as in (c).
Last, different camera angles and human poses might mislead the judgement of body shapes.
As shown in the figure, these are errors that humans tend to make more often than the model does:
Summary:
1. An implicit alignment of local features can substantially improve global feature learning.
2. End-to-end learning with a structure prior is more powerful than "blind" end-to-end learning.