DL: R-CNN. An illustrated guide to the R-CNN algorithm: introduction (paper overview), architecture details, and example applications

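Contents: Introduction to the R-CNN algorithm (paper overview); 0. R-CNN algorithm flowchart; 1. Experimental results; R-CNN architecture details; R-CNN example applications.

Related articles: DL: R-CNN, architecture details of the R-CNN algorithm; DL: Fast R-CNN, an illustrated guide to the Fast R-CNN algorithm (introduction, architecture details, example applications); DL: Faster R-CNN, an illustrated guide to the Faster R-CNN algorithm (introduction, architecture details, example applications).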

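R-CNN is the pioneering work on solving object detection with deep learning. Published in 2014, it is widely credited as the first method to apply deep learning successfully to the traditional object detection task.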

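Ross Girshick is a research scientist at Facebook AI Research (FAIR), working on computer vision and machine learning. He received his PhD in computer science from the University of Chicago in 2012 under the supervision of Pedro Felzenszwalb. Before joining FAIR, Ross was a researcher at Microsoft Research, Redmond, and a postdoc at the University of California, Berkeley. His interests include instance-level object understanding and visual reasoning challenges that combine natural language processing with computer vision. He received the 2017 PAMI Young Researcher Award and is well known for developing the R-CNN (Region-based Convolutional Neural Network) approach to object detection. In 2017, Ross also received the Marr Prize at ICCV for "Mask R-CNN".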

Comment: RBG (Ross Girshick) is a towering figure in this field. Later improvements such as Fast R-CNN, Faster R-CNN, and YOLO are all connected to his work.

Abstract

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012—achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at

http://www.cs.berkeley.edu/~rbg/rcnn.
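The abstract quotes a relative gain ("more than 30%") and an absolute result (53.3% mAP) but not the baseline it was measured against. The short sketch below works out what those two numbers imply together. The function names are illustrative, not from the paper, and the baseline it prints is a bound derived from the quoted figures, not a value reported in this post.

```python
def relative_improvement(new_map: float, old_map: float) -> float:
    """Relative gain of new_map over old_map, as a fraction (0.30 means 30%)."""
    return (new_map - old_map) / old_map


def max_baseline_for_gain(new_map: float, min_relative_gain: float) -> float:
    """Largest baseline mAP that still yields at least min_relative_gain."""
    return new_map / (1.0 + min_relative_gain)


# Figures quoted in the abstract: 53.3% mAP and "more than 30%" relative gain.
implied_baseline = max_baseline_for_gain(53.3, 0.30)
print(f"previous best must be below {implied_baseline:.1f}% mAP")
```

In other words, a relative gain above 30% that lands at 53.3% mAP means the earlier best result was under roughly 41% mAP, an absolute jump of more than 12 points.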


Conclusion

In recent years, object detection performance had stagnated. The best performing systems were complex ensembles combining multiple low-level image features with high-level context from object detectors and scene classifiers. This paper presents a simple and scalable object detection algorithm that gives a 30% relative improvement over the best previous results on PASCAL VOC 2012.

We achieved this performance through two insights. The first is to apply high-capacity convolutional neural networks to bottom-up region proposals in order to localize and segment objects. The second is a paradigm for training large CNNs when labeled training data is scarce. We show that it is highly effective to pre-train the network, with supervision, for an auxiliary task with abundant data (image classification) and then to fine-tune the network for the target task where data is scarce (detection). We conjecture that the "supervised pre-training/domain-specific fine-tuning" paradigm will be highly effective for a variety of data-scarce vision problems.

We conclude by noting that it is significant that we achieved these results by using a combination of classical tools from computer vision and deep learning (bottom-up region proposals and convolutional neural networks). Rather than opposing lines of scientific inquiry, the two are natural and inevitable partners.
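To make the first insight concrete, here is a minimal sketch of the data flow it describes: propose regions, crop each one, compute features for the crop, then score the features per class. Everything in it is a toy assumption made for illustration. `propose_regions` is a plain grid rather than a real bottom-up proposal algorithm, `extract_features` is a mean-intensity stand-in for the CNN, and the linear weights are hand-picked rather than trained. It shows the shape of the pipeline only, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), max corner exclusive
Image = List[List[float]]         # grayscale image as rows of pixel values


def propose_regions(image: Image, size: int, stride: int) -> List[Box]:
    """Hypothetical stand-in for a proposal method: a coarse grid of boxes."""
    h, w = len(image), len(image[0])
    return [(x, y, x + size, y + size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels inside box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def extract_features(patch: Image) -> List[float]:
    """Hypothetical stand-in for the CNN: mean intensity plus a bias term."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels), 1.0]


def detect(image: Image, class_weights: Dict[str, List[float]],
           size: int = 2, stride: int = 2):
    """Score every proposal against every class; keep positive scores."""
    detections = []
    for box in propose_regions(image, size, stride):
        feats = extract_features(crop(image, box))
        for label, weights in class_weights.items():
            score = sum(w * f for w, f in zip(weights, feats))
            if score > 0:
                detections.append((score, label, box))
    return sorted(detections, reverse=True)


# Toy 4x4 image with a bright 2x2 blob in the top-right corner.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
# Hand-picked linear scorer: positive when mean intensity exceeds 0.5.
detections = detect(image, {"blob": [1.0, -0.5]})
print(detections)
```

Only the proposal covering the bright blob scores above zero. Replacing the stand-ins with a real proposal method, a pre-trained and fine-tuned CNN, and learned per-class scorers would keep the same control flow.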


Paper

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014, pp. 580–587.

Tech report (v3): https://arxiv.org/abs/1311.2524v3


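1. Detection average precision (%) on VOC 2010 test

R-CNN BB is R-CNN with bounding-box regression added. In the results table, the first 20 columns give the AP for each of the 20 classes and the last column is their mean: the mAP reaches 53.7%.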
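The post names bounding-box regression without saying what is regressed. Below is a minimal sketch of the box parameterization usually attributed to the R-CNN paper: center offsets scaled by the proposal size, and log-space width and height ratios. The regressor that predicts these targets from CNN features is left out, and the names `encode` and `decode` are illustrative rather than taken from the released code.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def encode(proposal: Box, ground_truth: Box) -> Box:
    """Regression targets that map a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,        # horizontal shift, in proposal widths
            (gy - py) / ph,        # vertical shift, in proposal heights
            math.log(gw / pw),     # log width ratio
            math.log(gh / ph))     # log height ratio


def decode(proposal: Box, deltas: Box) -> Box:
    """Apply predicted offsets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


# Hypothetical boxes, chosen only to exercise the round trip.
proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (54.0, 46.0, 30.0, 40.0)
targets = encode(proposal, ground_truth)
refined = decode(proposal, targets)
print(targets, refined)
```

Dividing the shifts by the proposal size and taking logs of the size ratios makes the targets independent of the box's absolute scale, so one regressor can serve both small and large proposals.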


2. ILSVRC2013 detection test mAP

These are the test results on the ImageNet (ILSVRC2013) detection task.
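An mAP figure on a detection benchmark is the mean of per-class average precision (AP) values. The sketch below uses a simplified, non-interpolated AP, which is an assumption made for brevity. It is not the exact PASCAL VOC or ILSVRC evaluation protocol, and the ranked hit lists are made up for illustration rather than taken from the paper.

```python
from typing import Dict, List


def average_precision(hits: List[bool], num_positives: int) -> float:
    """Non-interpolated AP: mean of the precision at each true-positive rank.

    hits[i] is True when the i-th highest-scoring detection matches a
    ground-truth object. Objects never detected contribute zero precision.
    """
    true_positives = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(hits, start=1):
        if is_hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_positives if num_positives else 0.0


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Hypothetical ranked detections for two classes (not data from the paper).
per_class_ap = {
    "cat": average_precision([True, False, True], num_positives=2),
    "dog": average_precision([False, True], num_positives=1),
}
print(per_class_ap, mean_average_precision(per_class_ap))
```

Because the mean is unweighted, a rare class counts as much as a common one, which is why per-class columns are usually reported alongside the single mAP number.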


To be updated...
