Preface
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
1. INTRODUCTION
- In this paper the authors address the question of ConvNet depth: they fix the other parameters of the architecture and steadily deepen the network by adding convolutional layers, which is feasible because every convolutional layer uses very small filters.
In this paper, we address another important aspect of ConvNet architecture design – its depth. To this end, we fix other parameters of the architecture, and steadily increase the depth of the network by adding more convolutional layers, which is feasible due to the use of very small (3 × 3) convolution filters in all layers.
- The final models perform very well: they not only reach state-of-the-art accuracy on the classification task, but also do very well on the localisation task.
As a result, we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines
2. CONVNET CONFIGURATIONS
- During training, the network input is fixed to 224 × 224 three-channel (RGB) images.
During training, the input to our ConvNets is a fixed-size 224 × 224 RGB image
- The authors use very small 3 × 3 convolution filters, one of the model's defining features.
we use filters with a very small receptive field: 3 × 3
- They also use 1 × 1 convolution filters, which can be viewed as a linear transformation of the input channels (see the sketch after the quote below).
In one of the configurations we also utilise 1 × 1 convolution filters, which can be seen as a linear transformation of the input channels
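A minimal PyTorch sketch (my own illustration, not code from the paper) of why a 1 × 1 convolution is a linear transformation of the channels: it applies the same linear map to the channel vector at every spatial position.

```python
import torch
import torch.nn as nn

conv1x1 = nn.Conv2d(in_channels=4, out_channels=4, kernel_size=1, bias=False)
x = torch.randn(1, 4, 8, 8)  # (batch, channels, height, width)

# An equivalent per-pixel linear layer sharing the conv's weights.
linear = nn.Linear(4, 4, bias=False)
linear.weight.data = conv1x1.weight.data.view(4, 4)

y_conv = conv1x1(x)
# Move channels last, apply the linear map pixel-wise, move channels back.
y_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y_conv, y_lin, atol=1e-6))  # True
```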
- For convolutional layers with 3 × 3 filters, the padding is set to 1 pixel.
the padding is 1 pixel for 3 × 3 conv. layers
- Max-pooling uses a 2 × 2 window with a stride of 2.
Max-pooling is performed over a 2 × 2 pixel window, with stride 2
- The stack of convolutional layers is followed by three fully-connected layers: the first two have 4096 units each, and the third has 1000 units for the 1000-way ILSVRC classification.
A stack of convolutional layers (which has a different depth in different architectures) is followed by three Fully-Connected (FC) layers: the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class).
- Every hidden layer in the network uses the ReLU activation (a sketch assembling the whole configuration follows the quote below).
All hidden layers are equipped with the rectification (ReLU) non-linearity.
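Putting the bullets above together, here is a minimal PyTorch sketch of configuration A (11 weight layers). It is my own assembly rather than the authors' code, and the class name `VGG_A` is mine; the channel widths (64, 128, 256, 512, ...) follow Table 1 of the paper, and the dropout on the first two FC layers comes from Section 4 below.

```python
import torch
import torch.nn as nn

def conv3x3(cin, cout):
    # 3x3 filters with padding 1 preserve the spatial resolution.
    return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                         nn.ReLU(inplace=True))

class VGG_A(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        pool = lambda: nn.MaxPool2d(kernel_size=2, stride=2)
        self.features = nn.Sequential(
            conv3x3(3, 64), pool(),
            conv3x3(64, 128), pool(),
            conv3x3(128, 256), conv3x3(256, 256), pool(),
            conv3x3(256, 512), conv3x3(512, 512), pool(),
            conv3x3(512, 512), conv3x3(512, 512), pool(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):          # x: (N, 3, 224, 224)
        x = self.features(x)       # five 2x2 pools: 224 -> 7
        return self.classifier(torch.flatten(x, 1))

print(VGG_A()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```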
3. CONFIGURATIONS
- The paper designs five models (A–E) whose settings are essentially identical except for depth, which increases from model A with 11 weight layers (8 convolutional + 3 fully-connected) to model E with 19 weight layers (16 convolutional + 3 fully-connected).
- The models' distinctive feature is the very small convolution filters, and it is exactly this shrinking of the filters that lets the network grow deeper: a stack of two 3 × 3 convolutional layers has the same effective receptive field as a single 5 × 5 layer, and a stack of three 3 × 3 layers matches a single 7 × 7 layer (a quick numeric check follows below).
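A quick check of the receptive-field arithmetic (my own helper, not from the paper): each stride-1 layer grows the receptive field by kernel_size − 1.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    # Stride-1 conv layers each add (kernel_size - 1) to the receptive field.
    r = 1
    for _ in range(num_layers):
        r += kernel_size - 1
    return r

print(stacked_receptive_field(2))  # 5 -> same as one 5x5 layer
print(stacked_receptive_field(3))  # 7 -> same as one 7x7 layer
```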
- Why use three 3 × 3 convolutional layers instead of a single 7 × 7 one? First, splitting one non-linearity into three strengthens the network's expressive power. Second, it reduces the number of parameters: the single 7 × 7 layer would need 49C² weights versus 27C² for the stack, i.e. about 81% more (verified in the snippet after the quote below).
First, we incorporate three non-linear rectification layers instead of a single one, which makes the decision function more discriminative. Second, we decrease the number of parameters: assuming that both the input and the output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by 3(3²C²) = 27C² weights; at the same time, a single 7 × 7 conv. layer would require 7²C² = 49C² parameters, i.e. 81% more.
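The parameter counts in the quote are easy to verify in PyTorch (the choice C = 64 is mine, for illustration; biases are ignored as in the paper's accounting):

```python
import torch.nn as nn

C = 64
stack = nn.Sequential(*[nn.Conv2d(C, C, 3, padding=1, bias=False) for _ in range(3)])
single = nn.Conv2d(C, C, 7, padding=3, bias=False)

n_stack = sum(p.numel() for p in stack.parameters())    # 27 * C*C = 110592
n_single = sum(p.numel() for p in single.parameters())  # 49 * C*C = 200704
print(n_stack, n_single, f"{n_single / n_stack - 1:.0%}")  # ... 81%
```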
- The decomposition can also be seen as a form of regularisation, since the network is forced to express one large filter through three smaller ones.
This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).
- Networks with small filters had in fact been tried before, but they were not very deep; it was only in 2014 that Goodfellow et al., with an 11-layer network, showed that increasing depth really does improve performance.
Goodfellow et al. (2014) applied deep ConvNets (11 weight layers) to the task of street number recognition, and showed that the increased depth led to better performance
4. CLASSIFICATION FRAMEWORK
- During training the authors use a batch size of 256 and a momentum coefficient of 0.9.
The batch size was set to 256, momentum to 0.9.
- Training is regularised with weight decay (an L2 penalty) and with dropout, at a ratio of 0.5, on the first two fully-connected layers.
The training was regularised by weight decay (the L2 penalty multiplier set to 5·10⁻⁴) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5).
- The learning rate is initialised to 0.01 and decayed by a factor of 10 whenever the accuracy on the validation set stops improving (the sketch after the next quote shows an equivalent setup).
The learning rate was initially set to 10⁻², and then decreased by a factor of 10 when the validation set accuracy stopped improving.
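A hedged sketch of this optimisation setup translated into PyTorch (the authors used their own Caffe-derived C++ code, so this is only an equivalent, not their implementation). The model and the validation accuracy are placeholders, and the `patience` value is my own choice; the paper only says the rate is dropped when accuracy stops improving.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the ConvNet sketched earlier

# SGD with momentum 0.9, weight decay 5e-4, initial learning rate 1e-2.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
# mode='max' because the monitored quantity is validation *accuracy*.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=2)

for epoch in range(10):
    # ... training pass and validation pass would go here ...
    val_acc = 0.5  # placeholder validation accuracy
    scheduler.step(val_acc)  # decays lr 10x after `patience` flat epochs

print(optimizer.param_groups[0]['lr'])  # lr after the plateau decays
```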
- During training, the authors found that their nets converge in fewer epochs than the shallower networks proposed by others; they attribute this to the implicit regularisation of greater depth with smaller filters, and to the pre-initialisation of certain layers.
the nets required less epochs to converge due to (a) implicit regularisation imposed by greater depth and smaller conv. filter sizes; (b) pre-initialisation of certain layers
- Initialising the network parameters properly is very important: a bad initialisation can keep the network from learning at all. To sidestep this, the authors use a pre-training scheme: since model A is shallower and comparatively easy to train, they first train A from random initialisation, and then use A's trained weights to initialise the deeper models. They also note that Glorot (Xavier) initialisation can replace this pre-training procedure (a sketch follows below).
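A minimal sketch of the Glorot (Xavier) alternative, using PyTorch's `torch.nn.init` (the toy module below is for demonstration only, not one of the paper's configurations):

```python
import torch.nn as nn

def init_weights(m):
    # Glorot (Xavier) initialisation, which the paper notes can replace
    # the "train A first, copy its weights" pre-training procedure.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
)
net.apply(init_weights)  # recursively initialises every submodule
```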
5. CLASSIFICATION EXPERIMENTS
- The results show that using LRN or not has practically no effect on performance.
- As depth increases, performance keeps improving. Comparing models C and D shows that 1 × 1 filters are less effective than 3 × 3 ones, so capturing spatial context matters; comparing B and C, however, shows that the extra 1 × 1 layers, and the additional non-linearity they bring, still help.
- Performance plateaus once the depth reaches 19 layers, possibly an effect of the dataset's size; the authors conjecture that with enough data, the deeper the network the better.
- The authors also compared the 13-layer model B against a shallower 8-layer variant in which each pair of 3 × 3 convolutional layers is replaced by a single 5 × 5 layer; the shallow net's top-1 error came out 7% higher, which confirms that decomposing large-filter layers into stacks of small-filter layers really does improve performance.
- The authors then combined the predicted class probabilities of seven models, bringing the test error down to 7.3%; combining only the two best models reduced it further to 6.8% (a minimal averaging sketch follows below).
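A minimal sketch (my own, not the authors' evaluation code) of this kind of ensembling: average the softmax class posteriors of the member networks, then take the argmax.

```python
import torch
import torch.nn as nn

def ensemble_predict(models, x):
    # Stack each model's class posteriors, average, and pick the best class.
    probs = torch.stack([m(x).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)

# Toy stand-ins for the seven trained ConvNets.
models = [nn.Linear(8, 1000) for _ in range(7)]
print(ensemble_predict(models, torch.randn(4, 8)).shape)  # torch.Size([4])
```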
- Finally, the authors compared their models against other published results: slightly behind GoogLeNet, but better than the winning entries of the previous ILSVRC competitions. Considering single-model performance only, VGG is slightly better than a single GoogLeNet.