Dropout: A Simple Way to Prevent Neural Networks from Overfitting
For a dropout layer, each node is retained during training with some keep probability $p$ (e.g. 0.5); at prediction time (the forward pass at inference), the keep probability is 1.0, and the activations (or outgoing weights) are scaled by $p$ to compensate for the units that were dropped during training.
A standard (non-dropout) network computes:
$$z_i^{(\ell+1)} = \sum_j w_{ij}^{(\ell+1)} \, y_j^{(\ell)} + b_i^{(\ell+1)} = \mathbf{w}_i^{(\ell+1)} \mathbf{y}^{(\ell)} + b_i^{(\ell+1)}$$
$$y_i^{(\ell+1)} = f\left(z_i^{(\ell+1)}\right)$$
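As a minimal NumPy sketch of this standard layer (the names `dense_forward`, `W`, `b` are placeholders, not from the paper; $f$ is taken to be ReLU here):

```python
import numpy as np

def dense_forward(y_prev, W, b, f=lambda z: np.maximum(z, 0)):
    """One standard layer: z = W y + b, then the nonlinearity f (ReLU here)."""
    z = W @ y_prev + b
    return f(z)

# toy layer: 2 units taking a 3-dimensional input
W = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, -1.0]])
b = np.array([0.1, -0.2])
y = dense_forward(np.array([1.0, 1.0, 1.0]), W, b)  # z = [0.6, 0.8], ReLU keeps both
```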
With dropout, the forward pass becomes:
$$r_j^{(\ell)} \sim \mathrm{Bernoulli}(p)$$
$$\tilde{\mathbf{y}}^{(\ell)} = \mathbf{r}^{(\ell)} * \mathbf{y}^{(\ell)}$$
$$z_i^{(\ell+1)} = \sum_j w_{ij}^{(\ell+1)} \, \tilde{y}_j^{(\ell)} + b_i^{(\ell+1)} = \mathbf{w}_i^{(\ell+1)} \tilde{\mathbf{y}}^{(\ell)} + b_i^{(\ell+1)}$$
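A hedged NumPy sketch of this masking step (the function name `dropout_forward` is my own; the test-time scaling by $p$ follows the paper's weight-scaling rule):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y, p, train=True):
    """r_j ~ Bernoulli(p); y_tilde = r * y.
    At inference no units are dropped (keep prob 1.0), and activations are
    scaled by p so their expected magnitude matches training."""
    if train:
        r = rng.binomial(1, p, size=y.shape)  # keep each unit with prob p
        return r * y
    return p * y

y = np.ones(10000)
y_train = dropout_forward(y, p=0.5, train=True)   # roughly half the units zeroed
y_test = dropout_forward(y, p=0.5, train=False)   # all units kept, scaled by 0.5
```

On average the train-time and test-time outputs agree: each unit's expected value is $p \cdot y_j$ in both modes.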
From this it follows that dropout should be applied after the nonlinear activation (ReLU, etc.):
-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
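The ordering above can be sketched end to end in NumPy. This is an illustrative composition under simplifying assumptions, not a full implementation: the "CONV/FC" step is a plain FC layer, and BatchNorm omits the learned scale/shift parameters for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def fc(x, W, b):                       # CONV/FC (an FC layer here)
    return x @ W + b

def batchnorm(x, eps=1e-5):            # BatchNorm (no learned gamma/beta)
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):                           # ReLU (or other activation)
    return np.maximum(x, 0)

def dropout(x, p, train=True):         # Dropout (keep prob p at train time)
    return x * rng.binomial(1, p, size=x.shape) if train else p * x

# One block in the recommended order: FC -> BatchNorm -> ReLU -> Dropout -> FC
x = rng.normal(size=(32, 8))                       # batch of 32, 8 features
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

h = dropout(relu(batchnorm(fc(x, W1, b1))), p=0.5, train=True)
out = fc(h, W2, b2)
```

Placing Dropout after ReLU means the Bernoulli mask zeroes already-activated units; placing it before BatchNorm instead would let the dropped zeros distort the batch statistics.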