The softmax layer in Caffe

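In Caffe's LeNet implementation, the last layer is a softmax layer, which outputs the classification result. This post gives a brief introduction to softmax regression.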

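1. First, in Caffe the softmax layer outputs the probability of the original input on each class label. For example, LeNet has 10 output classes, the digits 0 to 9, so the softmax layer outputs a vector of 10 elements, where each element is the probability that the input belongs to that class.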

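The purpose of softmax is to make the probability of the input on the correct class label as large as possible; this is the objective function we optimize. An ordinary loss function simply compares the output with the label, takes the difference, and minimizes it to obtain the network parameters. In classification this approach does not work well: the predicted class is discrete rather than continuous, so it cannot be differentiated. We therefore approach the problem from a probabilistic perspective.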
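A short derivation shows why the probabilistic view fixes the gradient problem. Assume a single sample with correct label y, input scores a_1, ..., a_K (K = 10 for LeNet), and p_k the softmax probability of class k; these symbols are introduced here for illustration. Maximizing p_y is equivalent to minimizing -log p_y, and that loss is a smooth function of the scores:

```latex
L = -\log p_y = -a_y + \log \sum_{j=1}^{K} e^{a_j},
\qquad
\frac{\partial L}{\partial a_k} = p_k - \mathbf{1}[k = y]
```

The gradient is defined everywhere. A gradient-descent step raises the correct score in proportion to 1 - p_y and lowers every other score in proportion to p_k, which a discrete predicted label could never provide.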

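That raises a question: optimizing from the probabilistic angle is feasible, but how do we compute the probability of the input on each label? This is a multinomial distribution problem, and softmax is exactly the tool for it. Before going into the details, let us look at what the input to the softmax layer represents. Page 181 of the book Deep Learning contains this sentence: we can think of a as a vector of scores whose elements a(i) as associated with each category i, with larger relative scores yielding exponentially larger probabilities. In other words, the input to the softmax layer can be viewed as a score for each label: the higher the score, the more likely the input belongs to that label. We can use these scores to compute the probability of the input on each label, with higher scores giving higher probabilities.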
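The score-to-probability mapping described in the quote is the softmax function. Writing a_i for the score of class i and K for the number of classes (K = 10 for LeNet), with p_i introduced here for the probability of class i:

```latex
p_i = \frac{e^{a_i}}{\sum_{j=1}^{K} e^{a_j}}, \qquad
\sum_{i=1}^{K} p_i = 1, \qquad
\frac{p_i}{p_j} = e^{a_i - a_j}
```

Every p_i is positive and they sum to 1, so the outputs form a valid distribution. The ratio on the right is what "exponentially larger probabilities" means: raising one score by 1 multiplies its probability relative to any class whose score is unchanged by e, about 2.718.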

2. Softmax regression

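There are many articles online about softmax regression. My understanding is that the essential job of softmax is to compute the probability of the softmax layer's input on each label. In Caffe, softmax_layer does this as follows: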

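(1) Find the maximum of the input;

(2) Subtract the maximum from every input element;

(3) Apply the exponential function to each result of (2);

(4) Normalize the results of (3) by dividing each by their sum; the result is the probability of the input on each label.

Steps (1) and (2) do not change the final probabilities, because subtracting the same constant from every score cancels out in the normalization of step (4). They only keep exp() from overflowing when the scores are large.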

The Caffe code is in softmax_layer.cpp, line 37:

// Excerpt from SoftmaxLayer<Dtype>::Forward_cpu. Before this loop, top_data already
// holds a copy of bottom_data; dim = channels * inner_num_, and sum_multiplier_ is a
// vector of `channels` ones.
for (int i = 0; i < outer_num_; ++i) {  // one softmax per outer index (e.g. per sample)
  // initialize scale_data to the first plane
  caffe_copy(inner_num_, bottom_data + i * dim, scale_data);
  // (1) max over channels for each inner position
  for (int j = 0; j < channels; j++) {
    for (int k = 0; k < inner_num_; k++) {
      scale_data[k] = std::max(scale_data[k],
          bottom_data[i * dim + j * inner_num_ + k]);
    }
  }
  // (2) subtraction: top_data -= sum_multiplier_ * scale_data (outer product)
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, inner_num_,
      1, -1., sum_multiplier_.cpu_data(), scale_data, 1., top_data);
  // (3) exponentiation
  caffe_exp<Dtype>(dim, top_data, top_data);
  // sum after exp: scale_data = sum over channels of top_data
  caffe_cpu_gemv<Dtype>(CblasTrans, channels, inner_num_, 1.,
      top_data, sum_multiplier_.cpu_data(), 0., scale_data);
  // (4) division: normalize every channel by the per-position sum
  for (int j = 0; j < channels; j++) {
    caffe_div(inner_num_, top_data, scale_data, top_data);
    top_data += inner_num_;
  }
}

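The most important remaining piece is the cost function. It is essentially the probability of the input on its correct label, and the goal of optimization is to make this probability as large as possible. In practice we take the log of the probability. Because the network is trained one batch at a time, if a batch holds 100 samples we sum the log-probabilities of those 100 samples and average them; Caffe negates this average so that the loss is minimized. We then compute gradients of this cost function and use them to update the weights. The Caffe code is in softmax_loss_layer.cpp, line 100: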

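// Excerpt from SoftmaxWithLossLayer<Dtype>::Forward_cpu; prob_data holds the softmax
// output computed just before, and loss and count start at 0.
for (int i = 0; i < outer_num_; ++i) {  // sample by sample
  for (int j = 0; j < inner_num_; j++) {  // spatial positions (1 for LeNet)
    const int label_value = static_cast<int>(label[i * inner_num_ + j]);
    if (has_ignore_label_ && label_value == ignore_label_) {
      continue;  // positions marked with ignore_label do not contribute
    }
    DCHECK_GE(label_value, 0);
    DCHECK_LT(label_value, prob_.shape(softmax_axis_));
    // cost function: accumulate -log(probability of the correct label),
    // clamped at FLT_MIN to avoid log(0)
    loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                         Dtype(FLT_MIN)));
    ++count;
  }
}
// average the loss; the divisor depends on normalization_
top[0]->mutable_cpu_data()[0] = loss / get_normalizer(normalization_, count);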

For the underlying theory of softmax, see the following two references:

(1) http://ufldl.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92

(2) http://blog.csdn.net/acdreamers/article/details/44663305
