
cs231n - assignment1 - softmax gradient derivation

Softmax exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

- implement a fully-vectorized loss function for the Softmax classifier

- implement the fully-vectorized expression for its analytic gradient

- check your implementation with numerical gradient

- use a validation set to tune the learning rate and regularization strength

- optimize the loss function with SGD

- visualize the final learned weights

As with linear_svm, the main difficulty is deriving the gradient, but the softmax gradient is a bit simpler.

First, the loss formula:

$$L = \frac{1}{N}\sum_i L_i + \lambda R(W) \tag{1}$$

There are $N$ samples in total, and sample $i$ contributes the loss $L_i$:

$$L_i = -\log p_{y_i} = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) = -f_{y_i} + \log\sum_j e^{f_j} \tag{2}$$
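As a quick numerical illustration of (2) (a minimal sketch with made-up scores, not part of the assignment code), note that subtracting the maximum score before exponentiating leaves $L_i$ unchanged but avoids overflow:

import numpy as np

# hypothetical scores f for one sample over C = 3 classes; assume the true label is y_i = 0
f = np.array([3.2, 5.1, -1.7])
y_i = 0

f_shifted = f - np.max(f)   # shifting all scores by a constant leaves L_i unchanged
L_i = -f_shifted[y_i] + np.log(np.sum(np.exp(f_shifted)))
print(L_i)                  # equals -log(e^{f_{y_i}} / sum_j e^{f_j}), computed stably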

For each sample $X_i$, because the softmax denominator sums over all the $f_j$, the derivative of $L_i$ with respect to $W$ has a contribution in every column of $W$; that is, $\partial L_i / \partial W_j$ is nonzero for all $j$:

When $j \neq y_i$:

$$\frac{\partial L_i}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}}\,\frac{\partial f_j}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}}\,X_i^T \tag{3}$$

When $j = y_i$:

$$\frac{\partial L_i}{\partial W_j} = \left(\frac{e^{f_j}}{\sum_j e^{f_j}} - 1\right)\frac{\partial f_j}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}}\,X_i^T - X_i^T \tag{4}$$

Computing the loss for every sample, summing over all of them, and adding the regularization term gives the final loss.
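As a sanity check on (3) and (4), the following sketch (hypothetical tiny $W$ and $X_i$, not part of the assignment code) assembles the per-sample gradient column by column and compares it against a numerical gradient:

import numpy as np

# hypothetical tiny problem: D = 4 features, C = 3 classes, one sample x with label y_i
np.random.seed(0)
W = np.random.randn(4, 3) * 0.01
x = np.random.randn(4)
y_i = 1

def loss_i(W):
    f = x.dot(W)
    f -= np.max(f)
    return -f[y_i] + np.log(np.sum(np.exp(f)))

# analytic gradient from (3) and (4)
f = x.dot(W)
f -= np.max(f)
p = np.exp(f) / np.sum(np.exp(f))
dW = np.outer(x, p)     # column j gets (e^{f_j} / sum_j e^{f_j}) * x, eq. (3)
dW[:, y_i] -= x         # the correct-class column gets an extra -x, eq. (4)

# central-difference numerical gradient
h = 1e-5
num_dW = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    num_dW[idx] = (loss_i(W_plus) - loss_i(W_minus)) / (2 * h)

print(np.max(np.abs(dW - num_dW)))   # should be on the order of 1e-9 or smaller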

The derivation above writes the loss explicitly in terms of $W$ and differentiates with respect to $W$ directly. That works in this simple example, but once the network gets more complicated it becomes hard to write down the derivative of the loss with respect to the quantity of interest in closed form. A better approach is to apply the chain rule and differentiate stage by stage:

$$p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}, \qquad L_i = -\log p_{y_i} \tag{5}$$

Here $f_k$ is the class score fed into the softmax (the output of the last fully connected layer). From formula (2) above, the derivative of the loss with respect to $f_k$ is:

$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k) \tag{6}$$

This says that the derivative of the loss with respect to the score $f_k$ is $p_k$, with an extra 1 subtracted when $k = y_i$.

Rewriting (6) in vector form:

$$\frac{\partial L_i}{\partial f} = p - [0, \dots, 1, \dots, 0] \quad \text{(1 in the } y_i\text{-th position)} \tag{6a}$$
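For example, if $p = (0.2, 0.5, 0.3)$ and the correct class is the second one, then (6a) gives $\partial L_i / \partial f = (0.2, 0.5, 0.3) - (0, 1, 0) = (0.2, -0.5, 0.3)$.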

Now consider the second fully connected layer, the one that feeds directly into the softmax. Its input is the hidden-layer output hidden_layer ($1 \times H$), so the softmax input is f = hidden_layer.dot(W2) + b2. Checking dimensions: $f$ is a $C$-dimensional vector, $W_2$ is an $H \times C$ matrix, and $b_2$ is a $C$-dimensional vector, so everything is consistent. We can now compute the derivative of $f$ with respect to $W_2$:

$$\frac{\partial f}{\partial W} = \text{hidden\_layer}^T \quad \text{(dimension } H \times 1\text{)} \tag{7}$$

As you can see, $\partial f / \partial W$ is simply the input vector of the fully connected layer.

Combining the results above:

$$\frac{\partial L_i}{\partial W} = \frac{\partial f}{\partial W}\,\frac{\partial L_i}{\partial f} \tag{8}$$

Finally, written in matrix form for all $N$ samples:

$$\frac{\partial L}{\partial f} = p_{[N\times C]} - \text{MaskMat}_{[N\times C]} \tag{6m}$$

$$\frac{\partial f}{\partial W} = \text{hidden\_layer}^T \quad (\text{hidden\_layer is } N \times H) \tag{7m}$$

$$\frac{\partial L}{\partial W} = \frac{\partial f}{\partial W}\,\frac{\partial L}{\partial f} \quad [H \times C] \tag{8m}$$
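Putting (6m)-(8m) together, the backward pass through this fully connected layer is only a few lines. The sketch below uses hypothetical sizes and random data (hidden_layer, W2, b2, y are stand-ins, not the assignment's variables) just to make the shapes concrete:

import numpy as np

# hypothetical sizes: N samples, H hidden units, C classes
N, H, C = 5, 10, 3
np.random.seed(0)
hidden_layer = np.random.randn(N, H)       # output of the hidden layer
W2 = np.random.randn(H, C) * 0.01
b2 = np.zeros(C)
y = np.random.randint(C, size=N)

# forward pass: class scores and softmax probabilities
f = hidden_layer.dot(W2) + b2                                  # (N, C)
f -= np.max(f, axis=1, keepdims=True)                          # numerical stability
probs = np.exp(f) / np.sum(np.exp(f), axis=1, keepdims=True)

# backward pass following (6m)-(8m)
dscores = probs.copy()
dscores[np.arange(N), y] -= 1        # p - MaskMat, eq. (6m)
dscores /= N                         # average over the N samples
dW2 = hidden_layer.T.dot(dscores)    # (H, N) x (N, C) = (H, C), eq. (8m)
db2 = np.sum(dscores, axis=0)        # gradient with respect to the bias b2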

MaskMat in (6m) is stacked from the $N$ vectors of (6a), one row per sample; its concrete form can be seen in the following Python code:

# compute the gradient on scores
  dscores = probs
  dscores[range(num_examples), y] -= 1
           
# softmax.py
import numpy as np
from random import shuffle

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  num_train = X.shape[0]
  num_classes = W.shape[1]
  for i in xrange(num_train):
    scores = X[i].dot(W) 
    scores -= np.max(scores) #prevents numerical instability
    correct_class_score = scores[y[i]]

    exp_sum = np.sum(np.exp(scores))
    loss += np.log(exp_sum) - correct_class_score

    dW[:, y[i]] -= X[i]
    for j in xrange(num_classes):
      dW[:,j] += (np.exp(scores[j]) / exp_sum) * X[i]

  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dW /= num_train
  dW += reg * W

  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  num_train = X.shape[0]
  num_classes = W.shape[1]

  scores = X.dot(W)
  scores -= np.max(scores, axis=1)[:, np.newaxis]
  exp_scores = np.exp(scores)
  sum_exp_scores = np.sum(exp_scores, axis=1)
  correct_class_score = scores[range(num_train), y]

  loss = np.sum(np.log(sum_exp_scores)) - np.sum(correct_class_score)

  exp_scores = exp_scores / sum_exp_scores[:,np.newaxis]

  # this loop could be replaced by a single matrix multiply (see the sketch after this listing)
  for i in xrange(num_train):
    dW += exp_scores[i] * X[i][:,np.newaxis]
    dW[:, y[i]] -= X[i]

  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dW /= num_train
  dW += reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW
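The per-sample loop in softmax_loss_vectorized above can be collapsed into a single matrix multiply, exactly as in (6m)-(8m) with X playing the role of the hidden layer. Below is a minimal standalone sketch of that replacement, using hypothetical small X, y, W rather than the assignment's data:

import numpy as np

# hypothetical small problem: N = 5 samples, D = 4 features, C = 3 classes
np.random.seed(0)
X = np.random.randn(5, 4)
y = np.random.randint(3, size=5)
W = np.random.randn(4, 3) * 0.01
num_train = X.shape[0]

scores = X.dot(W)
scores -= np.max(scores, axis=1, keepdims=True)
probs = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)

dscores = probs.copy()
dscores[range(num_train), y] -= 1     # p - MaskMat, eq. (6m)
dW = X.T.dot(dscores) / num_train     # (D, N) x (N, C) = (D, C), no Python loop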
           
# softmax.ipynb
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7, 1e-6]           # example search range; tune as needed
regularization_strengths = [1e4, 5e4, 1e5]    # example search range; tune as needed

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
params = [(x,y) for x in learning_rates for y in regularization_strengths ]
for lrate, regular in params:
    softmax = Softmax()
    loss_hist = softmax.train(X_train, y_train, learning_rate=lrate, reg=regular,
                              num_iters=1500, verbose=True)
    y_train_pred = softmax.predict(X_train)
    accuracy_train = np.mean( y_train == y_train_pred)
    y_val_pred = softmax.predict(X_val)
    accuracy_val = np.mean(y_val == y_val_pred)
    results[(lrate, regular)] = (accuracy_train, accuracy_val)
    if(best_val < accuracy_val):
        best_val = accuracy_val
        best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val
           
