In softmax regression, suppose the training set consists of m labeled samples: \[\{ (x^{(1)},y^{(1)}),\ldots,(x^{(m)},y^{(m)})\} \] and the activation function is the softmax function: \[p(y^{(i)} = j|x^{(i)};\theta ) = \frac{e^{\theta _j^T x^{(i)}}}{\sum\limits_{l = 1}^k e^{\theta _l^T x^{(i)}}}\] The loss function is: \[J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m \sum\limits_{j = 1}^k [I(y^{(i)} = j)\log p(y^{(i)} = j|x^{(i)};\theta )]\] where \(I(y^{(i)} = j)\) is the indicator function.
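As a concrete illustration, here is a minimal NumPy sketch of these two formulas (the layout of \(\theta\) as a k×n matrix `Theta` of row vectors \(\theta_j\), and all function names, are assumptions made for this example):

```python
import numpy as np

def softmax_probs(Theta, x):
    """p(y = j | x; theta) for every class j; Theta is (k, n), x is (n,)."""
    z = Theta @ x            # theta_j^T x for each class j
    z = z - z.max()          # shift for numerical stability; the ratios are unchanged
    e = np.exp(z)
    return e / e.sum()

def loss(Theta, X, y):
    """J(theta) = -(1/m) sum_i sum_j I(y_i = j) log p(y_i = j | x_i; theta)."""
    m = X.shape[0]
    # The indicator keeps only each sample's true-class log-probability
    return -sum(np.log(softmax_probs(Theta, X[i])[y[i]]) for i in range(m)) / m
```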
Here, the gradient of the loss function with respect to the t-th parameter vector \(\theta_t\) must be considered in two cases (because the vector \(\theta_t\) being differentiated may or may not coincide with the (j-th) parameter appearing in the numerator of the softmax function):
When t = j (the quotient rule gives \(\partial p_j/\partial \theta_j = p_j(1 - p_j)\,x^{(i)}\), writing \(p_j\) for \(p(y^{(i)} = j|x^{(i)};\theta )\)):
\[\begin{aligned}
\nabla _{\theta _t}J(\theta ) &= - \frac{1}{m}\sum\limits_{i = 1}^m [\frac{1}{p(y^{(i)} = j|x^{(i)};\theta )} \cdot p(y^{(i)} = j|x^{(i)};\theta ) \cdot (1 - p(y^{(i)} = j|x^{(i)};\theta )) \cdot x^{(i)}] \\
&= - \frac{1}{m}\sum\limits_{i = 1}^m [1 - p(y^{(i)} = j|x^{(i)};\theta )] \cdot x^{(i)}
\end{aligned}\]
When t ≠ j:
\[\begin{aligned}
\nabla _{\theta _t}J(\theta ) &= - \frac{1}{m}\sum\limits_{i = 1}^m [\frac{1}{p(y^{(i)} = j|x^{(i)};\theta )} \cdot \frac{0 \cdot \left( \sum\limits_{l = 1}^k e^{\theta _l^T x^{(i)}} \right) - e^{\theta _j^T x^{(i)}} \cdot e^{\theta _t^T x^{(i)}} \cdot x^{(i)}}{\left( \sum\limits_{l = 1}^k e^{\theta _l^T x^{(i)}} \right)^2}] \\
&= - \frac{1}{m}\sum\limits_{i = 1}^m [ - p(y^{(i)} = t|x^{(i)};\theta )] \cdot x^{(i)}
\end{aligned}\]
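Both per-case partial derivatives of the softmax output used above can be checked with finite differences; a small self-contained sketch (all names and the random data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 4
Theta, x = rng.normal(size=(k, n)), rng.normal(size=n)

def p(T):
    e = np.exp(T @ x - (T @ x).max())   # softmax probabilities for this x
    return e / e.sum()

j, eps = 0, 1e-6
for t in (0, 2):                        # first t == j, then t != j
    num = np.zeros(n)
    for b in range(n):                  # central differences in theta_t
        Tp, Tm = Theta.copy(), Theta.copy()
        Tp[t, b] += eps; Tm[t, b] -= eps
        num[b] = (p(Tp)[j] - p(Tm)[j]) / (2 * eps)
    pj, pt = p(Theta)[j], p(Theta)[t]
    ana = pj * (1 - pj) * x if t == j else -pj * pt * x
    print(t, np.max(np.abs(num - ana)))  # both gaps should be ~1e-10
```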
Combining the two cases (now re-indexed as the gradient with respect to the j-th parameter vector): \[\nabla _{\theta _j}J(\theta ) = - \frac{1}{m}\sum\limits_{i = 1}^m [I(y^{(i)} = j) - p(y^{(i)} = j|x^{(i)};\theta )] \cdot x^{(i)}\]
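This closed-form gradient is easy to verify against finite differences of J itself; a sketch reusing softmax_probs and loss from the first snippet above (random data, purely illustrative):

```python
def grad(Theta, X, y):
    """nabla_{theta_j} J = -(1/m) sum_i [I(y_i = j) - p_j] x_i, stacked over j."""
    k, m = Theta.shape[0], X.shape[0]
    G = np.zeros_like(Theta)
    for i in range(m):
        pr = softmax_probs(Theta, X[i])
        ind = np.zeros(k); ind[y[i]] = 1.0
        G -= np.outer(ind - pr, X[i])   # one row per theta_j
    return G / m

rng = np.random.default_rng(1)
k, n, m = 3, 4, 6
Theta = rng.normal(size=(k, n))
X, y = rng.normal(size=(m, n)), rng.integers(0, k, size=m)

eps = 1e-6
num = np.zeros_like(Theta)
for a in range(k):
    for b in range(n):
        Tp, Tm = Theta.copy(), Theta.copy()
        Tp[a, b] += eps; Tm[a, b] -= eps
        num[a, b] = (loss(Tp, X, y) - loss(Tm, X, y)) / (2 * eps)

print(np.max(np.abs(num - grad(Theta, X, y))))   # should be ~1e-9
```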
In this gradient, the first factor is the error term and the second is the input, consistent with the delta rule. It is also easy to see that the softmax and logistic activation functions ultimately yield gradients of exactly the same form.
Rewriting the logistic and softmax functions as:
\[\begin{aligned}
f_1(z) &= \frac{1}{1 + e^{-z}} \\
f_2(z_j) &= \frac{e^{z_j}}{\sum\limits_{i = 1}^k e^{z_i}}
\end{aligned}\]
Differentiating each of them:
\[\begin{aligned}
f_1'(z) &= - \frac{e^{-z} \cdot (-1)}{(1 + e^{-z})^2} = \frac{(e^{-z} + 1) - 1}{(1 + e^{-z})^2} \\
&= f_1(z) - f_1^2(z) = f_1(z) \cdot (1 - f_1(z))
\end{aligned}\]
\[\begin{aligned}
f_2'(z_j) &= \frac{e^{z_j} \cdot \left( \sum\limits_{i = 1}^k e^{z_i} \right) - e^{z_j} \cdot e^{z_j}}{\left( \sum\limits_{i = 1}^k e^{z_i} \right)^2} \\
&= f_2(z_j) - f_2^2(z_j) = f_2(z_j) \cdot (1 - f_2(z_j))
\end{aligned}\]
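Both derivative identities can likewise be confirmed numerically; a short sketch (the test points are chosen arbitrarily):

```python
import numpy as np

f1 = lambda z: 1.0 / (1.0 + np.exp(-z))

def f2(z, j):
    e = np.exp(z - z.max())             # stabilized softmax over the vector z
    return (e / e.sum())[j]

h = 1e-6

# logistic: central difference vs f1 * (1 - f1)
z0 = 0.7
print((f1(z0 + h) - f1(z0 - h)) / (2 * h), f1(z0) * (1 - f1(z0)))

# softmax: central difference in z_j vs f2 * (1 - f2)
z, j = np.array([0.3, -1.2, 0.5]), 1
zp, zm = z.copy(), z.copy()
zp[j] += h; zm[j] -= h
print((f2(zp, j) - f2(zm, j)) / (2 * h), f2(z, j) * (1 - f2(z, j)))
```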
It is easy to see that the logistic and softmax functions also have exactly the same form of derivative.
From the analysis above, the softmax activation function is the generalization of the logistic activation function to the multiclass setting (this part is straightforward, so no further explanation is given here), and the two share exactly the same gradient form and derivative form.
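For completeness, the two-class case makes this reduction explicit: with k = 2, \[p(y = 1|x;\theta ) = \frac{e^{\theta _1^T x}}{e^{\theta _1^T x} + e^{\theta _2^T x}} = \frac{1}{1 + e^{-(\theta _1 - \theta _2)^T x}} = f_1\left( (\theta _1 - \theta _2)^T x \right)\] which is exactly the logistic function applied to the parameter difference \(\theta _1 - \theta _2\).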