
Derivative of the Cross-Entropy Loss

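Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.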

Article link: https://blog.csdn.net/chaipp0607/article/details/101946040

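Cross entropy is a loss function commonly used in classification problems. Earlier articles covered the derivation of binary cross entropy and the role cross entropy plays; this one works through the derivative of the cross-entropy loss.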

First, denote the outputs of the neurons in the model's last layer by $f_{0}, \dots, f_{C-1}$, where $C$ is the number of classes.

After softmax activation, the outputs are denoted $p_{0}, \dots, p_{C-1}$, so that:

$$p_{i} = \frac{e^{f_{i}}}{\sum_{k=0}^{C-1} e^{f_{k}}}$$
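The softmax formula can be sketched in a few lines of plain Python. The input values are hypothetical, and subtracting the maximum before exponentiating is an added numerical safeguard that is not part of the article's derivation (it cancels between numerator and denominator, so the result is unchanged).

```python
import math

def softmax(f):
    """Map raw outputs f_0..f_{C-1} to probabilities p_0..p_{C-1}."""
    # Shifting by max(f) does not change the result mathematically,
    # but it keeps exp() from overflowing for large inputs (an assumption
    # added here, not something the article itself uses).
    m = max(f)
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical last-layer outputs for C = 3 classes.
f = [2.0, 1.0, 0.1]
p = softmax(f)
print(p)
```

The entries of `p` are all positive and sum to 1, which is what lets them be read as class probabilities.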

Denote the ground-truth labels of the classes by $y_{0}, \dots, y_{C-1}$. The cross-entropy loss $L$ is then:

$$L = -\sum_{i=0}^{C-1} y_{i} \log p_{i}$$

Here $\log$ is shorthand. To simplify the differentiation that follows, we take its base to be $e$, so $\log$ means $\ln$.
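As a small illustration of the loss with the natural logarithm, the sketch below evaluates $L$ for a one-hot label. The inputs are hypothetical, and the helper names `softmax` and `cross_entropy` are chosen here for illustration only.

```python
import math

def softmax(f):
    m = max(f)  # stability shift; an addition, not part of the derivation
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    """L = -sum_i y_i * ln(p_i); terms with y_i == 0 contribute nothing."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi != 0)

f = [2.0, 1.0, 0.1]   # hypothetical outputs
y = [1.0, 0.0, 0.0]   # hypothetical one-hot label for class 0
p = softmax(f)
loss = cross_entropy(y, p)
print(loss)
```

With a one-hot label, the sum collapses to a single term, so `loss` equals `-math.log(p[0])` here.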

We now take the partial derivative of $L$ with respect to the output $f_{i}$ of the $i$-th neuron, $\frac{\partial L}{\partial f_{i}}$.

By the chain rule for composite functions, writing $L_{j} = -y_{j}\log p_{j}$ for the $j$-th term of $L$:

$$\frac{\partial L}{\partial f_{i}} = \sum_{j=0}^{C-1} \frac{\partial L_{j}}{\partial p_{j}}\frac{\partial p_{j}}{\partial f_{i}}$$

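A note on the indices: the softmax definition uses the subscripts $i$ and $k$, and the cross-entropy loss also uses $i$, but these two $i$'s are not the same index. The softmax denominator contains the output $f$ of every neuron, so every activated $p$ has a nonzero partial derivative with respect to any $f_{i}$, and $L$ in turn contains every $p$. To avoid the clash we introduce a new subscript $j$ for $p$, which ranges over the $C$ values $0, \dots, C-1$.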

Differentiating each factor in turn:

$$\frac{\partial L_{j}}{\partial p_{j}} = \frac{\partial (-y_{j}\log p_{j})}{\partial p_{j}}$$

Since we take the base of $\log$ to be $e$, so that $\log$ is $\ln$:

$$\frac{\partial L_{j}}{\partial p_{j}} = \frac{\partial (-y_{j}\log p_{j})}{\partial p_{j}} = -\frac{y_{j}}{p_{j}}$$
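This first factor can be sanity-checked numerically by treating each $p_{j}$ as a free variable and comparing $-y_{j}/p_{j}$ with a central finite difference. The label and probability values below are hypothetical, chosen only to make the check concrete.

```python
import math

def cross_entropy(y, p):
    """L = -sum_j y_j * ln(p_j), with p treated as free variables."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p))

# Hypothetical labels and probabilities, for illustration only.
y = [0.7, 0.2, 0.1]
p = [0.5, 0.3, 0.2]
h = 1e-6  # finite-difference step

# Analytic result from the derivation: dL/dp_j = -y_j / p_j.
analytic = [-yj / pj for yj, pj in zip(y, p)]

# Central finite difference in each coordinate p_j.
numeric = []
for j in range(len(p)):
    p_plus = list(p)
    p_plus[j] += h
    p_minus = list(p)
    p_minus[j] -= h
    numeric.append((cross_entropy(y, p_plus) - cross_entropy(y, p_minus)) / (2 * h))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_err)
```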

Next we need $\frac{\partial p_{j}}{\partial f_{i}}$. Every $p_{j}$ contains $f_{i}$, so none of these partial derivatives is zero. However, the result differs between $j = i$ and $j \neq i$, so the two cases are handled separately:

  • First, when $j = i$ (primes denote differentiation with respect to $f_{i}$):

    $$\begin{aligned}
    \frac{\partial p_{j}}{\partial f_{i}} = \frac{\partial p_{i}}{\partial f_{i}}
    &= \frac{\partial}{\partial f_{i}} \frac{e^{f_{i}}}{\sum_{k=0}^{C-1} e^{f_{k}}} \\
    &= \frac{(e^{f_{i}})' \sum_{k=0}^{C-1} e^{f_{k}} - e^{f_{i}} \left(\sum_{k=0}^{C-1} e^{f_{k}}\right)'}{\left(\sum_{k=0}^{C-1} e^{f_{k}}\right)^{2}} \\
    &= \frac{e^{f_{i}} \sum_{k=0}^{C-1} e^{f_{k}} - (e^{f_{i}})^{2}}{\left(\sum_{k=0}^{C-1} e^{f_{k}}\right)^{2}}
     = \frac{e^{f_{i}}}{\sum_{k=0}^{C-1} e^{f_{k}}} - \left(\frac{e^{f_{i}}}{\sum_{k=0}^{C-1} e^{f_{k}}}\right)^{2} \\
    &= p_{i} - (p_{i})^{2} = p_{i}(1 - p_{i})
    \end{aligned}$$

  • Then, when $j \neq i$ (here $e^{f_{j}}$ does not depend on $f_{i}$, so its derivative is zero):

    $$\begin{aligned}
    \frac{\partial p_{j}}{\partial f_{i}}
    &= \frac{\partial}{\partial f_{i}} \frac{e^{f_{j}}}{\sum_{k=0}^{C-1} e^{f_{k}}} \\
    &= \frac{(e^{f_{j}})' \sum_{k=0}^{C-1} e^{f_{k}} - e^{f_{j}} \left(\sum_{k=0}^{C-1} e^{f_{k}}\right)'}{\left(\sum_{k=0}^{C-1} e^{f_{k}}\right)^{2}} \\
    &= \frac{-e^{f_{i}} e^{f_{j}}}{\left(\sum_{k=0}^{C-1} e^{f_{k}}\right)^{2}}
     = -\frac{e^{f_{i}}}{\sum_{k=0}^{C-1} e^{f_{k}}} \cdot \frac{e^{f_{j}}}{\sum_{k=0}^{C-1} e^{f_{k}}} \\
    &= -p_{i}p_{j}
    \end{aligned}$$
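The two cases together give every entry of the softmax Jacobian. A minimal sketch, assuming hypothetical inputs and the illustrative helper name `softmax_jacobian`, compares those entries against central finite differences:

```python
import math

def softmax(f):
    m = max(f)  # stability shift; an addition, not part of the derivation
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(p):
    """Return J with J[j][i] = dp_j/df_i, using the two derived cases."""
    C = len(p)
    return [
        [p[i] * (1 - p[i]) if j == i else -p[i] * p[j] for i in range(C)]
        for j in range(C)
    ]

f = [2.0, 1.0, 0.1]  # hypothetical outputs
p = softmax(f)
J = softmax_jacobian(p)

# Compare each entry with a central finite difference in f_i.
h = 1e-6
max_err = 0.0
for i in range(len(f)):
    f_plus = list(f)
    f_plus[i] += h
    f_minus = list(f)
    f_minus[i] -= h
    p_plus, p_minus = softmax(f_plus), softmax(f_minus)
    for j in range(len(f)):
        numeric = (p_plus[j] - p_minus[j]) / (2 * h)
        max_err = max(max_err, abs(J[j][i] - numeric))

print(max_err)
```

Because the $p_{j}$ always sum to 1, the derivatives with respect to any single $f_{i}$ sum to zero over $j$, which is a quick extra consistency check on `J`.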

For the final partial derivative, the two parts above are added together:

$$\begin{aligned}
\frac{\partial L}{\partial f_{i}}
&= \frac{\partial L_{i}}{\partial p_{i}}\frac{\partial p_{i}}{\partial f_{i}} + \sum_{j \neq i} \frac{\partial L_{j}}{\partial p_{j}}\frac{\partial p_{j}}{\partial f_{i}} \\
&= -\frac{y_{i}}{p_{i}}\, p_{i}(1 - p_{i}) + \sum_{j \neq i} (-p_{i}p_{j})\left(-\frac{y_{j}}{p_{j}}\right) \\
&= -y_{i}(1 - p_{i}) + \sum_{j \neq i} p_{i}y_{j} \\
&= y_{i}p_{i} - y_{i} + \sum_{j \neq i} p_{i}y_{j}
\end{aligned}$$

In the expression above, the sum over $j \neq i$ is missing exactly the $j = i$ term, which $y_{i}p_{i}$ supplies, so it can be rewritten as:

$$\begin{aligned}
\frac{\partial L}{\partial f_{i}}
&= \sum_{j=0}^{C-1} p_{i}y_{j} - y_{i} \\
&= p_{i}\sum_{j=0}^{C-1} y_{j} - y_{i}
\end{aligned}$$

Since $\sum_{j=0}^{C-1} y_{j} = 1$ (which holds for one-hot labels, or for any labels that form a probability distribution), we get:

$$\frac{\partial L}{\partial f_{i}} = p_{i}\sum_{j=0}^{C-1} y_{j} - y_{i} = p_{i} - y_{i}$$
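The final result $p_{i} - y_{i}$ is easy to verify numerically. The sketch below, with hypothetical inputs and a one-hot label, differentiates the composed softmax and cross-entropy loss by central finite differences and compares the outcome with $p - y$.

```python
import math

def softmax(f):
    m = max(f)  # stability shift; an addition, not part of the derivation
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]

def loss(f, y):
    """Cross-entropy of softmax(f) against labels y, using the natural log."""
    p = softmax(f)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

f = [2.0, 1.0, 0.1]   # hypothetical outputs
y = [0.0, 1.0, 0.0]   # hypothetical one-hot label (sums to 1)

# Analytic gradient from the derivation: dL/df_i = p_i - y_i.
p = softmax(f)
analytic = [pi - yi for pi, yi in zip(p, y)]

# Central finite difference in each f_i.
h = 1e-6
numeric = []
for i in range(len(f)):
    f_plus = list(f)
    f_plus[i] += h
    f_minus = list(f)
    f_minus[i] -= h
    numeric.append((loss(f_plus, y) - loss(f_minus, y)) / (2 * h))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_err)
```

The gradient is negative only for the true class and positive for the others, so a gradient-descent step raises the true-class output and lowers the rest.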