Cross Entropy
In Machine Learning, a large share of binary classification tasks use Cross Entropy as the loss function. It is usually preceded by a sigmoid activation; for the detailed forward and backward passes, see my earlier blog post: https://blog.csdn.net/jmu201521121021/article/details/86658163
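As a quick illustration of that setup (a minimal sketch only; the helper name `sigmoid` below is mine, not taken from the linked post), the activation squashes real-valued logits into $(0, 1)$ so the outputs can be used as the predictions $\widehat{y}$:

```python
import numpy as np

def sigmoid(z):
    """Squash real-valued logits into (0, 1) so they can be used as probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.1192 0.5    0.8808]
```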
Formula
$$P(y|x) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (1)$$
- $y_i$ denotes the ground-truth label of the $i^{th}$ sample
- $\widehat{y}_i$ denotes the model's predicted value for the $i^{th}$ sample
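As a quick sanity check of equation (1), take an assumed mini-batch of $m = 2$ samples with labels $y = (1, 0)$ and predictions $\widehat{y} = (0.9, 0.2)$ (values made up for illustration):

$$P(y|x) = -\frac{1}{2}\left[\log(0.9) + \log(1-0.2)\right] \approx -\frac{1}{2}\left(-0.105 - 0.223\right) \approx 0.164$$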
Derivation
- $y_i \in \{0, 1\}$: 0 denotes the negative-class label, 1 denotes the positive-class label
- When $y = 1$, $P(y|x) = \widehat{y}$; when $y = 0$, $P(y|x) = 1 - \widehat{y}$.
- Therefore the two cases can be written compactly as:
  $$P(y|x) = \widehat{y}^{\,y}(1-\widehat{y})^{1-y} \qquad (2)$$
- By maximum likelihood estimation, treating the samples as independent:
  $$P(y_1,y_2,\dots|x_1,x_2,\dots) = \prod_{i=1}^{m}P(y_i|x_i) = \prod_{i=1}^{m}\widehat{y}_i^{\,y_i}(1-\widehat{y}_i)^{1-y_i} \qquad (3)$$
- Because a product is awkward to differentiate during backpropagation, we take the logarithm, turning the product into a sum (the numerical check after this derivation illustrates why this also matters in practice); equation (3) becomes:
  $$P(y|x) = \sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (4)$$
- Dividing by $m$ averages over the samples so the value does not grow with the batch size, and since the goal is to minimize a loss rather than maximize the log-likelihood, a minus sign is added; equation (4) then becomes:
  $$P(y|x) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (5)$$
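The derivation can be sanity-checked numerically. The sketch below uses made-up labels and predictions (none of the values come from the post): it verifies that the compact form in equation (2) reproduces the two cases above, and that the raw product in equation (3) underflows to zero for a large batch while the log form in equation (4) stays finite.

```python
import numpy as np

# Equation (2): y_hat**y * (1 - y_hat)**(1 - y) matches the two cases.
y_hat = 0.7
for y in (0, 1):
    compact = y_hat ** y * (1 - y_hat) ** (1 - y)
    case = y_hat if y == 1 else 1 - y_hat
    assert np.isclose(compact, case)

# Equation (3) vs (4): for many samples the raw product underflows to 0.0,
# while the sum of logs remains a usable finite number.
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=10000)
Y_hat = np.clip(rng.random(10000), 1e-7, 1 - 1e-7)

product = np.prod(Y_hat ** Y * (1 - Y_hat) ** (1 - Y))                       # equation (3)
log_sum = np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))            # equation (4)

print(product)   # 0.0 due to floating-point underflow
print(log_sum)   # a finite negative number (roughly on the order of -1e4 here)
```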
Code Implementation
import numpy as np

# GRADED FUNCTION: compute_cost
def compute_cost(AL, Y):
    """
    Implement the cross-entropy cost defined by equation (5) above.

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y  -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute the cost from AL and Y.
    ### START CODE HERE ### (≈ 1 line of code)
    cost = -1.0 / m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
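A quick usage check with the same assumed values as the worked example after equation (1), with `AL` and `Y` shaped `(1, m)` as the docstring requires:

```python
AL = np.array([[0.9, 0.2]])   # predicted probabilities
Y = np.array([[1, 0]])        # ground-truth labels
print(compute_cost(AL, Y))    # ≈ 0.164
```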