

Cross Entropy

A large share of binary classification tasks in machine learning use cross entropy as the loss function, usually with a sigmoid activation in front of it. For the forward and backward passes, see my earlier post: https://blog.csdn.net/jmu201521121021/article/details/86658163
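As a quick illustration of that setup, here is a minimal sketch of a sigmoid forward pass producing the prediction $\widehat y$ that feeds the cross-entropy loss (this standalone sigmoid helper is an illustrative sketch, not code from the linked post):

import numpy as np

def sigmoid(z):
    # Map raw scores z to probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Example: raw scores -> predicted probabilities y_hat
z = np.array([[-2.0, 0.0, 3.0]])
y_hat = sigmoid(z)                 # ~[[0.119, 0.5, 0.953]]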

Formula

$$L = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\log(\widehat y_i) + (1-y_i)\log(1-\widehat y_i)\,\right] \qquad (1)$$

where $L$ denotes the cross-entropy loss over $m$ samples, and:

  • $y_i$ — the ground-truth label of the $i^{th}$ sample
  • $\widehat y_i$ — the model's predicted value for the $i^{th}$ sample (see the worked check below)
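As a quick worked check of equation (1) (the values are made up for illustration): take $m=2$ samples with labels $y=(1,0)$ and predictions $\widehat y=(0.9,0.2)$; then

$$L = -\frac{1}{2}\left[\log(0.9) + \log(1-0.2)\right] \approx -\frac{1}{2}\left(-0.105-0.223\right) \approx 0.164$$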

Derivation

  • $y_i \in \{0,1\}$: 0 denotes the negative class label, 1 the positive class label
  • When $y=1$, $P(y|x)=\widehat y$; when $y=0$, $P(y|x)=1-\widehat y$.

  • Therefore, the two cases can be combined into a single expression:

    $$P(y|x)=\widehat y^{\,y}\,(1-\widehat y)^{1-y} \qquad (2)$$

    (Substituting $y=1$ yields $\widehat y$ and $y=0$ yields $1-\widehat y$, recovering the two cases above.)

  • By maximum likelihood estimation, assuming the $m$ samples are independent:

    $$P(y_1,y_2,\dots|x_1,x_2,\dots) = \prod_{i=1}^{m} P(y_i|x_i) = \prod_{i=1}^{m}\widehat y_i^{\,y_i}\,(1-\widehat y_i)^{1-y_i} \qquad (3)$$

  • A product is awkward to differentiate during backpropagation, so take the log: the log is monotonic, so the maximizer is unchanged, and the product becomes a sum (a gradient sketch follows this derivation). Equation (3) becomes:

    $$\log P(y_1,y_2,\dots|x_1,x_2,\dots) = \sum_{i=1}^{m}\left[\,y_i\log(\widehat y_i) + (1-y_i)\log(1-\widehat y_i)\,\right] \qquad (4)$$

  • To average over the samples and keep the value small, scale by $\frac{1}{m}$; and since the goal is to minimize a loss while maximum likelihood maximizes (4), multiply by $-1$. Equation (4) then becomes the loss of equation (1):

    $$L = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\log(\widehat y_i) + (1-y_i)\log(1-\widehat y_i)\,\right] \qquad (5)$$
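As an addendum beyond the original derivation (standard calculus, not taken from the linked post): the per-sample loss is $L_i = -\left[\,y_i\log(\widehat y_i)+(1-y_i)\log(1-\widehat y_i)\,\right]$, and when the prediction comes from a sigmoid, $\widehat y_i = \sigma(z_i)$ with $\frac{d\widehat y_i}{dz_i}=\widehat y_i(1-\widehat y_i)$, the backpropagation gradient simplifies to:

$$\frac{\partial L_i}{\partial \widehat y_i} = \frac{\widehat y_i - y_i}{\widehat y_i\,(1-\widehat y_i)}, \qquad \frac{\partial L_i}{\partial z_i} = \frac{\partial L_i}{\partial \widehat y_i}\cdot\widehat y_i\,(1-\widehat y_i) = \widehat y_i - y_i$$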

Code Implementation

# GRADED FUNCTION: compute_cost

import numpy as np

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (5).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute the cross-entropy cost from AL and Y.
    ### START CODE HERE ### (≈ 1 line of code)
    cost = -1.0 / m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    ### END CODE HERE ###

    cost = np.squeeze(cost)      # ensure the cost is a scalar (e.g. turns [[17]] into 17)
    assert(cost.shape == ())

    return cost
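A minimal usage sketch (these input values are made up for illustration). Note that np.log(0) is -inf, so if AL can hit exactly 0 or 1 it is common to clip it first; the eps safeguard below is an assumption on top of the original function, not part of the graded assignment code:

import numpy as np

AL = np.array([[0.9, 0.2, 0.7]])   # predicted probabilities, shape (1, m)
Y = np.array([[1, 0, 1]])          # ground-truth labels, shape (1, m)
print(compute_cost(AL, Y))         # ~0.2284

# Hypothetical safeguard: clip predictions away from 0 and 1 to avoid log(0).
eps = 1e-12
AL_safe = np.clip(AL, eps, 1 - eps)
print(compute_cost(AL_safe, Y))    # same value here, but safe when AL contains 0 or 1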
           
