Cross Entropy
In Machine Learning, a large share of binary classification tasks use Cross Entropy as the loss function. It is usually preceded by a sigmoid activation; for the detailed forward and backward passes, see my earlier blog post: https://blog.csdn.net/jmu201521121021/article/details/86658163
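As a quick illustration of that setup (a minimal sketch only; the helper name `sigmoid` below is mine, not taken from the linked post), the activation squashes real-valued logits into $(0, 1)$ so the outputs can be used as the predictions $\widehat{y}$:

```python
import numpy as np

def sigmoid(z):
    """Squash real-valued logits into (0, 1) so they can be used as probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.1192 0.5    0.8808]
```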
Formula
$$P(y|x) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (1)$$
- $y_i$ denotes the ground-truth label of the $i^{th}$ sample
- $\widehat{y}_i$ denotes the model's predicted value for the $i^{th}$ sample
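As a quick sanity check of equation (1), take an assumed mini-batch of $m = 2$ samples with labels $y = (1, 0)$ and predictions $\widehat{y} = (0.9, 0.2)$ (values made up for illustration):

$$P(y|x) = -\frac{1}{2}\left[\log(0.9) + \log(1-0.2)\right] \approx -\frac{1}{2}\left(-0.105 - 0.223\right) \approx 0.164$$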
Derivation
- $y_i \in \{0, 1\}$: 0 denotes the negative-class label, 1 denotes the positive-class label
- When $y = 1$, $P(y|x) = \widehat{y}$; when $y = 0$, $P(y|x) = 1 - \widehat{y}$.
- Therefore the two cases can be written compactly as:
  $$P(y|x) = \widehat{y}^{\,y}(1-\widehat{y})^{1-y} \qquad (2)$$
- By maximum likelihood estimation, treating the samples as independent:
  $$P(y_1,y_2,\dots|x_1,x_2,\dots) = \prod_{i=1}^{m}P(y_i|x_i) = \prod_{i=1}^{m}\widehat{y}_i^{\,y_i}(1-\widehat{y}_i)^{1-y_i} \qquad (3)$$
- Because a product is awkward to differentiate during backpropagation, we take the logarithm, turning the product into a sum (the numerical check after this derivation illustrates why this also matters in practice); equation (3) becomes:
  $$P(y|x) = \sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (4)$$
- Dividing by $m$ averages over the samples so the value does not grow with the batch size, and since the goal is to minimize a loss rather than maximize the log-likelihood, a minus sign is added; equation (4) then becomes:
  $$P(y|x) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log(\widehat{y}_i) + (1-y_i)\log(1-\widehat{y}_i)\right] \qquad (5)$$
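The derivation can be sanity-checked numerically. The sketch below uses made-up labels and predictions (none of the values come from the post): it verifies that the compact form in equation (2) reproduces the two cases above, and that the raw product in equation (3) underflows to zero for a large batch while the log form in equation (4) stays finite.

```python
import numpy as np

# Equation (2): y_hat**y * (1 - y_hat)**(1 - y) matches the two cases.
y_hat = 0.7
for y in (0, 1):
    compact = y_hat ** y * (1 - y_hat) ** (1 - y)
    case = y_hat if y == 1 else 1 - y_hat
    assert np.isclose(compact, case)

# Equation (3) vs (4): for many samples the raw product underflows to 0.0,
# while the sum of logs remains a usable finite number.
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=10000)
Y_hat = np.clip(rng.random(10000), 1e-7, 1 - 1e-7)

product = np.prod(Y_hat ** Y * (1 - Y_hat) ** (1 - Y))                       # equation (3)
log_sum = np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))            # equation (4)

print(product)   # 0.0 due to floating-point underflow
print(log_sum)   # a finite negative number (roughly on the order of -1e4 here)
```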
Code Implementation
import numpy as np

# GRADED FUNCTION: compute_cost
def compute_cost(AL, Y):
    """
    Implement the cross-entropy cost defined by equation (5) above.

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y  -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute the cost from AL and Y.
    ### START CODE HERE ### (≈ 1 line of code)
    cost = -1.0 / m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
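A quick usage check with the same assumed values as the worked example after equation (1), with `AL` and `Y` shaped `(1, m)` as the docstring requires:

```python
AL = np.array([[0.9, 0.2]])   # predicted probabilities
Y = np.array([[1, 0]])        # ground-truth labels
print(compute_cost(AL, Y))    # ≈ 0.164
```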