Mathematical Concepts
Logistic regression sounds like a regression problem from its name, but it is actually a binary (0/1) classification problem. The logistic function is:
$f(x) = \frac{1}{1+e^{-x}}$
Its graph is an S-shaped curve: as x approaches positive infinity, f(x) approaches 1; as x approaches negative infinity, f(x) approaches 0; 0.5 is the dividing line between the two classes.
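A quick numerical check of this limiting behavior (a minimal sketch; logistic is a helper name introduced here):

import math

def logistic(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10, -1, 0, 1, 10]:
    print(x, logistic(x))
# f(-10) ≈ 0.000045, f(0) = 0.5, f(10) ≈ 0.999955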
Substituting a linear function of x gives the probability that the label is 1:
$P(Y=1|x) = f(x) = \frac{1}{1+e^{-(\theta_1+\theta_2 x)}}$
Review of exponential and logarithm functions (in particular $\log(ab) = \log a + \log b$ and $(\ln z)' = \frac{1}{z}$, both used in the derivation below).
Logistic Regression
Logistic regression solves binary (0/1) classification problems:
$P(Y=1|x;\theta) = f(x,\theta) = \frac{1}{1+e^{-\theta^T x}}$
$\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots$
$\theta = [\theta_0, \theta_1, \theta_2, \theta_3, \dots]$
$x = [1, x_1, x_2, x_3, \dots]$
When $P(Y=1|x)$ is greater than 0.5, the model outputs 1; otherwise it outputs 0.
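As a concrete illustration of this decision rule (a minimal sketch; the weights below are made-up numbers, not fitted values):

import numpy as np

theta = np.array([-1.0, 0.5, 2.0])      # [theta_0, theta_1, theta_2], hypothetical values
x = np.array([1.0, 3.0, -0.2])          # [1, x1, x2]; the leading 1 matches the intercept

p = 1.0 / (1.0 + np.exp(-(theta @ x)))  # P(Y=1|x; theta)
label = 1 if p > 0.5 else 0
print(p, label)                         # p ≈ 0.525, so the output is 1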
Example
from sklearn import linear_model

# training data: each sample is [x1, x2]; labels are 0 or 1
X=[[20,3],[23,7],[31,10],[42,13],[50,7],[60,5]]
y=[0,1,1,1,0,0]
lr=linear_model.LogisticRegression()
lr.fit(X,y)

# predict the label of a new sample
testX=[[28,8]]
label=lr.predict(testX)
print("predicted label = ",label)
#predicted label= [1]

prob=lr.predict_proba(testX)
print("probability = ",prob)
#probability = [[0.14694811 0.85305189]]
#the two values are the probabilities of class 0 and class 1, respectively

# fitted parameters: intercept theta_0 and feature weights theta_1, theta_2
theta_0=lr.intercept_
theta_1=lr.coef_[0][0]
theta_2=lr.coef_[0][1]
print("theta_0:",theta_0) #theta_0: [-0.04131838]
print("theta_1:",theta_1) #theta_1: -0.19730001368291533
print("theta_2:",theta_2) #theta_2: 0.9155574523479832
Loss Function
$P(Y=1|x;\theta) = f(x,\theta) = \frac{1}{1+e^{-\theta^T x}}$
$J(\theta) = -\sum_{i=1}^{N} \left[ y_i \ln(P(Y=1|X=x_i;\theta)) + (1-y_i) \ln(1-P(Y=1|X=x_i;\theta)) \right]$
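A direct NumPy transcription of this loss (a minimal sketch; the clipping is an added guard against log(0), not part of the formula):

import numpy as np

def cross_entropy_loss(theta, X, y):
    # X: (N, m) feature matrix with a leading column of ones; y: (N,) 0/1 labels
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # P(Y=1|x_i; theta) for every sample
    p = np.clip(p, 1e-12, 1 - 1e-12)         # keep log() finite
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))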
Derivation of the Loss Function
Review of derivative formulas:
$f(z) = \frac{1}{1+e^{-z}}$
$f'(z) = f(z)(1-f(z))$
$z = \theta^T x$
$\frac{df(z)}{dx} = \frac{df(z)}{dz} \cdot \frac{dz}{dx} = f(z)(1-f(z))\,\theta$
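A quick finite-difference check of the identity $f'(z) = f(z)(1-f(z))$ (a minimal sketch):

import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))

z, h = 0.7, 1e-6
numeric = (f(z + h) - f(z - h)) / (2 * h)   # central-difference approximation of f'(z)
analytic = f(z) * (1 - f(z))
print(numeric, analytic)                    # both ≈ 0.2217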
The dataset is $(x_i, y_i),\ i = 1, 2, \dots, N$, where each $x_i$ is m-dimensional and each $y_i$ is 0 or 1.
$P_i = P(y_i=1|\theta; x_i) = f(\theta, x_i) = \frac{1}{1+e^{-\theta^T x_i}}$
Therefore, the optimal value of $\theta$ is the one that makes $P_i \approx y_i$.
Putting this together, the objective function is the likelihood to be maximized:
$\max L(\theta) = \prod_{i=1, y_i=1}^{N} P_i \cdot \prod_{i=1, y_i=0}^{N} (1-P_i)$
When $y_i = 1$ we want $P_i$ to be as large as possible; when $y_i = 0$ we want $P_i$ to be as small as possible.
To make differentiation more convenient, we take the logarithm (which is monotonic, so it does not change the maximizer):
$l(\theta) = \log L(\theta) = \sum_{i=1, y_i=1}^{N} \log P_i + \sum_{i=1, y_i=0}^{N} \log(1-P_i)$
Since $y_i \in \{0, 1\}$ acts as an indicator, this can be rewritten as a sum over all samples:
$l(\theta) = \log L(\theta) = \sum_{i=1}^{N} y_i \log P_i + \sum_{i=1}^{N} (1-y_i) \log(1-P_i)$
$\frac{dl(\theta)}{d\theta} = \sum_{i=1}^{N} \left[ y_i \frac{1}{P_i} P_i(1-P_i) x_i + (1-y_i) \frac{1}{1-P_i} \left(-P_i(1-P_i) x_i\right) \right] = \sum_{i=1}^{N} (y_i x_i - P_i x_i) = \sum_{i=1}^{N} (y_i - P_i) x_i$
Maximizing $l(\theta)$ is equivalent to minimizing its negative, so we define the loss:
$loss(\theta) = -l(\theta)$
$\frac{d\,loss(\theta)}{d\theta} = \sum_{i=1}^{N} (P_i - y_i) x_i$
Gradient Descent
Gradient descent minimizes this loss by repeatedly stepping against the gradient: $\theta \leftarrow \theta - \alpha \sum_{i=1}^{N} (P_i - y_i) x_i$, where $\alpha$ is the learning rate.
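A minimal batch gradient-descent sketch for this loss (the learning rate, iteration count, and clipping are assumptions for stability; this is not the solver sklearn uses, and sklearn also adds L2 regularization by default, so the numbers will differ):

import numpy as np

def fit_logistic_gd(X, y, alpha=1e-4, n_iters=20000):
    # prepend a column of ones so theta[0] plays the role of the intercept
    Xb = np.hstack([np.ones((len(X), 1)), X])
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        z = np.clip(Xb @ theta, -30, 30)   # keep exp() from overflowing
        P = 1.0 / (1.0 + np.exp(-z))       # P_i for every sample
        grad = Xb.T @ (P - y)              # sum_i (P_i - y_i) * x_i
        theta -= alpha * grad              # step against the gradient
    return theta

X = np.array([[20,3],[23,7],[31,10],[42,13],[50,7],[60,5]], dtype=float)
y = np.array([0,1,1,1,0,0], dtype=float)
print(fit_logistic_gd(X, y))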
The fitted coefficients also have a direct interpretation in terms of odds: increasing $x_2$ by 1 multiplies the odds $\frac{P(y=1|x)}{P(y=0|x)}$ by $e^{\theta_2}$. The code below verifies this with the model trained above:

# odds at testX=[[28,8]] (prob was computed earlier)
ratio=prob[0][1]/prob[0][0]

# increase x2 by 1 and recompute the odds
testX=[[28,9]]
prob_new=lr.predict_proba(testX)
ratio_new=prob_new[0][1]/prob_new[0][0]

ratio_of_ratio=ratio_new/ratio
print("ratio_of_ratio:",ratio_of_ratio)
#ratio_of_ratio: 2.4981674731438916
import math
theta_2_e=math.exp(theta_2)   # e^theta_2 equals the odds ratio above
print("theta_2_e:",theta_2_e)
#theta_2_e: 2.498167473143895
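Why this holds: taking the log of the odds gives $\log \frac{P(Y=1|x)}{P(Y=0|x)} = \theta^T x$, which is linear in the features. Increasing $x_2$ by 1 therefore adds $\theta_2$ to the log-odds, i.e. it multiplies the odds by $e^{\theta_2}$, no matter what the starting point $x$ is.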