
Derivation of the Logistic Regression Parameter θ

    • 1 The sigmoid function
    • 2 The linear regression function
    • 3 From linear regression to logistic regression
    • 4 Training θ: finding the optimum
    • 5 Theoretical analysis
    • 6 Notes on the derivation

1 The sigmoid function

The function

$$y=\frac{1}{1+e^{-x}}$$

is called the sigmoid function. Its output always lies in the interval (0, 1), so it can be interpreted as a probability for the sample data.
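As a quick numeric sketch (plain Python; the function name is my own), the sigmoid can be implemented in a numerically stable way by branching on the sign of the input:

```python
import math

def sigmoid(x):
    """y = 1 / (1 + e^{-x}), computed without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For very negative x, e^{-x} would overflow; use the algebraically
    # equivalent form e^{x} / (1 + e^{x}) instead.
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0.0))   # 0.5
```

Note the symmetry sigmoid(x) + sigmoid(−x) = 1, which is what lets the two class probabilities below sum to 1.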

2 The linear regression function

The linear regression function is:

$$z=\theta_{0}+\theta_{1}x_{1}+\dots+\theta_{n}x_{n}=\theta^{T}x$$

(with the convention $x_{0}=1$, so that $\theta_{0}$ is the intercept).

3 From linear regression to logistic regression

Logistic regression builds on linear regression: the sigmoid function maps the output of the linear regression function into (0, 1), which makes it a generalized linear model:

$$h_{\theta}(x)=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-\theta^{T}x}}$$

Since $h_{\theta}(x)$ can be interpreted as the probability of the label being 1, the probabilities that an input $x$ is classified as class 1 and as class 0 are, respectively:

$$p(y=1\mid x;\theta)=h_{\theta}(x)$$

$$p(y=0\mid x;\theta)=1-h_{\theta}(x)$$
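A minimal sketch of $h_{\theta}(x)$ and the two class probabilities (plain Python; the names `h_theta`, `theta`, and the sample values are my own):

```python
import math

def h_theta(theta, x):
    """h_theta(x) = sigmoid(theta^T x): the model's estimate of p(y=1 | x; theta)."""
    z = sum(t * v for t, v in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

theta = [0.5, -1.0]
x = [2.0, 1.0]            # here theta^T x = 0, so both classes are equally likely
p1 = h_theta(theta, x)    # p(y=1 | x; theta)
p0 = 1.0 - p1             # p(y=0 | x; theta)
print(p1, p0)             # 0.5 0.5
```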

4 Training θ: finding the optimum

The general steps for finding the maximum likelihood estimate $\widehat{\theta}$ are:

(1) write down the likelihood function;

(2) take the logarithm of the likelihood and simplify;

(3) differentiate with respect to θ;

(4) solve the likelihood equation (or, when no closed form exists, optimize numerically).

5 Theoretical analysis

The two class probabilities can be written as a single Bernoulli expression:

$$P(y\mid x;\theta)=(h_{\theta}(x))^{y}\,(1-h_{\theta}(x))^{1-y}$$

The $m$ samples are independent, so their joint distribution is the product of the marginal distributions. The likelihood function is therefore:

$$L(\theta)=\prod^{m}_{i=1}P(y_{i}\mid x_{i};\theta)=\prod^{m}_{i=1}(h_{\theta}(x_{i}))^{y_{i}}\,(1-h_{\theta}(x_{i}))^{1-y_{i}}$$

Taking the log-likelihood:

$$\begin{aligned}
l(\theta)=\log L(\theta)&=\sum^{m}_{i=1}\log\big((h_{\theta}(x_{i}))^{y_{i}}\big)+\log\big((1-h_{\theta}(x_{i}))^{1-y_{i}}\big)\\
&=\sum^{m}_{i=1}y_{i}\log h_{\theta}(x_{i})+(1-y_{i})\log(1-h_{\theta}(x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\log P(y_{i}=1\mid x_{i})+(1-y_{i})\log(1-P(y_{i}=1\mid x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\log P(y_{i}=1\mid x_{i})+\log(1-P(y_{i}=1\mid x_{i}))-y_{i}\log(1-P(y_{i}=1\mid x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\log\frac{P(y_{i}=1\mid x_{i})}{1-P(y_{i}=1\mid x_{i})}+\sum_{i=1}^{m}\log(1-P(y_{i}=1\mid x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}+\sum_{i=1}^{m}\log(1-P(y_{i}=1\mid x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}+\sum_{i=1}^{m}\log\frac{e^{-\theta^{T}x_{i}}}{1+e^{-\theta^{T}x_{i}}}\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}-\sum_{i=1}^{m}\log(1+e^{\theta^{T}x_{i}})
\end{aligned}$$

Differentiating with respect to θ:

$$\frac{\partial l(\theta)}{\partial\theta}=\sum_{i=1}^{m}y_{i}x_{i}-\sum_{i=1}^{m}\frac{e^{\theta^{T}x_{i}}}{1+e^{\theta^{T}x_{i}}}x_{i}=\sum_{i=1}^{m}\big(y_{i}-\sigma(\theta^{T}x_{i})\big)x_{i}$$

Since we are maximizing $l(\theta)$, the parameters are updated by gradient ascent (equivalently, gradient descent on the loss $-l(\theta)$):

$$\theta^{t+1}=\theta^{t}+\alpha\frac{\partial l(\theta)}{\partial\theta}=\theta^{t}+\alpha\sum_{i=1}^{m}\big(y_{i}-\sigma(\theta^{T}x_{i})\big)x_{i}$$
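Putting the update rule into code, here is a batch gradient-ascent sketch in plain Python (the function names and the toy data are my own, and no regularization or stopping criterion is included):

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, alpha=0.1, iters=500):
    """Maximize l(theta) by batch gradient ascent:
    theta <- theta + alpha * sum_i (y_i - sigmoid(theta^T x_i)) * x_i
    """
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j in range(n):
                grad[j] += err * xi[j]
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy 1-D data with an intercept feature x_0 = 1.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = fit_logistic(X, y)
preds = [1 if sigmoid(sum(t * v for t, v in zip(theta, xi))) > 0.5 else 0
         for xi in X]
print(preds)   # [0, 0, 1, 1]
```

In practice the sum over all $m$ samples is often replaced by a single sample or a mini-batch, giving stochastic gradient ascent with the same per-sample update term.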

6 Notes on the derivation

$$P(y=1\mid x)=\frac{1}{1+e^{-\theta^{T}x}}$$

$$P(y=0\mid x)=1-\frac{1}{1+e^{-\theta^{T}x}}=\frac{e^{-\theta^{T}x}}{1+e^{-\theta^{T}x}}$$

$$\frac{P(y=1\mid x)}{P(y=0\mid x)}=e^{\theta^{T}x}$$

$$\sigma(\theta^{T}x_{i})=\frac{1}{1+e^{-\theta^{T}x_{i}}}$$
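The odds identity above (the log-odds equal $\theta^{T}x$, the step used in the derivation in section 5) can be checked numerically; a small sketch, where `z` stands for an arbitrary value of $\theta^{T}x$:

```python
import math

def p1(z):
    """P(y=1 | x) with z = theta^T x."""
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7                        # an arbitrary value of theta^T x
odds = p1(z) / (1.0 - p1(z))   # P(y=1|x) / P(y=0|x)
print(abs(odds - math.exp(z)) < 1e-9)   # True
```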
