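Deriving the parameter θ for logistic regression
- 1 Sigmoid function
- 2 Linear regression function
- 3 From linear regression to logistic regression
- 4 Training θ: finding the optimal solution
- 5 Theoretical analysis
- 6 Derivation notes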
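1 Sigmoid function
The function $y=\frac{1}{1+e^{-x}}$ is called the sigmoid function. It maps any real number into the interval (0,1), so its output can be read as a probability. Strictly speaking, it is the cumulative distribution function of the standard logistic distribution rather than a probability density function.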
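As a small illustration of the definition above, here is a minimal sketch of the sigmoid in plain Python. The function name `sigmoid` and the two-branch form are choices made for this example, not something the derivation prescribes; the branch for negative inputs only exists to keep `math.exp` from overflowing.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x), so that
    # math.exp never receives a large positive argument.
    e = math.exp(x)
    return e / (1.0 + e)


print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0, and no OverflowError is raised
```

The naive expression `1 / (1 + math.exp(-x))` would raise `OverflowError` for `x = -1000.0`, which is why the sketch splits on the sign of `x`.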
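2 Linear regression function
The linear regression formula is:
$$z=\theta_{0}+\theta_{1}x_{1}+\dots+\theta_{n}x_{n}=\theta^{T}x$$
where $x=(1,x_{1},\dots,x_{n})^{T}$ carries a leading constant 1 so that $\theta_{0}$ acts as the intercept.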
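The formula above can be sketched in a few lines of Python. The helper name `linear_score` is hypothetical, and the example assumes the convention that a constant 1 is prepended to the features so that the first entry of `theta` is the intercept.

```python
def linear_score(theta, features):
    """Return z = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    # Prepend the constant 1 so that theta[0] plays the role of the intercept.
    x = [1.0] + list(features)
    if len(theta) != len(x):
        raise ValueError("theta needs one more entry than there are features")
    return sum(t * xi for t, xi in zip(theta, x))


theta = [0.5, 2.0, -1.0]
print(linear_score(theta, [3.0, 4.0]))  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```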
3 From linear regression to logistic regression
Logistic regression builds on linear regression: the sigmoid function maps the output of the linear regression function onto (0,1). It belongs to the family of generalized linear models:
$$h_{\theta}(x)=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-\theta^{T}x}}$$
$h_{\theta}(x)$ can be interpreted as the probability that the label is 1, so for an input $x$ the probabilities of class 1 and class 0 are, respectively:
$$p(y=1|x;\theta)=h_{\theta}(x)$$
$$p(y=0|x;\theta)=1-h_{\theta}(x)$$
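To make the two class probabilities concrete, the following sketch computes them for one input. The names `sigmoid` and `class_probabilities` are illustrative, and `x` is assumed to already include the leading constant 1.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_probabilities(theta, x):
    """Return (p(y=1|x), p(y=0|x)) for one sample x with a leading 1."""
    z = sum(t * xi for t, xi in zip(theta, x))  # z = theta^T x
    p1 = sigmoid(z)                             # h_theta(x)
    return p1, 1.0 - p1


p1, p0 = class_probabilities([0.0, 1.0], [1.0, 0.0])
print(p1, p0)  # 0.5 0.5, because z = 0 lies exactly on the decision boundary
```

A common convention, not stated in the text, is to predict class 1 whenever `p1 >= 0.5`, which is the same as `z >= 0`.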
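4 Training θ: finding the optimal solution
The general steps for finding the maximum likelihood estimate $\widehat{\theta}$ are:
(1) write down the likelihood function;
(2) take the logarithm of the likelihood function and simplify;
(3) take the derivative;
(4) solve the likelihood equation.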
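As a warm-up for the four steps above, here is a worked example on a simpler model than logistic regression. It assumes $m$ independent Bernoulli observations $y_{i}\in\{0,1\}$ with a single unknown success probability $p$; this model and the symbols $p$ and $k$ are introduced only for illustration.

```latex
\begin{aligned}
\text{(1)}\quad & L(p)=\prod_{i=1}^{m}p^{y_{i}}(1-p)^{1-y_{i}}=p^{k}(1-p)^{m-k},\qquad k=\sum_{i=1}^{m}y_{i}\\
\text{(2)}\quad & l(p)=\log L(p)=k\log p+(m-k)\log(1-p)\\
\text{(3)}\quad & \frac{dl}{dp}=\frac{k}{p}-\frac{m-k}{1-p}\\
\text{(4)}\quad & \frac{dl}{dp}=0\ \Rightarrow\ \widehat{p}=\frac{k}{m}
\end{aligned}
```

Here step (4) has a closed-form answer, the observed fraction of ones. Logistic regression follows the same four steps, but its likelihood equation cannot be solved in closed form.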
5 Theoretical analysis
The two class probabilities can be combined into a single expression:
$$P(y|x;\theta)=(h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y}$$
The $m$ samples are independent, so their joint distribution is the product of the marginal distributions. The likelihood function is therefore:
$$L(\theta)=\prod_{i=1}^{m}P(y_{i}|x_{i};\theta)=\prod_{i=1}^{m}(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i}))^{1-y_{i}}$$
Taking the logarithm gives the log-likelihood:
$$\begin{aligned}
l(\theta)=\log L(\theta)&=\sum_{i=1}^{m}\left[\log\left((h_{\theta}(x_{i}))^{y_{i}}\right)+\log\left((1-h_{\theta}(x_{i}))^{1-y_{i}}\right)\right]\\
&=\sum_{i=1}^{m}\left[y_{i}\log h_{\theta}(x_{i})+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right]\\
&=\sum_{i=1}^{m}\left[y_{i}\log P(y_{i}=1|x_{i})+(1-y_{i})\log(1-P(y_{i}=1|x_{i}))\right]\\
&=\sum_{i=1}^{m}\left[y_{i}\log P(y_{i}=1|x_{i})+\log(1-P(y_{i}=1|x_{i}))-y_{i}\log(1-P(y_{i}=1|x_{i}))\right]\\
&=\sum_{i=1}^{m}y_{i}\log\frac{P(y_{i}=1|x_{i})}{1-P(y_{i}=1|x_{i})}+\sum_{i=1}^{m}\log(1-P(y_{i}=1|x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}+\sum_{i=1}^{m}\log(1-P(y_{i}=1|x_{i}))\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}+\sum_{i=1}^{m}\log\frac{e^{-\theta^{T}x_{i}}}{1+e^{-\theta^{T}x_{i}}}\\
&=\sum_{i=1}^{m}y_{i}\theta^{T}x_{i}-\sum_{i=1}^{m}\log(1+e^{\theta^{T}x_{i}})
\end{aligned}$$
Replacing the log-odds with $\theta^{T}x_{i}$ uses the odds identity listed in section 6. The last line follows because $\frac{e^{-\theta^{T}x_{i}}}{1+e^{-\theta^{T}x_{i}}}=\frac{1}{1+e^{\theta^{T}x_{i}}}$.
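The chain of equalities above can be checked numerically. The sketch below evaluates the log-likelihood in its second form (with $h_{\theta}$) and in its final simplified form and compares the two. The toy data, the value of `theta`, and all function names are made up for this illustration; each sample is assumed to carry a leading 1.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(theta, x):
    """Inner product theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))


def log_likelihood_direct(theta, xs, ys):
    """sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(dot(theta, x))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total


def log_likelihood_simplified(theta, xs, ys):
    """sum_i [ y_i theta^T x_i - log(1 + e^(theta^T x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = dot(theta, x)
        # log(1 + e^z) computed as max(z, 0) + log(1 + e^(-|z|)) to avoid overflow.
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += y * z - softplus
    return total


# Hypothetical toy data: each row is (1, x_1).
xs = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
ys = [1, 0, 1, 0]
theta = [0.1, 0.8]

print(log_likelihood_direct(theta, xs, ys))
print(log_likelihood_simplified(theta, xs, ys))  # agrees up to rounding error
```

The simplified form is usually the better one to compute with, since it never takes the logarithm of a probability that has rounded to 0.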
Differentiating the log-likelihood with respect to $\theta$, and noting that $\frac{e^{\theta^{T}x_{i}}}{1+e^{\theta^{T}x_{i}}}=\sigma(\theta^{T}x_{i})$:
$$\frac{\partial l(\theta)}{\partial\theta}=\sum_{i=1}^{m}y_{i}x_{i}-\sum_{i=1}^{m}\frac{e^{\theta^{T}x_{i}}}{1+e^{\theta^{T}x_{i}}}x_{i}=\sum_{i=1}^{m}(y_{i}-\sigma(\theta^{T}x_{i}))x_{i}$$
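One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation of the log-likelihood. The sketch below does that on made-up toy data; the function names, the data, and the step size `eps` are assumptions of this example.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))


def log_likelihood(theta, xs, ys):
    """l(theta) = sum_i [ y_i theta^T x_i - log(1 + e^(theta^T x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = dot(theta, x)
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def gradient(theta, xs, ys):
    """Analytic gradient: sum_i (y_i - sigma(theta^T x_i)) x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = y - sigmoid(dot(theta, x))
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def numerical_gradient(theta, xs, ys, eps=1e-6):
    """Central-difference approximation of the gradient of l(theta)."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((log_likelihood(up, xs, ys) - log_likelihood(down, xs, ys)) / (2 * eps))
    return grad


# Hypothetical toy data: each row is (1, x_1).
xs = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
ys = [1, 0, 1, 0]
theta = [0.1, 0.8]

print(gradient(theta, xs, ys))
print(numerical_gradient(theta, xs, ys))  # should match to several decimal places
```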
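The likelihood equation $\frac{\partial l(\theta)}{\partial\theta}=0$ has no closed-form solution, so $\theta$ is updated iteratively. Because $l(\theta)$ is to be maximized, each step moves along the gradient (gradient ascent) with learning rate $\alpha$:
$$\theta^{t+1}=\theta^{t}+\alpha\left.\frac{\partial l(\theta)}{\partial\theta}\right|_{\theta=\theta^{t}}=\theta^{t}+\alpha\sum_{i=1}^{m}\left(y_{i}-\sigma((\theta^{t})^{T}x_{i})\right)x_{i}$$
Equivalently, this is gradient descent on the negative log-likelihood $-l(\theta)$; subtracting $\alpha\frac{\partial l(\theta)}{\partial\theta}$ instead would decrease the likelihood.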
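The update rule can be turned into a short training loop. The following is a minimal sketch using batch gradient ascent on a tiny, deliberately non-separable toy data set; the data, the learning rate `alpha=0.1`, the step count, and the name `fit_logistic` are all assumptions made for illustration, not values taken from the derivation.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))


def log_likelihood(theta, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        z = dot(theta, x)
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def gradient(theta, xs, ys):
    """sum_i (y_i - sigma(theta^T x_i)) x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = y - sigmoid(dot(theta, x))
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def fit_logistic(xs, ys, alpha=0.1, steps=2000):
    """Batch gradient ascent: theta <- theta + alpha * gradient."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = gradient(theta, xs, ys)
        theta = [t + alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: rows are (1, x_1); labels overlap, so the
# maximum likelihood estimate is finite.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 0, 1, 0, 1, 1]

theta_hat = fit_logistic(xs, ys)
print(theta_hat)
print(log_likelihood([0.0, 0.0], xs, ys), log_likelihood(theta_hat, xs, ys))
```

If the classes were perfectly separable, the log-likelihood would keep increasing as the norm of `theta` grows and the loop would never settle, which is one reason the toy labels here overlap.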
6 Derivation notes
$$P(y=1|x)=\frac{1}{1+e^{-\theta^{T}x}}$$
$$P(y=0|x)=1-\frac{1}{1+e^{-\theta^{T}x}}=\frac{e^{-\theta^{T}x}}{1+e^{-\theta^{T}x}}$$
Dividing the first expression by the second gives the odds, whose logarithm is the linear score:
$$\frac{P(y=1|x)}{P(y=0|x)}=e^{\theta^{T}x},\qquad\log\frac{P(y=1|x)}{P(y=0|x)}=\theta^{T}x$$
$$\sigma(\theta^{T}x_{i})=\frac{1}{1+e^{-\theta^{T}x_{i}}}$$
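The odds identity in these notes is easy to confirm numerically. In the sketch below, `log_odds` is a hypothetical helper that recovers the linear score $z=\theta^{T}x$ from the two class probabilities; the sample values of `z` are arbitrary.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(z: float) -> float:
    """log( P(y=1|x) / P(y=0|x) ) where P(y=1|x) = sigmoid(z)."""
    p1 = sigmoid(z)
    p0 = 1.0 - p1
    return math.log(p1 / p0)


for z in (-3.0, 0.0, 1.5):
    # The second printed value should reproduce z up to rounding error.
    print(z, log_odds(z))
```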