
Deriving Logistic Regression: Basic Definition, Mathematical Derivation, Code Implementation

Basic Definition

Logistic regression separates the samples of two classes with a linear decision boundary. For a binary classification problem, given a training set $\{(\mathbf x^{(i)}, y^{(i)})\}_{i=1}^m$ with $\mathbf x^{(i)} \in \mathbb R^n$ and $y^{(i)} \in \{0,1\}$, the goal is to learn a boundary that separates the two classes. Logistic regression learns a hypothesis $h_\theta(\mathbf x) = g(\theta^T\mathbf x)$ that predicts the probability that a sample belongs to class 1, where $g$ is the sigmoid function, defined in Eq. (1):

$$g(z) = \frac{1}{1+e^{-z}} \tag{1}$$
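Eq. (1) translates directly into NumPy. A minimal sketch (the function name is illustrative, not from the author's yet-to-be-released code):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid g(z) = 1 / (1 + e^{-z}) from Eq. (1); works elementwise on arrays."""
    return 1.0 / (1.0 + np.exp(-z))
```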

The sigmoid function has a convenient mathematical property: its first derivative can be expressed in terms of the function itself, as shown in Eq. (2). Substituting the sigmoid into the linear model yields the logistic regression hypothesis, Eq. (3). For a sample $\mathbf x$, the probabilities that logistic regression assigns to classes 1 and 0 are given by Eqs. (4) and (5), respectively:

$$g'(z) = g(z)\,\bigl(1 - g(z)\bigr) \tag{2}$$

$$h_\theta(\mathbf x) = g(\theta^T\mathbf x) = \frac{1}{1+e^{-\theta^T\mathbf x}} \tag{3}$$

$$\hat y = P(y=1 \mid \mathbf x;\theta) = h_\theta(\mathbf x) = g(\theta^T\mathbf x) \tag{4}$$

$$P(y=0 \mid \mathbf x;\theta) = 1 - h_\theta(\mathbf x) = 1 - g(\theta^T\mathbf x) \tag{5}$$
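Eqs. (3)–(5) can be sketched as a prediction routine. A hedged sketch, assuming a design matrix with one sample per row (the name `predict_proba` is an assumption, not the author's API):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), Eq. (1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    """P(y=1 | x; theta) = h_theta(x) = g(theta^T x), Eqs. (3)-(4).
    X has shape (m, n), one sample per row; P(y=0 | x; theta) is 1 minus this, Eq. (5)."""
    return sigmoid(X @ theta)
```

With `theta = 0` every sample gets probability 0.5, as Eq. (3) predicts, and the two class probabilities always sum to 1.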

Mathematical Derivation

For the $m$ samples in the training set, the likelihood function can be written as:

$$L(\theta) = \prod_{i=1}^m p(y^{(i)} \mid \mathbf x^{(i)};\theta) = \prod_{i=1}^m h_\theta(\mathbf x^{(i)})^{\,y^{(i)}} \bigl(1 - h_\theta(\mathbf x^{(i)})\bigr)^{1-y^{(i)}} \tag{6}$$

Taking the logarithm gives the log-likelihood:

$$l(\theta) = \log L(\theta) = \sum_{i=1}^m y^{(i)} \log h_\theta(\mathbf x^{(i)}) + (1-y^{(i)}) \log\bigl(1 - h_\theta(\mathbf x^{(i)})\bigr) \tag{7}$$

Maximizing the log-likelihood is equivalent to minimizing the logistic regression loss function, defined as:

$$J(\theta) = -\frac{1}{m}\, l(\theta) \tag{8}$$
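Eqs. (7)–(8) together are the average cross-entropy loss. A sketch, assuming labels in $\{0,1\}$ and the row-per-sample layout (the small epsilon clip, which guards against $\log 0$, is an implementation choice, not part of the derivation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ], Eqs. (7)-(8)."""
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1.0 - eps)  # avoid log(0) at saturated predictions
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```

A quick sanity check: at `theta = 0` every prediction is 0.5, so the loss is exactly $\log 2$ regardless of the labels.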

To find the optimal parameters $\theta$, we apply gradient descent. Taking the partial derivative of the loss with respect to each parameter $\theta_j$, the derivation proceeds as follows:

$$\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_j}
& = -\frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \frac{1}{h_\theta(\mathbf x^{(i)})}\frac{\partial h_\theta(\mathbf x^{(i)})}{\partial\theta_j} - (1-y^{(i)}) \frac{1}{1-h_\theta(\mathbf x^{(i)})}\frac{\partial h_\theta(\mathbf x^{(i)})}{\partial\theta_j} \right) \\
& = -\frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \frac{1}{h_\theta(\mathbf x^{(i)})} - (1-y^{(i)}) \frac{1}{1-h_\theta(\mathbf x^{(i)})} \right) \frac{\partial h_\theta(\mathbf x^{(i)})}{\partial\theta_j} \\
& = -\frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \frac{1}{g(\theta^T\mathbf x^{(i)})} - (1-y^{(i)}) \frac{1}{1-g(\theta^T\mathbf x^{(i)})} \right) \frac{\partial g(\theta^T\mathbf x^{(i)})}{\partial\theta_j} \\
& = -\frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \frac{1}{g(\theta^T\mathbf x^{(i)})} - (1-y^{(i)}) \frac{1}{1-g(\theta^T\mathbf x^{(i)})} \right) g(\theta^T\mathbf x^{(i)})\bigl(1-g(\theta^T\mathbf x^{(i)})\bigr)\, x_j^{(i)} \\
& = -\frac{1}{m} \sum_{i=1}^m \left( y^{(i)} - g(\theta^T\mathbf x^{(i)}) \right) x_j^{(i)} \\
& = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(\mathbf x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
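The final line of the derivation vectorizes to a one-liner: the full gradient is $\frac{1}{m} X^T (h - y)$. A sketch under the same assumptions as before:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) sum_i (h_theta(x^(i)) - y^(i)) x_j^(i), computed for all j at once."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m
```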

Substituting this partial derivative into the gradient descent rule gives the update formula for $\theta$:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial\theta_j} = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(\mathbf x^{(i)}) - y^{(i)} \right) x_j^{(i)} \tag{9}$$
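Repeating the update in Eq. (9) gives batch gradient descent. A minimal self-contained sketch on a tiny separable dataset (the learning rate, iteration count, and data are illustrative; the author's eventual GitHub release may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.1, n_iters=2000):
    """Train logistic regression by repeating Eq. (9):
    theta_j := theta_j - alpha * (1/m) sum_i (h_theta(x^(i)) - y^(i)) x_j^(i)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # batch gradient from the derivation
        theta -= alpha * grad
    return theta

# Tiny separable example: first column is a bias term, labels follow the sign of x2.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit(X, y, alpha=0.5)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
```

On this data the learned weight on the separating feature grows positive and the model classifies all four samples correctly.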

Code Implementation

The full code implementation will be published on GitHub later.
