
Logistic Regression: Basic Usage with sklearn

penalty : str, ‘l1’ or ‘l2’, default: ‘l2’

The penalty term, 'l1' or 'l2'. l1 drives the weights sparse (many exactly zero); l2 keeps all the weights small and balanced. When the solver is 'newton-cg', 'sag', or 'lbfgs', only 'l2' is supported.
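A minimal sketch (not from the original post) illustrating the sparsity difference on the iris data; it assumes the 'liblinear' solver, which supports both penalties:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np

X, y = load_iris(return_X_y=True)
for penalty in ('l1', 'l2'):
    clf = LogisticRegression(penalty=penalty, solver='liblinear')
    clf.fit(X, y)
    # l1 typically zeroes out some coefficients; l2 keeps them all nonzero
    print(penalty, 'nonzero coefficients:', np.count_nonzero(clf.coef_))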

C : float, default: 1.0

The inverse of regularization strength: smaller values of C specify stronger regularization.
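A minimal sketch of that effect (illustrative values, not from the post): as C grows, regularization weakens and the weights get larger:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np

X, y = load_iris(return_X_y=True)
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, solver='liblinear').fit(X, y)
    # mean absolute weight should grow as C (weaker regularization) increases
    print('C =', C, ' mean |coef| =', np.abs(clf.coef_).mean().round(4))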

fit_intercept : bool, default: True

Whether to fit an intercept (bias) term, i.e. the b in y = ax + b. Defaults to True.

intercept_scaling : float, default 1.

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
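As a hedged illustration of that note (not from the original post): raising intercept_scaling with the 'liblinear' solver lessens the regularization applied to the synthetic intercept feature, which you can observe through the fitted intercept_:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for s in (1.0, 10.0, 100.0):
    clf = LogisticRegression(solver='liblinear', intercept_scaling=s).fit(X, y)
    # the intercepts shift as the synthetic feature is regularized less
    print('intercept_scaling =', s, ' intercept_ =', clf.intercept_.round(3))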

class_weight : dict or ‘balanced’, default: None

The default None gives every class weight 1, i.e. {0: 1, 1: 1}. If class 0 is more important, you can pass e.g. {0: 2, 1: 1}; this is cost-sensitive learning, typically used with imbalanced samples. 'balanced' adjusts weights inversely proportional to class frequencies.
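A minimal cost-sensitive sketch on a binary slice of iris (an illustration, not from the post): doubling the weight of class 0 shifts the fitted model in its favor:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]                      # keep only classes 0 and 1
plain    = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight={0: 2, 1: 1}).fit(X, y)
# the weighted model's boundary moves to protect class 0
print('plain    intercept_:', plain.intercept_)
print('weighted intercept_:', weighted.intercept_)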

solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’}, default: ‘liblinear’

The optimization algorithm used to fit the model.

For small datasets, 'liblinear' is a good choice; for large datasets, 'sag' is faster.

For multiclass problems, only 'newton-cg', 'sag', and 'lbfgs' can handle the multinomial loss; 'liblinear' is limited to one-vs-rest (ovr).

'newton-cg', 'lbfgs', and 'sag' only support the l2 penalty; a quick demonstration of this constraint follows.
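A small sketch of that solver/penalty constraint (illustrative only): 'liblinear' accepts l1, while 'lbfgs' rejects it at fit time:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)   # supported
try:
    LogisticRegression(penalty='l1', solver='lbfgs').fit(X, y)   # l2-only solver
except ValueError as e:
    print('lbfgs + l1 rejected:', e)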

multi_class : str, {‘ovr’, ‘multinomial’}, default: ‘ovr’

If the option chosen is ‘ovr’, then a binary problem is fit for each label. Else the loss minimised is the multinomial loss fit across the entire probability distribution. Works only for the ‘newton-cg’, ‘sag’ and ‘lbfgs’ solver.
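A minimal sketch contrasting the two strategies with the sklearn versions this post targets (illustrative, not from the post): 'ovr' fits one binary model per class, while 'multinomial' fits a single softmax model, so the predicted probabilities differ:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for mc in ('ovr', 'multinomial'):
    clf = LogisticRegression(multi_class=mc, solver='lbfgs').fit(X, y)
    # class probabilities for the first sample under each scheme
    print(mc, clf.predict_proba(X[:1]).round(3))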

n_jobs : int, default: 1

The number of CPU cores to use when fitting; defaults to 1 and can be increased depending on your machine.

Attributes

coef_ : array, shape (n_classes, n_features)

The coefficient of each feature, i.e. the weights.

intercept_ : array, shape (n_classes,)

The intercept, i.e. the bias term.
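A quick check of those shapes on the 3-class, 4-feature iris problem (a sketch, not part of the original code):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression().fit(X, y)
print('coef_      shape:', clf.coef_.shape)       # (n_classes, n_features) -> (3, 4)
print('intercept_ shape:', clf.intercept_.shape)  # (n_classes,)            -> (3,)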

# -*- coding: utf-8 -*-
"""
Created on Wed Apr 12 21:28:40 2017

@author: 大帆
"""

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np
# Configure matplotlib so CJK characters render in figure titles
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['axes.unicode_minus'] = False

# Load the iris data and project it to 2 dimensions with PCA
# so the decision boundaries can be plotted
iris = load_iris()
iris_data = iris.data
iris_target = iris.target

print(iris_data.shape)   # (150, 4)

pca = PCA(n_components=2)
X = pca.fit_transform(iris_data)
print(X.shape)           # (150, 2)

# Scatter plot of the three classes in the PCA plane
f = plt.figure()
ax = f.add_subplot(111)
ax.scatter(X[:, 0][iris_target == 0], X[:, 1][iris_target == 0], c='b')
ax.scatter(X[:, 0][iris_target == 1], X[:, 1][iris_target == 1], c='r')
ax.scatter(X[:, 0][iris_target == 2], X[:, 1][iris_target == 2], c='y')
ax.set_title('数据分布图')   # "Data distribution", rendered with the CJK font set above
plt.show()

# Fit a one-vs-rest logistic regression with equal class weights
clf = LogisticRegression(multi_class='ovr', solver='lbfgs', class_weight={0: 1, 1: 1, 2: 1})
clf.fit(X, iris_target)
score = clf.score(X, iris_target)   # training accuracy

# Evaluate the classifier on a dense grid to draw the decision regions
x0min, x0max = X[:, 0].min(), X[:, 0].max()
x1min, x1max = X[:, 1].min(), X[:, 1].max()
h = 0.05   # grid step
xx, yy = np.meshgrid(np.arange(x0min - 1, x0max + 1, h),
                     np.arange(x1min - 1, x1max + 1, h))
test_x = np.c_[xx.ravel(), yy.ravel()]

test_predict = clf.predict(test_x)
z = test_predict.reshape(xx.shape)
plt.contourf(xx, yy, z, cmap=plt.cm.Paired)
plt.axis('tight')
# Overlay the training points, one color per class
colors = 'bry'
for i, color in zip(clf.classes_, colors):
    idx = np.where(iris_target == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color)

# Draw each one-vs-rest decision line: coef[c,0]*x0 + coef[c,1]*x1 + intercept[c] = 0
xmin, xmax = plt.xlim()
coef = clf.coef_
intercept = clf.intercept_

def line(c, x0):
    return (-coef[c, 0] * x0 - intercept[c]) / coef[c, 1]

for i, color in zip(clf.classes_, colors):
    plt.plot([xmin, xmax], [line(i, xmin), line(i, xmax)], color=color, linestyle='--')
plt.title('score:{0}'.format(score))
plt.show()
[Figures: the class scatter plot and the decision-region plot produced by the code above]

For a follow-up post that reproduces logistic regression in TensorFlow, see http://blog.csdn.net/daxiaofan/article/details/70156357

References:

sklearn LogisticRegression documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
