文章目录

一、说明
二、数据项说明
三、实战部分

一、说明

我是在jupyter完成的，然后导出成markdown格式，ipynb文件导出为markdown的命令如下：

jupyter nbconvert --to markdown  xxx.ipynb

源代码和数据文件，点击这里获取

二、数据项说明

Name		Data Type	Meas.	Description
	----		---------	-----	-----------
	Sex		nominal			M, F, and I (infant)
	Length		continuous	mm	Longest shell measurement
	Diameter	continuous	mm	perpendicular to length
	Height		continuous	mm	with meat in shell
	Whole weight	continuous	grams	whole abalone
	Shucked weight	continuous	grams	weight of meat
	Viscera weight	continuous	grams	gut weight (after bleeding)
	Shell weight	continuous	grams	after being dried
	Rings		integer			+1.5 gives the age in years

现在有8个数据字段，前面7个是特征值，最最后一个Rings为预测，具体请查阅文件内容

三、实战部分

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight	Rings
M	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	M	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	F	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	M	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	I	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7
5	I	0.425	0.300	0.095	0.3515	0.1410	0.0775	0.120	8
6	F	0.530	0.415	0.150	0.7775	0.2370	0.1415	0.330	20
7	F	0.545	0.425	0.125	0.7680	0.2940	0.1495	0.260	16
8	M	0.475	0.370	0.125	0.5095	0.2165	0.1125	0.165	9
9	F	0.550	0.440	0.150	0.8945	0.3145	0.1510	0.320	19

# 查看数据容量 
dataframe01.shape

(4177, 9)

Index(['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight',
       'Viscera weight', 'Shell weight', 'Rings'],
      dtype='object')

# 清洗数据
# 替换特征值，将性别中的字符类型转化为整数
dataframe02 = dataframe01.copy()

dataframe02.Sex[dataframe01['Sex']=='I']=0
dataframe02.Sex[dataframe01['Sex']=='F']=1
dataframe02.Sex[dataframe01['Sex']=='M']=2

Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight	Rings
2	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	2	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	2	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7
5	0.425	0.300	0.095	0.3515	0.1410	0.0775	0.120	8
6	1	0.530	0.415	0.150	0.7775	0.2370	0.1415	0.330	20
7	1	0.545	0.425	0.125	0.7680	0.2940	0.1495	0.260	16
8	2	0.475	0.370	0.125	0.5095	0.2165	0.1125	0.165	9
9	1	0.550	0.440	0.150	0.8945	0.3145	0.1510	0.320	19

# 导入线性回归的库
from sklearn.linear_model import LinearRegression as LR
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

data_index

['Sex',
 'Length',
 'Diameter',
 'Height',
 'Whole weight',
 'Shucked weight',
 'Viscera weight',
 'Shell weight',
 'Rings']

# 获取特征矩阵X 的index
X_index = data_index[0:-1]
Y_index = data_index[-1]

X_index, Y_index

(['Sex',
  'Length',
  'Diameter',
  'Height',
  'Whole weight',
  'Shucked weight',
  'Viscera weight',
  'Shell weight'],
 'Rings')

Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
2	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150
1	2	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070
2	1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210
3	2	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155
4	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055

0    15
1     7
2     9
3    10
4     7
Name: Rings, dtype: int64

# 划分训练集和测试集
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,Y,test_size=0.2,random_state=420)

Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
2763	0.550	0.425	0.135	0.6560	0.2570	0.1700	0.203
439	2	0.500	0.415	0.165	0.6885	0.2490	0.1380	0.250
1735	2	0.670	0.520	0.165	1.3900	0.7110	0.2865	0.300
751	2	0.485	0.355	0.120	0.5470	0.2150	0.1615	0.140
1626	1	0.570	0.450	0.135	0.7805	0.3345	0.1850	0.210

2763    10
439     13
1735    11
751     10
1626     8
Name: Rings, dtype: int64

#恢复索引
for i in [Xtrain, Xtest]:
    i.index = range(i.shape[0])

#恢复索引
for i in [Ytrain, Ytest]:
    i.index = range(i.shape[0])

Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight
0.550	0.425	0.135	0.6560	0.2570	0.1700	0.203
1	2	0.500	0.415	0.165	0.6885	0.2490	0.1380	0.250
2	2	0.670	0.520	0.165	1.3900	0.7110	0.2865	0.300
3	2	0.485	0.355	0.120	0.5470	0.2150	0.1615	0.140
4	1	0.570	0.450	0.135	0.7805	0.3345	0.1850	0.210

0    10
1    13
2    11
3    10
4     8
Name: Rings, dtype: int64

# 先用训练集训练(fit)标准化的类，然后用训练好的类分别转化(transform)训练集和测试集

# 开始建模
reg = LR().fit(Xtrain, Ytrain)

4.22923686878166

22.656846035572762

array([  0.40527178,  -0.88791132,  13.01662939,  10.39250886,
         9.64127293, -20.87747601, -10.50683081,   7.70632772])

Xtrain.columns

Index(['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight',
       'Viscera weight', 'Shell weight'],
      dtype='object')

[('Sex', 0.4052717783379893),
 ('Length', -0.8879113179582045),
 ('Diameter', 13.016629389061475),
 ('Height', 10.39250886428478),
 ('Whole weight', 9.64127293101552),
 ('Shucked weight', -20.87747600529615),
 ('Viscera weight', -10.506830809919672),
 ('Shell weight', 7.706327719866024)]

Name Data Type Meas. Description

Sex nominal M, F, and I (infant)

Length continuous mm Longest shell measurement

Diameter continuous mm perpendicular to length

Height continuous mm with meat in shell

Whole weight continuous grams whole abalone

Shucked weight continuous grams weight of meat

Viscera weight continuous grams gut weight (after bleeding)

Shell weight continuous grams after being dried

Rings integer +1.5 gives the age in years

# 截距
reg.intercept_

2.7888240054011835

# 自定义最小二乘法尝试
def my_least_squares(x_array, y_array):
    '''
    :param x: 列表，表示m*n矩阵
    :param y: 列表，表示m*1矩阵
    :return: coef：list 回归系数(1*n矩阵)   intercept: float 截距
    '''
    # 矩阵对象化
    arr_x_01 = np.array(x_array)
    arr_y_01 = np.array(y_array)

    # x_array由 m*n矩阵转化为 m*(n+1)矩阵，其中第n+1列系数全为1
    # 获取行数
    row_num = arr_x_01.shape[0]

    # 生成常量系数矩阵  m*1矩阵
    arr_b = np.array([[1 for i in range(0, row_num)]])

    # 合并成m*(n+1)矩阵
    arr_x_02 = np.insert(arr_x_01, 0, values=arr_b, axis=1)

    # 矩阵运算
    w = np.linalg.inv(np.matmul(arr_x_02.T, arr_x_02))
    w = np.matmul(w, arr_x_02.T)
    w = np.matmul(w, arr_y_01)
    
    # w为1*(n+1)矩阵
    # print(w)
    result = list(w)
    coef = result.pop(-1)
    intercept = result
    
    return coef, intercept

# debug中
my_least_squares(Xtrain,list(Ytrain))

# 梯度下降法尝试
def costFunc(X,Y,theta):
    '''
    代价函数
    '''
    inner = np.power((X*theta.T)-Y,2)
    return np.sum(inner)/(2*len(X))

def gradientDescent(X,Y,theta,alpha,iters):
    '''
    梯度下降
    '''
    temp = np.mat(np.zeros(theta.shape))
    cost = np.zeros(iters)
    thetaNums = int(theta.shape[1])
    print(thetaNums)
    for i in range(iters):
        error = (X*theta.T-Y)
        for j in range(thetaNums):
            derivativeInner = np.multiply(error,X[:,j])
            temp[0,j] = theta[0,j] - (alpha*np.sum(derivativeInner)/len(X))

        theta = temp
        cost[i] = costFunc(X,Y,theta)

    return theta,cost

基于最小二乘法的一般多元线性回归的实战一、说明二、数据项说明三、实战部分

文章目录

一、说明

二、数据项说明

三、实战部分

继续阅读

简单文档分类——朴素贝叶斯算法朴素贝叶斯算法简单文档分类实例步骤总结朴素贝叶斯分类调用(sklearn)

【分类算法】什么是分类算法定义分类与聚类分类过程方法

分类算法的评价指标

K-近邻算法以及图像分类应用

weka之NB算法

使用weka的select attribute

weka中分类器算法

在weka中集成自己的算法

【多变量线性回归】学习记录序思路实现终

申请评分模型拒绝推断（RI）方法申请评分模型拒绝推断（RI）方法

【人工智能行业大师访谈1】吴恩达采访 Geoffery Hinton

【趋高机器视觉】机器视觉技术原理解析及解决方案

吴恩达 coursera ML 第七课总结+作业答案前言目录正文模型表示作业答案

XGBoost Plotting API以及GBDT组合特征实践 XGBoost Plotting API以及GBDT组合特征实践

解码器用于语义分割：数据依赖的解码可以实现灵活的特征聚合

2021-2025年中国运动疗法（KT）带行业市场供需与战略研究报告