機器學習算法之決策樹回歸模型可視化

一個程式員在海濱遊泳時溺水身亡。他死前拼命的呼救，當時海灘上有許多救生員，但是沒有人救他。因為他一直大喊“F1!”“F1!”，誰都不知道“F1”究竟是什麼意思。

機器學習算法之決策樹回歸模型可視化

分類樹和回歸樹的差別

機器學習算法之決策樹回歸模型可視化

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import sklearn
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection  import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection  import GridSearchCV
from sklearn.linear_model.coordinate_descent import ConvergenceWarning

def notEmpty(s):
    return s != ''

mpl.rcParams['font.sans-serif']=[u'simHei']
mpl.rcParams['axes.unicode_minus']=False
## 攔截異常
warnings.filterwarnings(action = 'ignore', category=ConvergenceWarning)

names = ['CRIM','ZN', 'INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT']
path = "datas/boston_housing.data"
## 由于資料檔案格式不統一，是以讀取的時候，先按照一行一個字段屬性讀取資料，然後再按照每行資料進行處理
fd = pd.read_csv(path, header=None)
data = np.empty((len(fd), 14))
for i, d in enumerate(fd.values):
    d = map(float, filter(notEmpty, d[0].split(' ')))
    data[i] = list(d)

x, y = np.split(data, (13,), axis=1)
y = y.ravel()

print ("樣本資料量:%d, 特征個數：%d" % x.shape)
print ("target樣本資料量:%d" % y.shape[0])

樣本資料量:506, 特征個數：13

target樣本資料量:506

#資料的分割，
x_train1, x_test1, y_train1, y_test1 = train_test_split(x, y, train_size=0.8, random_state=14)
x_train, x_test, y_train, y_test = x_train1, x_test1, y_train1, y_test1
print ("訓練資料集樣本數目：%d, 測試資料集樣本數目：%d" % (x_train.shape[0], x_test.shape[0]))

訓練資料集樣本數目：404, 測試資料集樣本數目：102

#标準化
ss = MinMaxScaler()

x_train = ss.fit_transform(x_train, y_train)
x_test = ss.transform(x_test)

print ("原始資料各個特征屬性的調整最小值:",ss.min_)
print ("原始資料各個特征屬性的縮放資料值:",ss.scale_)

原始資料各個特征屬性的調整最小值: [-7.10352762e-05 0.00000000e+00 -1.68621701e-02 0.00000000e+00

-7.92181070e-01 -6.82314620e-01 -2.98661174e-02 -1.02719857e-01

-4.34782609e-02 -3.56870229e-01 -1.34042553e+00 -6.38977636e-03

-4.90780142e-02]

原始資料各個特征屬性的縮放資料值: [1.12397589e-02 1.00000000e-02 3.66568915e-02 1.00000000e+00

2.05761317e+00 1.91607588e-01 1.02986612e-02 9.09347180e-02

4.34782609e-02 1.90839695e-03 1.06382979e-01 2.53562554e-03

2.83687943e-02]

#構模組化型（回歸）
model = DecisionTreeRegressor(criterion='mae',max_depth=5)
#模型訓練
model.fit(x_train, y_train)
#模型預測
y_test_hat = model.predict(x_test)

#評估模型
score = model.score(x_test, y_test)
print ("Score：", score)

Score： 0.7900379894156064

# 方式三：直接生成圖檔
from sklearn import tree
from IPython.display import Image  
import pydotplus
dot_data = tree.export_graphviz(model, out_file=None,  
                         filled=True, rounded=True,  
                         special_characters=True)  
graph = pydotplus.graph_from_dot_data(dot_data)  
Image(graph.create_png())

機器學習算法之決策樹回歸模型可視化

機器學習算法之決策樹回歸模型可視化

繼續閱讀

YAML簡介和PyYAML安全操作YAML支援的類型YAML的優點：yaml的基本文法python操作

2021-2025年中國運動療法（KT）帶行業市場供需與戰略研究報告

Small tricks

libsvm for python 安裝

2021年危險化學品經營機關安全管理人員考試題庫及危險化學品經營機關安全管理人員考試技巧

學習軟體測試基礎測試第七天

Zeppelin 配置通路 REST APIApache Zeppelin Configuration REST API

【Torch】最簡潔logging使用指南

27. Remove Element(清單)題目代碼

無人機--飛控科普

Cloud Studio初體驗

使用 ctypes 進行 Python 和 C 的混合程式設計

【python】【資料處理】畫多元資料分布圖

【python】netconf協定對接管理裝置

「Python 網絡自動化」NETCONF —— Python 使用 NETCONF 管理配置 H3C 網絡裝置

在python中建立excel并寫入