The sklearn.model_selection.train_test_split function

In machine learning, the original data is usually split by some ratio into a training set and a test set. The train_test_split function in sklearn.model_selection is commonly used for this.

Note: in older versions, train_test_split was imported from sklearn.cross_validation. The cross_validation module has since been deprecated, so train_test_split should now be imported from sklearn.model_selection.

For detailed usage, see the official documentation for sklearn.model_selection.train_test_split.

Parameters:

*arrays : sequence of indexables with the same length / shape[0].

Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
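As a minimal sketch of the *arrays argument (the sample_weight array and its values are made up for illustration), several same-length inputs can be split in one call, and each input comes back as a train/test pair in the order it was passed:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))          # numpy array with 10 rows
y = list(range(10))                         # plain Python list with 10 items
sample_weight = np.linspace(0.1, 1.0, 10)   # hypothetical third array

# Three inputs give six outputs: one (train, test) pair per input.
parts = train_test_split(X, y, sample_weight, test_size=2, random_state=0)
X_train, X_test, y_train, y_test, w_train, w_test = parts
print(len(parts), X_train.shape, len(y_test), w_test.shape)

# Inputs whose lengths differ are rejected with a ValueError.
try:
    train_test_split(X, y[:-1])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```

The same row indices are selected from every input, so each row of X_train still lines up with its entry in y_train and w_train.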

test_size : float, int or None, optional (default=None). Size of the test set.

(1) If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split.

(2) If int, it represents the absolute number of test samples.

(3) If None, the value is set to the complement of train_size. If train_size is also None, it is set to 0.25.

train_size : float, int or None (default=None). Size of the training set.

(1) If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split.

(2) If int, it represents the absolute number of train samples.

(3) If None, the value is automatically set to the complement of test_size.
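The size rules for test_size and train_size can be checked with a short sketch. The helper split_sizes is a hypothetical name introduced only for this illustration, and the 20-sample array is arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape((20, 2))  # 20 samples
y = np.arange(20)

def split_sizes(**kwargs):
    """Return (n_train, n_test) for the given size arguments."""
    X_train, X_test, _, _ = train_test_split(X, y, random_state=0, **kwargs)
    return len(X_train), len(X_test)

print(split_sizes(test_size=0.5))    # float: half of the samples -> (10, 10)
print(split_sizes(test_size=3))      # int: exactly 3 test samples -> (17, 3)
print(split_sizes(train_size=0.75))  # test set is the complement -> (15, 5)
print(split_sizes())                 # both None: test_size is 0.25 -> (15, 5)
```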

random_state : int, RandomState instance or None, optional (default=None). State of the random number generator.

(1) If int, random_state is the seed used by the random number generator.

(2) If RandomState instance, random_state is the random number generator.

(3) If None, the random number generator is the RandomState instance used by np.random, so the split can differ from run to run.
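A small sketch of what a fixed random_state buys in practice: two calls with the same integer seed return identical splits, and a freshly created RandomState with that seed behaves the same way on its first use. The seed value 20 is arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(30).reshape((10, 3))
y = np.arange(10)

# Same integer seed twice.
first = train_test_split(X, y, test_size=0.3, random_state=20)
second = train_test_split(X, y, test_size=0.3, random_state=20)

# A new RandomState instance seeded with the same value.
third = train_test_split(X, y, test_size=0.3,
                         random_state=np.random.RandomState(20))

same_seed_same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
instance_same_split = all(np.array_equal(a, b) for a, b in zip(first, third))
print(same_seed_same_split, instance_same_split)  # True True
```

Fixing the seed is mainly useful for reproducible experiments. With random_state=None the split depends on the global NumPy random state.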

shuffle : boolean, optional (default=True). Whether or not to shuffle the data before splitting.

If shuffle=False then stratify must be None.
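As a sketch of the shuffle=False behaviour on a toy array: the data is cut in its original order, with the leading samples going to the training set and the trailing ones to the test set. Combining shuffle=False with stratify raises a ValueError:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((10, 1))
y = np.arange(10)

# No shuffling: the last 3 samples become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, shuffle=False)
print(y_train.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(y_test.tolist())   # [7, 8, 9]

# stratify is not supported together with shuffle=False.
try:
    train_test_split(X, y, test_size=3, shuffle=False, stratify=y % 2)
    stratify_rejected = False
except ValueError:
    stratify_rejected = True
print(stratify_rejected)  # True
```

An ordered split like this is typically wanted for time-ordered data, where the test set should come after the training set.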

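stratify : array-like or None (default=None). Class labels used for stratification.

(1) If None, the split ignores the class labels, so the class proportions in the training and test sets are left to chance.

(2) If not None, data is split in a stratified fashion, using this as the class labels. The training and test sets then keep the same class proportions as the given array, which is useful for imbalanced datasets.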
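The effect of stratify can be sketched on a made-up imbalanced label array with 60 samples of class 0 and 20 of class 1 (a 3:1 ratio). With stratify=y, both parts of the split keep that ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 60 + [1] * 20)        # imbalanced labels, ratio 3:1
X = np.arange(len(y)).reshape((-1, 1))   # dummy features

_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Count the samples per class in each part.
print(np.bincount(y_train).tolist())  # [45, 15]
print(np.bincount(y_test).tolist())   # [15, 5]
```

Without stratify, the class counts in each part depend on the random draw, and a small test set could end up with very few minority-class samples.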

Common usage:

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(train_data, train_target, test_size=0.4, random_state=0, stratify=train_target)

import numpy as np
from sklearn.model_selection import train_test_split

X,y = np.arange(30).reshape((10,3)), range(10)
print(X)
>>>
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]
 [18 19 20]
 [21 22 23]
 [24 25 26]
 [27 28 29]]
print(y)
>>>
range(0, 10)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20, shuffle=True)  # split into training and test sets

print(X_train)
>>>
[[15 16 17]
 [ 0  1  2]
 [ 6  7  8]
 [18 19 20]
 [27 28 29]
 [12 13 14]
 [ 9 10 11]]
print(X_test)
>>>
[[21 22 23]
 [ 3  4  5]
 [24 25 26]]
print(y_train)
>>>
[5, 0, 2, 6, 9, 4, 3]
print(y_test)
>>>
[7, 1, 8]
