Introduction
When getting started with machine learning, one indispensable tool is xgboost. Here we use xgboost's most basic features to complete a Kaggle competition, Boston Housing, in fewer than 40 lines of code, which is enough to show how powerful xgboost is!
The Competition
Predict house prices from the given data attributes.
Competition page: https://www.kaggle.com/c/boston-housing#description
Data
Since this is an introductory walkthrough, we do not attempt any further optimization here; we simply take a quick look at the data (open train.csv):
(screenshot of the first rows of train.csv omitted)
- The training set has 15 columns; the first is ID and the last is medv (the value to predict), so these two columns are dropped before training.
- 70% of the training data is used for training and 30% is held out for evaluation.
Open the test set (test.csv):
- The prediction target medv is missing.
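The 70/30 split described above is done with scikit-learn's train_test_split; here is a minimal sketch on synthetic stand-in data (the real train.csv columns are not reproduced):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in for the 13 feature columns left after dropping ID and medv
X = np.random.rand(100, 13)
y = np.random.rand(100)

# test_size=0.3 holds out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
print(X_train.shape, X_test.shape)  # (70, 13) (30, 13)
```

Fixing random_state makes the split reproducible across runs, which is why the full script below also passes random_state=123.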
Code
The code is very simple, but it assumes some familiarity with the pandas and numpy libraries; the relevant methods can be looked up directly in their documentation.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/9/10 22:51
# @Author : likewind
# @mail : [email protected]
# @File : BostonHousing.py
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
#load dataset
dataset_folder = r'E:/kaggle_race/BostonHousing/'
dataset_train = dataset_folder + 'train.csv'
dataset_test = dataset_folder + 'test.csv'
data_train = pd.read_csv(dataset_train)
data_test = pd.read_csv(dataset_test)
#drop irrelevant properties
X = data_train.drop(['ID', 'medv'], axis=1)
#medv is train label
y = data_train.medv
#split the training data into 70% train / 30% validation subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
#use a gradient-boosted regressor to predict medv
#(the original 'reg:linear' objective is deprecated in recent xgboost versions;
# 'reg:squarederror' is the equivalent replacement)
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1,
                          max_depth=10, alpha=10, n_estimators=500, reg_lambda=2)
#training the regressor
xg_reg.fit(X_train, y_train)
#use the trained regressor to predict test_data
preds = xg_reg.predict(X_test)
#compute RMSE on the held-out validation data
rmse = np.sqrt(mean_squared_error(y_test, preds))
print("Validation RMSE: %f" % rmse)
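Note that RMSE is simply the square root of the mean squared error, so the sklearn call used above is equivalent to computing it by hand (illustrated with made-up values):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([24.0, 21.6, 34.7])
y_pred = np.array([25.0, 20.0, 33.0])

# sklearn's helper and the direct formula give the same result
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse_sklearn)
```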
#use the trained regressor to predict test dataset.
x_test = data_test.drop(['ID'], axis=1)
predictions = xg_reg.predict(x_test)
ID = (data_test.ID).astype(int)
result = np.c_[ID, predictions]
#output results
np.savetxt(dataset_folder + 'xgb_submission.csv', result, fmt="%d,%.4f" ,header='ID,medv', delimiter=',', comments='')
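An alternative to the np.c_/np.savetxt formatting above is to build the submission with pandas; this is a sketch with hypothetical IDs and predictions (writing to a temporary directory for demonstration):

```python
import os
import tempfile
import pandas as pd

# hypothetical stand-ins for data_test.ID and the model's predictions
ids = [1, 2, 4]
preds = [23.5123, 31.0047, 18.2501]

# to_csv handles the header, the delimiter, and the float formatting in one call
submission = pd.DataFrame({'ID': ids, 'medv': preds})
path = os.path.join(tempfile.gettempdir(), 'xgb_submission.csv')
submission.to_csv(path, index=False, float_format='%.4f')

print(open(path).read().splitlines()[0])  # ID,medv
```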
Results
After submitting to Kaggle, the score was 3.84706,
which ranked around 19th at the time.
In completing this competition we did not consider how each feature affects the result, so there is still plenty of room for optimization!