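The previous post implemented a simple zombie-user detector with a decision tree: https://blog.csdn.net/weixin_43906500/article/details/116992642
This post combines several machine learning methods to identify zombie users.
The methods used are: decision tree, random forest, and extremely randomized trees (Extra-Trees).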
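For readers who have not met the third method before, here is a minimal pure-Python sketch of how the split choice differs. A decision tree (and each tree in a random forest) searches for the best threshold on a feature, while Extra-Trees draws a threshold at random for each candidate feature and keeps the best of those draws. The `followers` feature, the labels, and all function names are hypothetical and only for illustration; scikit-learn's actual implementation is more involved (bootstrap samples, feature subsets, many trees).

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting the samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Exhaustive search over midpoints between sorted distinct values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_threshold(values, rng):
    """Extra-Trees style: draw one threshold uniformly between min and max."""
    return rng.uniform(min(values), max(values))

# Hypothetical toy data: follower counts and a zombie label (1 = zombie)
followers = [0, 1, 2, 3, 40, 55, 60, 80]
is_zombie = [1, 1, 1, 1, 0, 0, 0, 0]

rng = random.Random(0)
t_best = best_threshold(followers, is_zombie)
t_rand = random_threshold(followers, rng)

print("searched threshold:", t_best, split_impurity(followers, is_zombie, t_best))
print("random threshold:", t_rand, split_impurity(followers, is_zombie, t_rand))
```

The random draw is cheaper than the search and adds extra randomness between trees, which is the trade-off Extra-Trees makes.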
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
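# Load the labeled user data (the path is relative to the working directory)
data = pd.read_csv("users_data/corpse_user_data/csv_test.csv", encoding="utf-8")
# Feature columns: every column except the first and the last
columns = list(data.columns[1:-1])
X = data[columns].values
# Labels come from the "mark" column
y = data["mark"].values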
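# Define a decision tree classifier
clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
# Mean cross-validated accuracy of the decision tree
# (str() is needed: concatenating a string and a float raises TypeError)
print("DecisionTreeClassifier:" + str(scores.mean()))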
# Define a random forest classifier
clf = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
# Mean cross-validated accuracy of the random forest
print("RandomForestClassifier:" + str(scores.mean()))
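# Define an Extra-Trees (extremely randomized trees) classifier
clf = ExtraTreesClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
# Mean cross-validated accuracy of Extra-Trees; on this data it scores higher than the random forest
print("ExtraTreesClassifier:" + str(scores.mean()))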
The output is as follows:
DecisionTreeClassifier:0.9608660785886126
RandomForestClassifier:0.9723870622828121
ExtraTreesClassifier:0.9769847634322373
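The three scores are close, so it can help to look at the spread across folds as well as the mean. The sketch below runs the same comparison on synthetic data, because the CSV file is not included with the post. `make_classification`, the sample and feature counts, and the names `X_demo`, `y_demo`, and `results` are assumptions made for illustration. The numbers it prints will not match the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the user feature table (assumption: not the real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Same three classifiers and hyperparameters as in the script above
models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=10, random_state=0),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=10, random_state=0),
}

results = {}
for name, model in models.items():
    # cv=5 is written out explicitly; each fold yields one accuracy score
    fold_scores = cross_val_score(model, X_demo, y_demo, cv=5)
    results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: {fold_scores.mean():.4f} +/- {fold_scores.std():.4f}")
```

If the standard deviations are larger than the gaps between the means, the ranking of the models should be read with caution.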