ML, K-means: Clustering a DIY Dataset with the K-means Algorithm (Testing Model Performance with 9 Different Numbers of Cluster Centers)

2021-10-29 23:50:00

Output

[Figures not preserved: a scatter plot of the 30 data samples, and an elbow plot of the average distortion against k.]

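Design approach

1. Use a uniform distribution to randomly generate three clusters, with 10 data samples in each cluster.
2. Plot the distribution of the 30 data samples.
3. Test the clustering quality for 9 different numbers of cluster centers (k = 1 to 9) and plot the results.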
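Step 1 can be sketched as a small reusable function. This is only an illustration: the helper name `make_clusters` and the fixed seed are assumptions that are not in the original post, which calls `np.random.uniform` directly without a seed. It shows how three (2, 10) blocks become the (30, 2) sample matrix that scikit-learn expects.

```python
import numpy as np

def make_clusters(ranges, n_per_cluster=10, seed=0):
    """Draw n_per_cluster 2-D points uniformly from each (low, high) square.

    Hypothetical helper for illustration; not part of the original code.
    """
    rng = np.random.default_rng(seed)
    # Each block has shape (2, n_per_cluster): row 0 is x1, row 1 is x2.
    blocks = [rng.uniform(low, high, (2, n_per_cluster)) for low, high in ranges]
    # Stack the blocks side by side, then transpose so each row is one sample.
    return np.hstack(blocks).T

X = make_clusters([(0.5, 1.5), (5.5, 6.5), (3.0, 4.0)])
print(X.shape)  # (30, 2)
```

Fixing the seed makes the scatter plot and the elbow curve repeatable between runs, which the unseeded original does not guarantee.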

Implementation code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

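# 1. Use a uniform distribution to randomly generate three clusters, with 10 data samples in each.
# Each array has shape (2, 10): row 0 holds the x1 values, row 1 holds the x2 values.
cluster1 = np.random.uniform(0.5, 1.5, (2, 10))
cluster2 = np.random.uniform(5.5, 6.5, (2, 10))
cluster3 = np.random.uniform(3.0, 4.0, (2, 10))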

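# 2. Plot the distribution of the 30 data samples.
# Stack the clusters side by side, then transpose to shape (30, 2): one row per sample.
X = np.hstack((cluster1, cluster2, cluster3)).T
plt.scatter(X[:, 0], X[:, 1])
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('DIY data:30, Random 3 clusters(10 data samples around each cluster)')
plt.show()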

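# 3. Test the clustering quality for 9 different numbers of cluster centers (k = 1 to 9) and plot the results.
K = range(1, 10)
meandistortions = []
for k in K:
    # n_init=10 is set explicitly so the number of restarts does not depend on the scikit-learn version.
    kmeans = KMeans(n_clusters=k, n_init=10)
    kmeans.fit(X)
    # Mean distortion: average Euclidean distance from each sample to its nearest cluster center.
    meandistortions.append(sum(np.min(cdist(X, kmeans.cluster_centers_, 'euclidean'), axis=1)) / X.shape[0])
plt.plot(K, meandistortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Average Dispersion')
plt.title('K-means: Selecting k with the Elbow Method')
plt.show()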
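The elbow is read off the plot by eye in the code above. As a rough sketch of how the same judgement could be made numerically, the example below picks the k where the distortion curve bends most sharply, using the largest second difference. That heuristic, the helper names `mean_distortion` and `elbow_by_second_difference`, and the fixed seeds are all assumptions added for illustration; the original post does not include them.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def mean_distortion(X, centers):
    """Average Euclidean distance from each sample to its nearest center."""
    return cdist(X, centers, "euclidean").min(axis=1).mean()

def elbow_by_second_difference(ks, distortions):
    """Return the k where the curve bends most sharply (hypothetical heuristic)."""
    d = np.asarray(distortions, dtype=float)
    # d[i-1] - 2*d[i] + d[i+1] is large where a steep drop is followed by a flat stretch.
    second_diff = d[:-2] - 2 * d[1:-1] + d[2:]
    # second_diff[j] belongs to the (j+1)-th entry of ks; the endpoints have no value.
    return list(ks)[int(np.argmax(second_diff)) + 1]

# The same three uniform clusters as in the article, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(lo, hi, (2, 10))
               for lo, hi in [(0.5, 1.5), (5.5, 6.5), (3.0, 4.0)]]).T

ks = range(1, 10)
distortions = [
    mean_distortion(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_)
    for k in ks
]
best_k = elbow_by_second_difference(ks, distortions)
print(best_k)
```

For clusters as well separated as these, the heuristic should agree with the visual elbow at the true number of clusters. On noisier data the second difference can be misleading, so it is best treated as a cross-check on the plot rather than a replacement for it.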
