Yishuo School District (35) | SPSS Statistical Analysis (45): Two-Step Cluster Analysis

Author: LearningYard Academy (LearningYard學苑)

Share interests, spread happiness, broaden horizons, and leave good memories! Hello everyone, this is your editor. Welcome back to the Academy. We will do our best to keep bringing you more and better content.

The previous issue covered the basic concepts of cluster analysis and discriminant analysis. This issue looks at one specific method: two-step cluster analysis (二階聚類, sometimes rendered as "second-order clustering"). Two-step clustering is an exploratory tool designed to reveal natural groupings (clusters) within a dataset that would not otherwise be apparent. It is a relatively new hierarchical clustering algorithm.

Besides the traditional Euclidean distance, the two-step procedure offers a likelihood distance measure so that it can handle categorical and continuous variables together. The likelihood measure assumes that the variables in the model are independent, that each categorical variable follows a multinomial distribution, and that each continuous variable follows a normal distribution.
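To make the likelihood distance concrete, here is a minimal Python sketch. It assumes the form usually given in descriptions of this algorithm: the distance between two clusters is the drop in log-likelihood caused by merging them, d(i, j) = xi_i + xi_j - xi_<i,j>, where each xi combines a variance term for the continuous variables and an entropy term for the categorical ones. The function names and the toy records are hypothetical and are not taken from SPSS or from the example later in this article.

```python
import math
from collections import Counter

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def entropy(labels):
    """Shannon entropy (natural log) of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def xi(cluster, global_vars):
    """Log-likelihood term for one cluster (assumed form, see lead-in).

    cluster: list of (continuous_values, categorical_values) records.
    global_vars: variance of each continuous variable over the whole dataset.
    """
    n = len(cluster)
    total = 0.0
    for k in range(len(cluster[0][0])):      # continuous variables
        column = [rec[0][k] for rec in cluster]
        total += 0.5 * math.log(global_vars[k] + variance(column))
    for k in range(len(cluster[0][1])):      # categorical variables
        total += entropy([rec[1][k] for rec in cluster])
    return -n * total

def likelihood_distance(a, b, global_vars):
    """Decrease in log-likelihood when clusters a and b are merged."""
    return xi(a, global_vars) + xi(b, global_vars) - xi(a + b, global_vars)

# Hypothetical toy data: one continuous and one categorical variable per record.
a = [([1.0], ["x"]), ([1.2], ["x"]), ([0.9], ["x"])]
near = [([1.1], ["x"]), ([1.0], ["x"])]
far = [([9.0], ["y"]), ([9.5], ["y"])]
data = a + near + far
global_vars = [variance([rec[0][0] for rec in data])]

d_near = likelihood_distance(a, near, global_vars)
d_far = likelihood_distance(a, far, global_vars)
print(d_near, d_far)
```

Merging `a` with the similar cluster `near` costs far less log-likelihood than merging it with `far`, so `near` counts as the closer cluster. Adding the dataset-wide variance inside the logarithm keeps the term finite even when a cluster holds a single record.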

These assumptions can be checked with other SPSS procedures. Use the Bivariate Correlations procedure to test the independence of two continuous variables, the Crosstabs procedure to test the independence of two categorical variables, and the Means procedure to test the independence of a continuous variable and a categorical variable. Use the Explore procedure to test the normality of a continuous variable, and the Chi-Square Test procedure to test whether a categorical variable follows a multinomial distribution.
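For readers who want to see the statistics behind two of these checks, the sketch below computes a Pearson correlation (the usual statistic for a pair of continuous variables) and a Pearson chi-square statistic for a contingency table (the usual statistic for a pair of categorical variables). It is plain Python rather than SPSS, the helper names are hypothetical, and the numbers are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Made-up data: two perfectly correlated continuous variables.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Made-up 2x2 tables: one with no association, one with a strong association.
chi_none = chi_square_statistic([[10, 10], [10, 10]])
chi_strong = chi_square_statistic([[20, 5], [5, 20]])
print(r, chi_none, chi_strong)
```

A correlation near zero and a small chi-square statistic are consistent with independence. For a 2x2 table (1 degree of freedom), a statistic above 3.841 rejects independence at the 5% level, so the second table would fail the check.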

Two-step clustering runs in two steps. The first step builds a cluster feature (CF) tree by passing over every observation once and determining the cluster centers. Following the principle that similar observations belong together, the procedure computes distances and assigns each observation to the cluster whose center is nearest. The tree starts with the first observation placed in a leaf node at the root, and that node holds the variable information for the observation. Each later observation is then compared with the existing nodes, using the distance measure as the similarity criterion. If it is similar enough to an existing node, it is added to that node. Otherwise, it forms a new node.
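The sequential assignment described above can be sketched in a few lines of Python. This is a deliberately simplified, flat version and not the actual CF tree: it keeps only a count and per-variable sums for each sub-cluster, uses Euclidean distance on continuous variables, and relies on a hypothetical `threshold` parameter to decide when an observation is "similar enough". A real CF tree is hierarchical, splits nodes when they fill up, and also tracks category counts.

```python
import math

def precluster(observations, threshold):
    """Assign each observation to the nearest sub-cluster, or start a new
    sub-cluster when no existing center lies within `threshold`."""
    clusters = []   # each entry: {"n": count, "sum": per-variable sums}
    labels = []
    for obs in observations:
        best, best_dist = None, None
        for idx, c in enumerate(clusters):
            center = [s / c["n"] for s in c["sum"]]
            dist = math.dist(obs, center)
            if best_dist is None or dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= threshold:
            # Similar enough: absorb the observation into the existing entry.
            clusters[best]["n"] += 1
            clusters[best]["sum"] = [
                s + v for s, v in zip(clusters[best]["sum"], obs)
            ]
            labels.append(best)
        else:
            # Not similar to anything seen so far: open a new entry.
            clusters.append({"n": 1, "sum": list(obs)})
            labels.append(len(clusters) - 1)
    return clusters, labels

# Made-up two-dimensional observations.
points = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (0.9, 1.1), (8.3, 7.9)]
clusters, labels = precluster(points, threshold=2.0)
print(labels)  # [0, 0, 1, 0, 1]
```

Because each observation is visited once and only summary statistics are stored, this kind of pre-clustering needs a single pass over the data, which is what makes the approach practical for large datasets.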

The second step groups the nodes of the cluster feature tree. To determine the best number of clusters, each candidate clustering solution is compared using the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), and the comparison yields the final clustering.
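As a rough illustration of how such a criterion compares candidate solutions, the sketch below uses the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, where ln L is the log-likelihood, k the number of free parameters, and N the number of observations. The log-likelihoods and parameter counts in `candidates` are invented for the example, and simply taking the minimum is an assumption. SPSS's automatic selection is more involved than this.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def best_by(criterion_values):
    """Return the cluster count with the smallest criterion value."""
    return min(criterion_values, key=criterion_values.get)

# Hypothetical candidates: number of clusters -> (log-likelihood, parameters).
candidates = {1: (-250.0, 2), 2: (-180.0, 4), 3: (-150.0, 6), 4: (-149.0, 8)}
n_obs = 50

bic_values = {j: bic(ll, k, n_obs) for j, (ll, k) in candidates.items()}
aic_values = {j: aic(ll, k) for j, (ll, k) in candidates.items()}
best_bic = best_by(bic_values)
best_aic = best_by(aic_values)
print(best_bic, best_aic)  # 3 3
```

Going from 3 to 4 clusters barely improves the log-likelihood in this made-up example, so the penalty for the extra parameters outweighs the gain and both criteria settle on 3 clusters.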

Next, consider an example. An institution wants to investigate how students' gender and major relate to their initial company after graduation. The survey sampled data from 50 students, shown in the figure below (discipline codes: 1 = agriculture, 2 = architecture, 3 = geology, 4 = business, 5 = forestry, 6 = education, 7 = engineering, 8 = art). Run a cluster analysis on the sample indicators.

[Figure: survey data for the 50 students]

Step 1: Analyze and organize the data. Because the input variables include both continuous and categorical attributes, two-step cluster analysis is used. Define the variables as shown in the figure above, enter the data, and save the file.

Step 2: Configure the two-step cluster analysis. Apply the settings shown in the figures below.

[Figures: two-step cluster analysis settings]

Step 3: Main results and analysis. All cases are grouped into 3 clusters. The average silhouette value is 0.6, which indicates fairly good cluster quality.

[Figures: clustering results]
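The quality figure reported here is a silhouette-type measure (rendered as "contour value" in some translations). The sketch below shows the textbook silhouette coefficient: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The function name and the points are hypothetical, and SPSS's own computation may differ in detail.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette coefficient; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:
            # Singleton cluster: the silhouette is conventionally set to 0.
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up points forming two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 mean that points sit much nearer to their own cluster than to any other, values near 0 mean overlapping clusters, and negative values suggest misassigned points. The sensible labelling scores above 0.9 here, while the scrambled labelling scores below zero.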

Preview of the next issue: This issue covered the theory and basic application of two-step clustering. The next issue will cover the theory and a worked example of K-means clustering.

That's all for today. If you have your own thoughts on this article, please leave us a message. See you tomorrow, and have a happy day!

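References: Baidu Baike (百度百科); SPSS 23 Statistical Analysis Practical Tutorial (《SPSS 23 統計分析實用教程》)

Translation: Baidu Translate (百度翻譯)

This article is original content from LearningYard Academy (learningyard新學苑). Some text and images come from other sources. If there is any infringement, please contact us for removal.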
