Multivariate Analysis ch1 (Overview)

2023-03-16 02:41:17

These are notes on the book “Applied Multivariate Methods for Data Analysts” [1].

Outline:

Data: response variables vs. experimental units;
Type of Methods: variable-directed techniques vs. individual-directed techniques;
Notations and definitions;
Estimator Statistics;
About multivariate computing: outliers, missing values, standardization.

1. Data

Multivariate data are common but complex, so analyzing them well is important; the main goal is to simplify the data.

Two aspects of data:

Response variables;
The experimental units;

Methods focus on the relationships among the response variables, among the experimental units, and between the response variables and the experimental units.

2. Type of Methods

2.1 Variable-directed techniques

PCA (Principal component analysis)
FA (Factor analysis)
Regression (e.g., logistic regression)
CCA (Canonical correlation analysis)

These methods mainly operate on the correlation matrix and focus on the columns of the data matrix, i.e., the response variables.

2.2 Individual-directed techniques

DA (Discriminant analysis)
CA (Cluster analysis)
MANOVA (Multivariate analysis of variance)

Remarks:

These methods focus on the rows of the data matrix: the observations or experimental units;
Many multivariate analysis methods require the experimental units to be independent.

3. Notations and definitions

Notation	Explanation
$p$	number of response variables
$N$	sample size (number of experimental units)
$X=(x_{rj})_{N\times p},\ r=1,\dots,N;\ j=1,\dots,p$	data matrix
$x_r=(x_{r1},\dots,x_{rp})'$	the $r$-th observation
$r,s,t$	subscripts for experimental units
$i,j,k$	subscripts for response variables
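To make the notation concrete, here is a minimal NumPy sketch. The numbers are invented for illustration, and because Python indexes from zero, the observation $x_1$ corresponds to `X[0]`.

```python
import numpy as np

# Hypothetical data matrix: N = 4 experimental units (rows),
# p = 3 response variables (columns). Values are made up.
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])

N, p = X.shape        # number of rows, number of columns
x_1 = X[0, :]         # first observation x_1: one value per response variable
var_2 = X[:, 1]       # second response variable: one value per experimental unit
```

Variable-directed techniques work along the columns (`X[:, j]`), while individual-directed techniques work along the rows (`X[r, :]`).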

3.1 Multivariate normal distribution

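(Def): $x=(x_1,\dots,x_p)'$ follows a multivariate normal distribution if, for every $a=(a_1,\dots,a_p)'\in\mathbb{R}^p$,

$$a'x=\sum_{i=1}^{p}a_i x_i$$

follows a univariate normal distribution.

Denote it by $x\sim N_p(\mu,\Sigma)$. When $\Sigma$ is positive definite, the p.d.f. is:

$$f_x(x,\mu,\Sigma)=\frac{1}{\sqrt{(2\pi)^p}\,|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}\left[(x-\mu)'\Sigma^{-1}(x-\mu)\right]\right\};$$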
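The density formula can be evaluated directly. The following is a minimal sketch, assuming $\Sigma$ is symmetric positive definite; the function name `mvn_pdf` and the parameter values are illustrative and not taken from the book.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N_p(mu, sigma) at x; assumes sigma is symmetric positive definite."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu), computed by solving a
    # linear system rather than forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

# Illustrative parameters (assumed, not from the source).
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
density_at_mean = mvn_pdf(mu, mu, sigma)
```

At $x=\mu$ the exponent vanishes, so the value reduces to the normalizing constant $1/\big(\sqrt{(2\pi)^p}\,|\Sigma|^{1/2}\big)$, which makes a convenient sanity check.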

3.2 Numerical characteristics:

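Mean vector: $\mu=E(x)=(E(x_1),\dots,E(x_p))'=(\mu_1,\dots,\mu_p)'$;
Variance-covariance matrix: $\Sigma=\operatorname{cov}(x)=E[(x-\mu)(x-\mu)']=(\sigma_{ij})_{p\times p}$, that is,

$$\Sigma=\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \dots & \sigma_{1p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp}
\end{pmatrix}$$

- $\sigma_{ii}=E[(x_i-\mu_i)^2]$;
- $\sigma_{ij}=\operatorname{cov}(x_i,x_j)$;
Correlation matrix, with $\rho_{ij}=\sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$:

$$P=\begin{pmatrix}
1 & \rho_{12} & \dots & \rho_{1p}\\
\rho_{21} & 1 & \dots & \rho_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\rho_{p1} & \rho_{p2} & \dots & 1
\end{pmatrix}$$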

4. Estimator Statistics

4.1. Unbiased estimators:

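$$\hat\mu=\frac{1}{N}\sum_{r=1}^{N}x_r=\texttt{colMeans}(X)$$

(the column means of $X$, since each row of $X$ is one observation);

$$\hat\Sigma=\frac{1}{N-1}\sum_{r=1}^{N}(x_r-\hat\mu)(x_r-\hat\mu)'=(\hat\sigma_{ij})_{p\times p};$$

$$\hat\sigma_{ij}=\frac{1}{N-1}\sum_{r=1}^{N}(x_{ri}-\hat\mu_i)(x_{rj}-\hat\mu_j).$$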
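As a rough illustration of these two estimators, the sketch below computes them with NumPy on a tiny made-up data matrix (rows are experimental units, columns are response variables) and cross-checks the covariance against `np.cov`.

```python
import numpy as np

# Hypothetical 3 x 2 data matrix (N = 3 units, p = 2 variables).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])
N = X.shape[0]

# Sample mean vector: average over the rows, one mean per variable.
mu_hat = X.mean(axis=0)

# Sample covariance matrix: sum of outer products of centered rows, divided by N - 1.
centered = X - mu_hat
sigma_hat = centered.T @ centered / (N - 1)

# np.cov uses the N - 1 divisor by default; rowvar=False says columns are variables.
sigma_check = np.cov(X, rowvar=False)
```

Writing the sum of outer products as `centered.T @ centered` is only a vectorized form of the formula for $\hat\Sigma$; the result is the same $p\times p$ matrix.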

4.2. Biased but commonly used estimators:

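$$r_{ij}=\hat\rho_{ij}=\frac{\hat\sigma_{ij}}{\sqrt{\hat\sigma_{ii}\hat\sigma_{jj}}};$$

$$R=\hat P=\begin{pmatrix}
1 & r_{12} & \dots & r_{1p}\\
r_{21} & 1 & \dots & r_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
r_{p1} & r_{p2} & \dots & 1
\end{pmatrix};$$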
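The sample correlation matrix follows from the sample covariance matrix by dividing each entry by the two corresponding standard deviations. A minimal NumPy sketch on made-up data, checked against `np.corrcoef`:

```python
import numpy as np

# Hypothetical 3 x 2 data matrix (rows = units, columns = variables).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])

sigma_hat = np.cov(X, rowvar=False)        # sample covariance, N - 1 divisor
sd_hat = np.sqrt(np.diag(sigma_hat))       # sqrt(sigma_hat_jj) for each variable

# r_ij = sigma_hat_ij / sqrt(sigma_hat_ii * sigma_hat_jj)
R = sigma_hat / np.outer(sd_hat, sd_hat)

# Library equivalent for comparison.
R_check = np.corrcoef(X, rowvar=False)
```

The diagonal of `R` is 1 by construction, and `R` is symmetric because `sigma_hat` is.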

5. About multivariate computing: outliers, missing values, standardization

5.1 Outliers

Detect them with plots or PCA;
Deal with them by analyzing their impact on the results (with outliers vs. without outliers).
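The with/without comparison can be sketched as follows. The data and the `suspect` mask are invented for illustration; in practice the suspect units would come from a plot or a PCA, and the statistic being compared would be whatever the analysis reports.

```python
import numpy as np

# Hypothetical data: five units on a clear linear trend plus one suspect unit.
X = np.array([
    [1.0, 1.1],
    [2.0, 1.9],
    [3.0, 3.2],
    [4.0, 3.9],
    [5.0, 5.1],
    [6.0, -4.0],   # suspected outlier (assumed to have been flagged beforehand)
])
suspect = np.array([False, False, False, False, False, True])

# Correlation between the two variables with all units included.
r_with = np.corrcoef(X, rowvar=False)[0, 1]

# The same statistic after dropping the suspect unit.
r_without = np.corrcoef(X[~suspect], rowvar=False)[0, 1]
```

In this toy data set a single unit is enough to turn a strong positive correlation into a negative one, which is the kind of sensitivity the comparison is meant to reveal.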

5.2 Missing values

Replace them with the mean of the corresponding variable (or use KNN imputation);
or remove the corresponding rows.
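Both options can be sketched in NumPy. This assumes missing entries are coded as `np.nan` and that the replacement value is the mean of the observed values of the same variable (column of $X$); KNN imputation is not shown.

```python
import numpy as np

# Hypothetical data matrix with two missing entries.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 40.0],
])

# Option 1: mean imputation. nanmean ignores the missing entries of each column.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

# Option 2: drop every experimental unit (row) with at least one missing value.
complete = ~np.isnan(X).any(axis=1)
X_complete = X[complete]
```

Mean imputation keeps all $N$ rows but shrinks the estimated variances, while row removal keeps the observed values untouched at the cost of a smaller sample.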

5.3 Standardization

$$z_{rj}=\frac{x_{rj}-\hat\mu_j}{\sqrt{\hat\sigma_{jj}}};\quad r=1,\dots,N;\ j=1,\dots,p.$$

Many statistical programs standardize the data by default.
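A minimal NumPy sketch of the standardization formula on made-up data. Note that `ddof=1` is needed so the standard deviation uses the same $N-1$ divisor as $\hat\sigma_{jj}$.

```python
import numpy as np

# Hypothetical 3 x 2 data matrix (rows = units, columns = variables).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])

mu_hat = X.mean(axis=0)            # sample mean of each variable
sd_hat = X.std(axis=0, ddof=1)     # sqrt(sigma_hat_jj), N - 1 divisor

# z_rj = (x_rj - mu_hat_j) / sqrt(sigma_hat_jj), applied column by column.
Z = (X - mu_hat) / sd_hat
```

Each column of `Z` has sample mean 0 and sample variance 1, and the sample covariance matrix of `Z` equals the sample correlation matrix $R$ of `X`. Analyzing standardized data is therefore the same as working from the correlation matrix.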

References

[1] Johnson D E. Applied multivariate methods for data analysts[M]. Pacific Grove, CA: Duxbury Press, 1998.
