
Distance correlation (distance correlation coefficient)

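I have been working on feature selection recently and needed to assess how strongly several features are correlated. I went looking for a description of this method and found it surprisingly hard to find on the web. Here is what I pieced together: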

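[11] Wang Liming, Wu Xianghua, Zhao Tianliang, et al. A rolling statistical forecasting scheme for PM2.5 concentration based on the distance correlation coefficient and support vector regression [J]. Acta Scientiae Circumstantiae, 2017, 37(4): 1268-1276. (I took the description from this paper. Wikipedia has a more detailed treatment, but I could not get through it.)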
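
For reference, here is the sample definition that the program in this post computes, as far as I can tell. It is the standard form of the sample distance correlation written in my own notation (the symbols a, b, A, B match the variable names in the code) and is not copied from the paper above, so treat it as a sketch rather than the paper's exact presentation. Given n paired samples (x_i, y_i):

```latex
\begin{aligned}
a_{ij} &= \lVert x_i - x_j \rVert, \qquad b_{ij} = \lVert y_i - y_j \rVert, \qquad i, j = 1, \dots, n \\
A_{ij} &= a_{ij} - \bar{a}_{i\cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot\cdot}, \qquad
B_{ij} = b_{ij} - \bar{b}_{i\cdot} - \bar{b}_{\cdot j} + \bar{b}_{\cdot\cdot} \\
\operatorname{dCov}_n^2(X, Y) &= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij}, \qquad
\operatorname{dVar}_n^2(X) = \operatorname{dCov}_n^2(X, X) \\
\operatorname{dCor}_n(X, Y) &= \frac{\operatorname{dCov}_n(X, Y)}{\sqrt{\operatorname{dVar}_n(X)\,\operatorname{dVar}_n(Y)}}
\end{aligned}
```

Here \bar{a}_{i\cdot} is the mean of row i, \bar{a}_{\cdot j} the mean of column j, and \bar{a}_{\cdot\cdot} the mean of the whole distance matrix. The value lies between 0 and 1.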

The Python program follows:

Source: https://gist.github.com/satra/aa3d19a12b74e9ab7941

from scipy.spatial.distance import pdist, squareform
import numpy as np


def distcorr(X, Y):
    """ Compute the distance correlation function
    
    >>> a = [1,2,3,4,5]
    >>> b = np.array([1,2,9,4,4])
    >>> distcorr(a, b)
    0.762676242417
    """
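    X = np.atleast_1d(X)
    Y = np.atleast_1d(Y)
    # Treat 1-D input as n samples of a single variable (one column).
    # Checking ndim avoids reshaping an (n, 1) array into (n, 1, 1),
    # which pdist would reject.
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    X = np.atleast_2d(X)
    Y = np.atleast_2d(Y)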
    n = X.shape[0]
    if Y.shape[0] != X.shape[0]:
        raise ValueError('Number of samples must match')
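    # Pairwise Euclidean distance matrices, each of shape (n, n).
    a = squareform(pdist(X))
    b = squareform(pdist(Y))
    # Double-center: subtract column and row means, add back the grand mean.
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()

    # Squared sample distance covariance and squared distance variances.
    dcov2_xy = (A * B).sum()/float(n * n)
    dcov2_xx = (A * A).sum()/float(n * n)
    dcov2_yy = (B * B).sum()/float(n * n)
    # dCor = dCov(X, Y) / sqrt(dVar(X) * dVar(Y))
    dcor = np.sqrt(dcov2_xy)/np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))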
    return dcor
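
Since my use case is feature selection, here is a small sketch of why distance correlation can be more useful than the Pearson coefficient for that purpose. The data is synthetic and made up for illustration (a feature `x`, a target `y = x**2`, and an unrelated `noise` column are my own assumptions, not from the paper), and `distcorr` is restated in compact form so the block runs on its own.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def distcorr(x, y):
    """Compact restatement of the distcorr function from this post."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    if x.shape[0] != y.shape[0]:
        raise ValueError('Number of samples must match')
    a = squareform(pdist(x))
    b = squareform(pdist(y))
    # Double-centered distance matrices.
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy) / np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))


# Hypothetical data: y depends on x, but not linearly.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = x ** 2
noise = rng.normal(size=500)  # a feature unrelated to x

pearson = np.corrcoef(x, y)[0, 1]   # linear correlation only
dcor_quadratic = distcorr(x, y)     # picks up the nonlinear dependence
dcor_noise = distcorr(x, noise)     # small, but not exactly 0 in a finite sample

print("Pearson(x, x^2):", pearson)
print("dCor(x, x^2):   ", dcor_quadratic)
print("dCor(x, noise): ", dcor_noise)
```

The Pearson coefficient comes out close to zero because the relationship is symmetric around 0, while the distance correlation is clearly larger for the dependent pair than for the unrelated one. Note that the distance matrices take O(n^2) memory, so for large sample counts a subsample may be needed.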
           
