【特征選擇】特征選擇名額和方法小彙總

一、簡介

1、對特征選擇的名額提供計算方法和代碼，包括有：相關系數、互資訊、KS、IV、L1正則化、單特征模型評分、特征重要度或系數大小、boruta特征評價、遞歸特征消除排序。

2、提供特征選擇的方法和代碼：前向搜尋法、遺傳算法啟發式搜尋法，最佳特征檢測法，

# 本次項目使用的資料為以下資料，
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data['data'].shape
X,y= data['data'],data.target

# 模型使用的邏輯回歸，評價名額為auc值，使用代碼為封裝的Feature_select類，代碼附後
from tools.feature_select import Feature_select 
fs = Feature_select(X,y,lr)

二、特征選擇名額

1.皮爾森相關系數

Pearson相關系數反應了特征和标簽之間的線性關系，其一個明顯缺陷是，作為特征排序機制，他隻對線性關系敏感。如果關系是非線性的，即便兩個變量具有一一對應的關系，Pearson相關性也可能會接近0。

array([-0.73002851, -0.4151853 , -0.74263553, -0.70898384, -0.35855997,
       -0.59653368, -0.69635971, -0.77661384, -0.33049855,  0.0128376 ,
       -0.56713382,  0.00830333, -0.5561407 , -0.54823594,  0.06701601,
       -0.29299924, -0.25372977, -0.40804233,  0.00652176, -0.07797242,
       -0.77645378, -0.45690282, -0.78291414, -0.73382503, -0.42146486,
       -0.59099824, -0.65961021, -0.79356602, -0.41629431, -0.32387219])

2.互資訊

相比皮爾森相關系數，互資訊能夠一定程度反應特征和标簽之間的非線性關系。

array([0.36671567, 0.09648092, 0.40428514, 0.35976973, 0.08173721,
       0.21061423, 0.37479754, 0.44305847, 0.06671437, 0.01354967,
       0.24521911, 0.00110187, 0.278233  , 0.33920662, 0.01602095,
       0.07457715, 0.11807899, 0.12371226, 0.013367  , 0.04097258,
       0.44985384, 0.12364186, 0.47728208, 0.46426447, 0.10166035,
       0.22460362, 0.31435137, 0.43651194, 0.09544855, 0.06837461])

3.KS和IV值

KS和IV值作為風控中經常使用的兩個名額，IV值更加注重特征對标簽的區分能力，KS值注重模型對于标簽的區分

(array([0.72390466, 0.45248666, 0.74467523, 0.73142276, 0.31537709,
        0.57175889, 0.75971143, 0.81985624, 0.30314201, 0.05657735,
        0.6018313 , 0.08626658, 0.5867951 , 0.70708472, 0.02433804,
        0.37730564, 0.48077533, 0.45248666, 0.00573437, 0.2043893 ,
        0.79730194, 0.44318482, 0.82737435, 0.80482004, 0.40636066,
        0.54920459, 0.70058401, 0.80290418, 0.33869774, 0.29537287]),
 array([3.78395724, 1.5824873 , 3.82087721, 3.82234285, 1.34695804,
        2.10298612, 3.00830869, 3.9745556 , 1.31276152, 1.15309762,
        2.53434592, 1.06847433, 2.80319049, 3.65151974, 1.06348267,
        1.39990837, 1.61294779, 1.59034111, 1.06672724, 1.14612131,
        4.44837968, 1.54256612, 4.58499352, 4.39469498, 1.48978388,
        2.19735313, 2.57167549, 4.37557075, 1.63321251, 1.33836142]))

4.L1正則化

L1正則化将系數w的l1範數作為懲罰項加到損失函數上，由于正則項非零，這就迫使那些弱的特征所對應的系數變成0。是以L1正則化往往會使學到的模型很稀疏（系數w經常為0），這個特性使得L1正則化成為一種很好的特征選擇方法。

這裡采用sklearn總lasso模型進行封裝

array([ 0.        ,  0.00202185,  0.        ,  0.00043358, -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.00125443, -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.08225553, -0.01460497, -0.01379535,  0.0007448 , -0.        ,
       -0.        , -0.06013473, -0.        , -0.        , -0.        ])

5.基于學習模型的特征評分

這種方法的思路是直接使用你要用的機器學習算法，針對每個單獨的特征和響應變量建立預測模型，得出模型的評分。

其實Pearson相關系數等價于線性回歸裡的标準化回歸系數。假如某個特征和響應變量之間的關系是非線性的，可以用基于樹的方法（決策樹、随機森林）、或者擴充的線性模型等。基于樹的方法比較易于使用，因為他們對非線性關系的模組化比較好，并且不需要太多的調試。但要注意過拟合問題，是以樹的深度最好不要太大，再就是運用交叉驗證。

array([0.94065577, 0.77690815, 0.94993828, 0.9414743 , 0.72440823,
       0.86654102, 0.93679947, 0.96337183, 0.70027988, 0.46412392,
       0.86841319, 0.46160936, 0.87624184, 0.92835235, 0.5316104 ,
       0.72631256, 0.7815463 , 0.79461106, 0.46528705, 0.6198184 ,
       0.97066582, 0.78313947, 0.97697965, 0.97057468, 0.75707131,
       0.86194584, 0.92023563, 0.96808801, 0.73623915, 0.68791135])

6.特征重要程度或系數大小（select_from_model）

封裝sklearn中Select_from_model，不過這個方法隻能輸出選擇之後的特征，不能輸出具體的評價分數，分析select_from_model代碼,針對随機森林等模型，則輸入其feature_important_大小。如回歸模型，則輸出其系數的大小。

array([0.66640051, 0.30333585, 0.36057931, 0.01055394, 0.02294638,
       0.11137151, 0.15632322, 0.06555693, 0.03189341, 0.00623842,
       0.02738789, 0.24660799, 0.06654065, 0.12058007, 0.00211526,
       0.02430511, 0.03371127, 0.00860147, 0.00775326, 0.00222996,
       0.7054829 , 0.37988099, 0.24474795, 0.01677032, 0.04206173,
       0.35002252, 0.43570279, 0.12673618, 0.10253126, 0.03317253])

7.boruta特征選擇

boruta的思路是針對某一特征，将其值進行随機配置設定，形成該特征的shadow特征，通過比較原始特征和shadow特征的重要對差異，判斷該特征的有效性。這一方法的優點是，形成了對照組，提出了部分不能觀察到的因素造成對特征評分的誤判。

[ 1.93884227e-02  2.55134195e-02 -2.22229096e-02  9.71757303e-04
  3.42173146e-04  4.67893675e-03  6.91337507e-03  2.69658857e-03
  4.77232016e-04 -2.64284804e-04 -2.54539319e-04 -2.34624522e-03
  3.69165467e-03  3.24437344e-02  3.30642361e-05  1.05415307e-03
  1.47101778e-03  3.76387825e-04  1.92383814e-04  8.18088312e-05
  2.77834702e-02  1.12243313e-01 -1.26686099e-02  2.19071929e-02
  1.22108470e-03  1.62224294e-02  2.05164881e-02  5.65378210e-03
  4.06231930e-03  1.34338174e-03]

8.遞歸特征消除

遞歸特征消除的主要思想是反複的構模組化型（如SVM或者回歸模型）然後選出最好的（或者最差的）的特征（可以根據系數來選），把選出來的特征放到一遍，然後在剩餘的特征上重複這個過程，直到所有特征都周遊了。這個過程中特征被消除的次序就是特征的排序。是以，這是一種尋找最優特征子集的貪心算法。
Sklearn提供了RFE包，可以用于特征消除，還提供了RFECV，可以通過交叉驗證來對的特征進行排序。這裡 封裝sklearn中的RFECV，将得分排名反轉，得到評分。

[13  9  5  2 10 13 13 13 12  3 13 13 13 13  0 13  8  6  4  1 11 13 13  7
 13 13 13 13 13 13]

9.全部名額計算和綜合評價

# 計算以上全部名額，并完成歸一化之後，按照配置設定的權重計算綜合得分(score),并按照得分進行排序
fs.feature_eval(eval_weight=[0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2])

corr		mic      	ks      	iv			l1			learn		bourta		important	rfecv		score
22	-0.782914	0.474132	0.827374	4.584994	-0.018573	0.976980	0.009898	0.244748	13.0	100.000000
20	-0.776454	0.453242	0.797302	4.448380	-0.000000	0.970666	0.034960	0.705483	11.0	94.078947
0	-0.730029	0.365405	0.723905	3.783957	-0.000000	0.940656	0.024852	0.666401	13.0	88.815789
27	-0.793566	0.436879	0.802904	4.375571	-0.000000	0.968088	0.006664	0.126736	13.0	86.842105
13	-0.548236	0.340502	0.707085	3.651520	-0.000056	0.928352	0.032520	0.120580	13.0	81.907895
26	-0.659610	0.317450	0.700584	2.571675	-0.000000	0.920236	0.023289	0.435703	13.0	81.578947
7	-0.776614	0.439923	0.819856	3.974556	-0.000000	0.963372	0.003340	0.065557	13.0	80.921053
6	-0.696360	0.373305	0.759711	3.008309	-0.000000	0.936799	0.008183	0.156323	13.0	78.947368
2	-0.742636	0.401514	0.744675	3.820877	-0.000000	0.949938	-0.021761	0.360579	5.0		78.618421
23	-0.733825	0.464017	0.804820	4.394695	0.000316	0.970575	0.021532	0.016770	7.0		77.302632
21	-0.456903	0.117602	0.443185	1.542566	-0.009721	0.783139	0.118309	0.379881	13.0	75.657895
25	-0.590998	0.226403	0.549205	2.197353	-0.000000	0.861946	0.018325	0.350023	13.0	71.381579
5	-0.596534	0.212811	0.571759	2.102986	-0.000000	0.866541	0.005434	0.111372	13.0	64.802632
12	-0.556141	0.276573	0.586795	2.803190	-0.000000	0.876242	0.001964	0.066541	13.0	63.157895
3	-0.708984	0.358486	0.731423	3.822343	0.000289	0.941474	0.001050	0.010554	2.0		54.934211
10	-0.567134	0.246681	0.601831	2.534346	-0.000000	0.868413	-0.000403	0.027388	13.0	52.960526
28	-0.416294	0.089774	0.338698	1.633213	-0.000000	0.736239	0.004665	0.102531	13.0	51.644737
1	-0.415185	0.090539	0.452487	1.582487	-0.000000	0.776908	0.002043	0.303336	9.0		48.190789
24	-0.421465	0.097283	0.406361	1.489784	-0.000000	0.757071	0.001480	0.042062	13.0	45.723684
16	-0.253730	0.116072	0.480775	1.612948	-0.000000	0.781546	0.001675	0.033711	8.0		39.473684
15	-0.292999	0.073890	0.377306	1.399908	-0.000000	0.726313	0.001138	0.024305	13.0	36.184211
11	0.008303	0.000000	0.086267	1.068474	-0.000000	0.461609	-0.002358	0.246608	13.0	36.184211
29	-0.323872	0.068337	0.295373	1.338361	-0.000000	0.687911	0.001587	0.033173	13.0	35.526316
17	-0.408042	0.128856	0.452487	1.590341	-0.000000	0.794611	0.000476	0.008601	6.0		32.401316
8	-0.330499	0.065021	0.303142	1.312762	-0.000000	0.700280	0.000441	0.031893	12.0	26.315789
4	-0.358560	0.078689	0.315377	1.346958	-0.000000	0.724408	0.000347	0.022946	10.0	24.671053
19	-0.077972	0.039776	0.204389	1.146121	-0.000000	0.619818	0.000093	0.002230	1.0		5.592105
9	0.012838	0.008280	0.056577	1.153098	-0.000000	0.464124	-0.000292	0.006238	3.0		4.934211
18	0.006522	0.015024	0.005734	1.066727	-0.000000	0.465287	0.000227	0.007753	4.0		4.605263
14	0.067016	0.014678	0.024338	1.063483	-0.000000	0.531610	0.000052	0.002115	0.0		0.000000

三、特征選擇方法

以上各種名額為特征選擇提供參考，下面提供特征選擇的方法，可以直接獲得較為優化的特征組合

1.前向搜尋

從第一個特征開始選取最優的特征放入特征組合清單中，然後組合第一個選取的特征和其他特征，同樣選擇評分最佳的組合，将這兩個放入特征組合清單中，以此類推，直至數量滿足要求。

0 select [22],  	the score :0.976979653545736
1 select [22, 21],  	the score :0.9868864998778198
2 select [22, 21, 26],  	the score :0.9897221832285703
3 select [22, 21, 26, 13],  	the score :0.9915044314118994
4 select [22, 21, 26, 13, 0],  	the score :0.992541012656473
5 select [22, 21, 26, 13, 0, 12],  	the score :0.993202973222626
6 select [22, 21, 26, 13, 0, 12, 29],  	the score :0.9932691107887637
7 select [22, 21, 26, 13, 0, 12, 29, 20],  	the score :0.9935280286369379
8 select [22, 21, 26, 13, 0, 12, 29, 20, 4],  	the score :0.9934640790479309
9 select [22, 21, 26, 13, 0, 12, 29, 20, 4, 24],  	the score :0.9937947668786189

【特征選擇】特征選擇名額和方法小彙總

2.遺傳算法啟發式

通過啟發式算法，在所有特征組合空間，尋找最優特征組合，實作模型評分最大值。時間消耗很大，且具有随機性，可以通過設定初始族群完成

# 使用遺傳算法進行特征選擇，疊代100次，最大特征限制為10
ga = fs.selcet_by_GA(it_num=100,max_feature=10,mutation=0.4)  #

0/100,	目前分數：0.9920092042348839
1/100,	目前分數：0.9920092042348839
2/100,	目前分數：0.9920092042348839
3/100,	目前分數：0.9920092042348839
4/100,	目前分數：0.9920278778614842
5/100,	目前分數：0.9930635275910416
6/100,	目前分數：0.9930635275910416
7/100,	目前分數：0.9930635275910416
8/100,	目前分數：0.9930635275910416
9/100,	目前分數：0.9930635275910416
10/100,	目前分數：0.9930635275910416
11/100,	目前分數：0.9930635275910416
12/100,	目前分數：0.9930635275910416
13/100,	目前分數：0.9930635275910416

【特征選擇】特征選擇名額和方法小彙總

# 遺傳算法最優的特征組合 
ga[1]

[0, 2, 13, 15, 17, 19, 21, 23, 24, 26]

遺傳算法解0.9944533479949463 ，大于前向搜尋

3.最優特征檢測

使用其他特征以此替換特征組合中的特征，如發現更優解則進行替換，如沒有則保留，傳回結果

loc0 is done 
loc1 is done 
loc2 is done 
the col 15 ,	loc3 is replaceed by 6
loc3 is done 
the col 17 ,	loc4 is replaceed by 15
the col 17 ,	loc4 is replaceed by 28
loc4 is done 
loc5 is done 
loc6 is done 
loc7 is done 
loc8 is done 
loc9 is done 
([0, 2, 13, 6, 28, 19, 21, 23, 24, 26], 0.9945843666651069)

通過最優組合檢驗之後發現更優解評分為0.9945 大于之前遺傳算法找到的最優解

四、特征選擇封裝代碼

import  pandas as pd
import  numpy as np
from sklearn.model_selection import cross_val_score

class Feature_select:
    def __init__(self, X, y,model,scoring='roc_auc'):
        '''
        X:np.array,features
        y:np.array,label
        '''
        self.X = X
        self.y = y
        self.model = model
        self.scoring = scoring
    def corr(self):
        '''
        關系數計算：輸出每一個特征對應的相關系數
        '''
        corr_ = pd.DataFrame(np.hstack([self.X, self.y.reshape(-1, 1)])).corr().iloc[:-1, -1]
        corr_ = np.array(corr_)
        return corr_

    def mic(self):
        '''
        mic系數計算：輸出每一個特征對應的mic值
        '''
        from sklearn.feature_selection import mutual_info_classif as mic
        mic_ = mic(self.X, self.y)
        return mic_

    def ks_iv(self):
        '''
        ks和iv計算函數
        '''
        iv_, ks_ = [], []
        for i in range(self.X.shape[1]):
            cuted = pd.qcut(self.X[:, i], q=10, labels=False)
            num_table = pd.crosstab(cuted, self.y)
            good_sum = num_table.sum()[0]
            bad_sum = num_table.sum()[1]
            iv = (((num_table.iloc[:, 1] + 0.5) / bad_sum) - ((num_table.iloc[:, 0] + 0.5) / good_sum) * np.log(
                ((num_table.iloc[:, 1] + 0.5) / bad_sum) / ((num_table.iloc[:, 0] + 0.5) / good_sum))).sum()
            iv_.append(iv)
            ks = max((num_table.iloc[:, 1] / bad_sum).cumsum() - (num_table.iloc[:, 0] / good_sum).cumsum())
            ks_.append(ks)
        return np.array(ks_), np.array(iv_)

    def l1_select(self, alpha=0.15):
        '''
        L1正則化 傳回為0的為被剔除變量
        alpha:懲罰系數大小
        '''
        from sklearn.linear_model import Lasso
        lasso = Lasso(alpha=alpha)  # alpha 系數越大 選出的0就越多
        lasso.fit(self.X, self.y)
        return lasso.coef_

    def learn(self):
        '''
        單個特征進入模型時候的五折交叉驗證評價名額
        model:評價模型
        scoring:評價名額
        (每個特征需運作一次模型，時間較長)
        '''
        scores = []
        for i in range(self.X.shape[1]):
            score = cross_val_score(self.model, self.X[:, i].reshape(-1, 1), self.y, scoring=self.scoring, cv=5)
            scores.append(np.mean(score))
        return np.array(scores)

    def select_from_model(self):
        '''
        sklean 中select_from_model
        model：訓練模型
            針對部分樹模型：傳回其特征重要度，
            針對線性模型：傳回其系數絕對值
        '''
        from sklearn.feature_selection import SelectFromModel
        s_model = SelectFromModel(self.model).fit(self.X, self.y)
        try:
            important_ = s_model.estimator_.feature_importances_
        except:
            important_ = abs(s_model.estimator_.coef_[0])
        return important_

    def boruta(self, it_num=40):
        '''
        參照Boruta思想，通過打亂之前特征的數值，建構shadow特征，比較原始特征和shadow特征的重要程度，評價特征的有效性
        :param it_num: int，疊代次數
        :return: 特征的Boruta得分
        '''
        diff_record = []
        for i in range(it_num):
            X_shadow = self.X.copy()
            np.random.shuffle(X_shadow)
            X_boruta = np.hstack([self.X, X_shadow])
            import_ = Feature_select(X_boruta, self.y, self.model).select_from_model()
            import_ = import_.reshape((2, -1))
            diff_ = import_[0, :] - import_[1, :]
            diff_record.append(diff_)
        bouruta_score = pd.DataFrame(np.array(diff_record).T).mean(1)
        return np.array(bouruta_score)

    def rfecv(self):
        '''
        封裝sklearn中的RFECV，将得分排名反轉，得到評分
        :return:
        '''
        from sklearn.feature_selection import RFECV
        w = RFECV(self.model, scoring=self.scoring).fit(self.X, self.y)
        return w.ranking_.max() - w.ranking_

    def search_forward(self, max_feature):
        '''
        從第一個特征開始選取最優的特征放入特征組合清單中，然後組合第一個選取的特征和其他特征，同樣選擇評分最佳的組合，将這兩個放入特征組合清單中，以此類推，直至數量滿足要求。
        model:評價模型
        max_feature:需要的最大特征數量
        scoring:選擇評分
        '''
        best_cols = []  # 存儲最優的特征組合編号
        best_scores = []  # 存儲最優特征組合下的得分
        for i in range(max_feature):
            best_score = 0
            for col in range(self.X.shape[1]):
                if col not in best_cols:
                    col_test = best_cols.copy()
                    col_test += [col]
                    if len(col_test) == 1:
                        score = cross_val_score(self.model, self.X[:, col_test].reshape(-1, 1), self.y, scoring=self.scoring).mean()
                    else:
                        score = cross_val_score(self.model, self.X[:, col_test], self.y, scoring=self.scoring).mean()
                    if score > best_score:
                        best_col = col
                        best_score = score
            best_cols += [best_col]
            best_scores.append(best_score)
            print(f'{i} select {best_cols},  \tthe score :{best_score}')
        return best_cols, best_scores

    def selcet_by_GA(self, num_=50, pop=None, it_num=50, inherit=0.8, mutation=0.2, max_feature=None,
                     want_max=True):
        '''
        使用遺傳算法選擇特征組合實作模型的評分最大
        model:評價使用的模型
        num_:遺傳算法族群數量
        pop：初始族群
        it_num：種群的數量
        inherit：種群進行雜交的機率
        mutation：種群進行變異的機率
        max_feature:最大特征數量
        scoring:模型評分
        want_max：求解最大值
        傳回值：傳入長度為特征數量的0,1向量
                曆史種群達到的最優值
        '''

        def fun_(x):
            '''傳入長度為特征數量的0,1向量，轉化為bool之後作為特征的索引，提取特征子集，使用特征子集進行選了傳回目标函數值'''
            feature_no = [i for i, j in enumerate(x) if j == 1]
            X_ = self.X[:, feature_no]
            score = cross_val_score(self.model, X_, self.y, scoring=self.scoring, cv=5).mean()
            if max_feature is not None:
                if len(feature_no) <= max_feature:
                    val = score
                else:
                    val = score / (len(feature_no) * len(feature_no))
            else:
                val = score
            return val if want_max else -val

        len_ = self.X.shape[1]

        if pop is None:
            if max_feature is not None:
                pop = np.random.choice([0, 1], (num_, len_), p=[1 - max_feature / len_, max_feature / len_])
            else:
                pop = np.random.randint(0, 2, (num_, len_))
        best_f = 0
        list_best_f = []
        for _ in range(it_num):
            scores = [fun_(i) for i in pop]
            best_fit_ = scores[np.argmax(scores)]
            if best_fit_ > best_f:
                best_f = best_fit_
                best_p = pop[np.argmax(scores)]
            list_best_f.append(best_f)
            fitness = scores - min(scores) + 0.01
            idx = np.random.choice(np.arange(num_), size=num_, replace=True,
                                   p=(fitness) / (fitness.sum()))
            pop = np.array(pop)[idx]
            new_pop = []
            for father in pop:
                child = father
                if np.random.rand() < inherit:
                    mother_id = np.random.randint(num_)
                    low_point = np.random.randint(len_)
                    high_point = np.random.randint(low_point + 1, len_ + 1)
                    child[low_point:high_point] = pop[mother_id][low_point:high_point]
                    if np.random.rand() < mutation:
                        mutate_point = np.random.randint(0, len_)
                        child[mutate_point] = 1 - child[mutate_point]
                new_pop.append(child)
            pop = new_pop
            print(f'{_}/{it_num},\t目前分數：{best_f}')
        return best_p, list_best_f

    def feature_eval(self,eval_weight = [0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2]):
        '''
        特征綜合評價函數
        :param eval_weight:list，計算評分時候各項名額的平價權重，預設為'corr','mic','ks','iv','l1','learn','bourta','important'的權重為 [0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2]
        :return: table_eval 各項名額資料的明細，和綜合得分
        '''
        corr_ = self.corr()
        mic_ = self.mic()
        ks_iv_ = self.ks_iv()
        l1_ = self.l1_select()
        learn_ = self.learn()
        bourta_ = self.boruta()
        important_ = self.select_from_model()
        rfecv_ = self.rfecv()
        table_eval = pd.DataFrame(np.array([corr_,mic_,ks_iv_[0],ks_iv_[1],l1_,learn_,bourta_,important_,rfecv_]).T,columns=['corr','mic','ks','iv','l1','learn','bourta','important','rfecv'])
        eval_ = (table_eval.abs().rank() * np.array(eval_weight) / np.array(eval_weight).sum()).sum(1)
        score = 100*(eval_ - eval_.min())/(eval_.max() - eval_.min())
        table_eval['score'] = score
        return  table_eval.sort_values(by = 'score',ascending=False)

    def check_best(self,feature_no):
        '''
        使用其他特征以此替換特征組合中的特征，如發現更優解則進行替換，如沒有則保留，傳回結果
        :param feature_no:待檢驗特征
        :return: 檢驗之後的特征，最新得分 
        '''
        best_score = cross_val_score(self.model, self.X[:,feature_no], self.y, scoring=self.scoring, cv=5).mean()
        for i,no in enumerate(feature_no):
            feature_no_t = feature_no.copy()
            for j in range(self.X.shape[1]):
                if j not in feature_no:
                    feature_no_t[i] = j
                    score = cross_val_score(self.model, self.X[:,feature_no_t], self.y, scoring=self.scoring, cv=5).mean()
                    if score > best_score:
                        best_score = score
                        feature_no[i]  = j
                        print(f'the col {no} ,\tloc{i} is replaceed by {j}')
            print(f'loc{i} is done ')
        return  feature_no,best_score

【特征選擇】特征選擇名額和方法小彙總

一、簡介

二、特征選擇名額

1.皮爾森相關系數

2.互資訊

3.KS和IV值

4.L1正則化

5.基于學習模型的特征評分

6.特征重要程度或系數大小（select_from_model）

7.boruta特征選擇

8.遞歸特征消除

9.全部名額計算和綜合評價

三、特征選擇方法

1.前向搜尋

2.遺傳算法啟發式

3.最優特征檢測

四、特征選擇封裝代碼

繼續閱讀

YAML簡介和PyYAML安全操作YAML支援的類型YAML的優點：yaml的基本文法python操作

2021-2025年中國運動療法（KT）帶行業市場供需與戰略研究報告

Small tricks

libsvm for python 安裝

2021年危險化學品經營機關安全管理人員考試題庫及危險化學品經營機關安全管理人員考試技巧

學習軟體測試基礎測試第七天

Zeppelin 配置通路 REST APIApache Zeppelin Configuration REST API

【Torch】最簡潔logging使用指南

27. Remove Element(清單)題目代碼

無人機--飛控科普

Cloud Studio初體驗

使用 ctypes 進行 Python 和 C 的混合程式設計

【python】【資料處理】畫多元資料分布圖

【python】netconf協定對接管理裝置

「Python 網絡自動化」NETCONF —— Python 使用 NETCONF 管理配置 H3C 網絡裝置

在python中建立excel并寫入