
[Feature Selection] A Round-up of Feature Selection Metrics and Methods

I. Introduction

1. For each feature-selection metric, this post provides the computation and the code, covering: Pearson correlation, mutual information, KS, IV, L1 regularization, single-feature model score, feature importance or coefficient magnitude, Boruta feature evaluation, and recursive feature elimination ranking.

2. It also provides feature-selection methods with code: forward search, a genetic-algorithm heuristic search, and a best-feature check.

# This post uses the following dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data['data'].shape
X, y = data['data'], data.target

# The model is logistic regression and the evaluation metric is AUC; the code
# wraps everything in the Feature_select class listed at the end of the post
from sklearn.linear_model import LogisticRegression
from tools.feature_select import Feature_select

lr = LogisticRegression(max_iter=5000)  # max_iter raised so the solver converges on this data
fs = Feature_select(X, y, lr)

II. Feature Selection Metrics

1. Pearson correlation coefficient

The Pearson correlation coefficient reflects the linear relationship between a feature and the label. Its obvious drawback as a feature-ranking criterion is that it is only sensitive to linear relationships: if the relationship is non-linear, the Pearson correlation can be close to 0 even when the two variables have a perfect one-to-one mapping.

array([-0.73002851, -0.4151853 , -0.74263553, -0.70898384, -0.35855997,
       -0.59653368, -0.69635971, -0.77661384, -0.33049855,  0.0128376 ,
       -0.56713382,  0.00830333, -0.5561407 , -0.54823594,  0.06701601,
       -0.29299924, -0.25372977, -0.40804233,  0.00652176, -0.07797242,
       -0.77645378, -0.45690282, -0.78291414, -0.73382503, -0.42146486,
       -0.59099824, -0.65961021, -0.79356602, -0.41629431, -0.32387219])
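The blindness to non-linear relationships can be seen in a tiny hypothetical example (not from the original post): a deterministic but symmetric dependence yields a correlation of essentially zero.

```python
import numpy as np

# x and y are in a perfect one-to-one-per-branch, non-linear relationship,
# yet the symmetric parabola makes their Pearson correlation vanish.
x = np.linspace(-1, 1, 201)
y = x ** 2                       # fully determined by x, but non-linear

r = np.corrcoef(x, y)[0, 1]
print(abs(r) < 1e-6)             # True: Pearson sees no relationship at all
```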
           

2. Mutual information

Compared with the Pearson correlation, mutual information can capture non-linear relationships between a feature and the label to some extent.

array([0.36671567, 0.09648092, 0.40428514, 0.35976973, 0.08173721,
       0.21061423, 0.37479754, 0.44305847, 0.06671437, 0.01354967,
       0.24521911, 0.00110187, 0.278233  , 0.33920662, 0.01602095,
       0.07457715, 0.11807899, 0.12371226, 0.013367  , 0.04097258,
       0.44985384, 0.12364186, 0.47728208, 0.46426447, 0.10166035,
       0.22460362, 0.31435137, 0.43651194, 0.09544855, 0.06837461])
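A hypothetical illustration of that claim (the data below is made up for this sketch, not from the post): a label that depends on |x| has near-zero Pearson correlation with x, yet clearly positive mutual information.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 2000)
y = (np.abs(x) > 0.5).astype(int)   # non-linear, non-monotonic labeling rule

r = np.corrcoef(x, y)[0, 1]
mi = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]
print(abs(r) < 0.1, mi > 0.2)       # Pearson misses the dependence, MI finds it
```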
           

3. KS and IV

KS and IV are two metrics used routinely in credit risk. IV focuses more on a feature's power to separate the label classes, while KS focuses on the separation achieved when the feature is used to rank the samples.

(array([0.72390466, 0.45248666, 0.74467523, 0.73142276, 0.31537709,
        0.57175889, 0.75971143, 0.81985624, 0.30314201, 0.05657735,
        0.6018313 , 0.08626658, 0.5867951 , 0.70708472, 0.02433804,
        0.37730564, 0.48077533, 0.45248666, 0.00573437, 0.2043893 ,
        0.79730194, 0.44318482, 0.82737435, 0.80482004, 0.40636066,
        0.54920459, 0.70058401, 0.80290418, 0.33869774, 0.29537287]),
 array([3.78395724, 1.5824873 , 3.82087721, 3.82234285, 1.34695804,
        2.10298612, 3.00830869, 3.9745556 , 1.31276152, 1.15309762,
        2.53434592, 1.06847433, 2.80319049, 3.65151974, 1.06348267,
        1.39990837, 1.61294779, 1.59034111, 1.06672724, 1.14612131,
        4.44837968, 1.54256612, 4.58499352, 4.39469498, 1.48978388,
        2.19735313, 2.57167549, 4.37557075, 1.63321251, 1.33836142]))
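A minimal sketch of the KS computation on one binned feature, on synthetic data made up for this example; following common credit-risk usage (an assumption, not something the post fixes), y=1 marks the "bad" samples.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
x = np.r_[rng.normal(0, 1, 500), rng.normal(1, 1, 500)]  # good vs bad scores
y = np.r_[np.zeros(500, int), np.ones(500, int)]

bins = pd.qcut(x, q=10, labels=False, duplicates='drop')  # 10 quantile bins
table = pd.crosstab(bins, y)
good_cdf = (table[0] / table[0].sum()).cumsum()
bad_cdf = (table[1] / table[1].sum()).cumsum()
ks = (bad_cdf - good_cdf).abs().max()   # largest gap between the two CDFs
print(0.2 < ks < 0.6)                   # True for this well-separated pair
```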
           

4. L1 regularization

L1正則化将系數w的l1範數作為懲罰項加到損失函數上,由于正則項非零,這就迫使那些弱的特征所對應的系數變成0。是以L1正則化往往會使學到的模型很稀疏(系數w經常為0),這個特性使得L1正則化成為一種很好的特征選擇方法。

Here the metric wraps sklearn's Lasso model.

array([ 0.        ,  0.00202185,  0.        ,  0.00043358, -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        , -0.00125443, -0.        ,
       -0.        , -0.        , -0.        , -0.        , -0.        ,
       -0.08225553, -0.01460497, -0.01379535,  0.0007448 , -0.        ,
       -0.        , -0.06013473, -0.        , -0.        , -0.        ])
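The effect of the penalty strength can be sketched on the same dataset; standardizing the features first is an addition here (not in the original post) that keeps the penalty comparable across features with very different scales.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)

# count the surviving (non-zero) coefficients at two penalty strengths
n_kept = {a: int((Lasso(alpha=a).fit(Xs, y).coef_ != 0).sum())
          for a in (0.01, 0.1)}
print(n_kept)  # the larger alpha forces more coefficients to exactly zero
```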
           

5. Scoring features with the learning model

The idea is to take the machine-learning algorithm you actually intend to use, build a predictive model on each individual feature against the response variable, and record the model's score.

In fact, the Pearson correlation is equivalent to the standardized regression coefficient of a univariate linear regression. If the relationship between a feature and the response is non-linear, tree-based methods (decision trees, random forests) or suitably extended linear models can be used instead. Tree-based methods are easy to apply because they model non-linear relationships well and need little tuning, but watch out for overfitting: keep the tree depth small and use cross-validation.

array([0.94065577, 0.77690815, 0.94993828, 0.9414743 , 0.72440823,
       0.86654102, 0.93679947, 0.96337183, 0.70027988, 0.46412392,
       0.86841319, 0.46160936, 0.87624184, 0.92835235, 0.5316104 ,
       0.72631256, 0.7815463 , 0.79461106, 0.46528705, 0.6198184 ,
       0.97066582, 0.78313947, 0.97697965, 0.97057468, 0.75707131,
       0.86194584, 0.92023563, 0.96808801, 0.73623915, 0.68791135])
           

6. Feature importance or coefficient magnitude (select_from_model)

This wraps sklearn's SelectFromModel. By itself SelectFromModel only outputs the selected features, not concrete scores; reading its source, for models such as random forests it ranks by feature_importances_, and for regression models by the magnitude of the coefficients, so those values are returned here.

array([0.66640051, 0.30333585, 0.36057931, 0.01055394, 0.02294638,
       0.11137151, 0.15632322, 0.06555693, 0.03189341, 0.00623842,
       0.02738789, 0.24660799, 0.06654065, 0.12058007, 0.00211526,
       0.02430511, 0.03371127, 0.00860147, 0.00775326, 0.00222996,
       0.7054829 , 0.37988099, 0.24474795, 0.01677032, 0.04206173,
       0.35002252, 0.43570279, 0.12673618, 0.10253126, 0.03317253])
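A minimal sketch of that behavior with a tree ensemble (the random forest here is an illustrative choice, not the model the post uses): the ranking comes from feature_importances_, and for a linear model it would come from |coef_| instead.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0)).fit(X, y)

# the fitted inner estimator exposes the importances the selection is based on
print(selector.estimator_.feature_importances_.shape)   # (30,)
print(selector.transform(X).shape[1] < X.shape[1])      # True: some features dropped
```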
           

7. Boruta feature selection

The idea behind Boruta is to randomly permute a feature's values to form a "shadow" feature, and then compare the importance of the original feature with that of its shadow to judge whether the feature is genuinely useful. The advantage of this method is that the shadows act as a control group, ruling out score inflation caused by factors that cannot be observed directly.

[ 1.93884227e-02  2.55134195e-02 -2.22229096e-02  9.71757303e-04
  3.42173146e-04  4.67893675e-03  6.91337507e-03  2.69658857e-03
  4.77232016e-04 -2.64284804e-04 -2.54539319e-04 -2.34624522e-03
  3.69165467e-03  3.24437344e-02  3.30642361e-05  1.05415307e-03
  1.47101778e-03  3.76387825e-04  1.92383814e-04  8.18088312e-05
  2.77834702e-02  1.12243313e-01 -1.26686099e-02  2.19071929e-02
  1.22108470e-03  1.62224294e-02  2.05164881e-02  5.65378210e-03
  4.06231930e-03  1.34338174e-03]
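One Boruta-style round can be sketched as follows (the random forest is an illustrative choice): each shadow column is an independently permuted copy of a real column, so any importance a shadow earns is pure noise and serves as the control group.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)
X_shadow = np.apply_along_axis(rng.permutation, 0, X)  # shuffle each column independently

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(np.hstack([X, X_shadow]), y)                    # real + shadow features together
real, shadow = np.split(rf.feature_importances_, 2)
print(real.mean() > shadow.mean())                     # True: real features beat their noise copies
```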
           

8. Recursive feature elimination

The main idea of recursive feature elimination is to build a model repeatedly (e.g. an SVM or a regression model), pick out the best (or worst) feature each round (for instance by coefficient), set that feature aside, and repeat the process on the remaining features until all features have been processed. The order in which features are eliminated gives the feature ranking, so this is a greedy algorithm for finding an optimal feature subset.
Sklearn provides the RFE class for feature elimination, and RFECV, which ranks features using cross-validation. Here RFECV is wrapped and the ranking is reversed so that a larger value means a better feature.
           
[13  9  5  2 10 13 13 13 12  3 13 13 13 13  0 13  8  6  4  1 11 13 13  7
 13 13 13 13 13 13]
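The reversal can be sketched directly: ranking_ gives 1 to every kept feature and larger numbers to features eliminated earlier, so max - ranking_ turns it into a "bigger is better" score. Standardizing first is an addition here (not in the original post) that just speeds up the logistic-regression fits.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)

rfecv = RFECV(LogisticRegression(max_iter=1000), scoring='roc_auc').fit(Xs, y)
score = rfecv.ranking_.max() - rfecv.ranking_   # reversed ranking, bigger is better
print(score.shape, score.min())                 # the earliest-eliminated feature scores 0
```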
           

9. Computing all metrics and an overall score

# Compute all of the metrics above, normalize them, combine them into an
# overall score (score) using the given weights, and sort by that score
fs.feature_eval(eval_weight=[0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2]) 
           
corr		mic      	ks      	iv			l1			learn		bourta		important	rfecv		score
22	-0.782914	0.474132	0.827374	4.584994	-0.018573	0.976980	0.009898	0.244748	13.0	100.000000
20	-0.776454	0.453242	0.797302	4.448380	-0.000000	0.970666	0.034960	0.705483	11.0	94.078947
0	-0.730029	0.365405	0.723905	3.783957	-0.000000	0.940656	0.024852	0.666401	13.0	88.815789
27	-0.793566	0.436879	0.802904	4.375571	-0.000000	0.968088	0.006664	0.126736	13.0	86.842105
13	-0.548236	0.340502	0.707085	3.651520	-0.000056	0.928352	0.032520	0.120580	13.0	81.907895
26	-0.659610	0.317450	0.700584	2.571675	-0.000000	0.920236	0.023289	0.435703	13.0	81.578947
7	-0.776614	0.439923	0.819856	3.974556	-0.000000	0.963372	0.003340	0.065557	13.0	80.921053
6	-0.696360	0.373305	0.759711	3.008309	-0.000000	0.936799	0.008183	0.156323	13.0	78.947368
2	-0.742636	0.401514	0.744675	3.820877	-0.000000	0.949938	-0.021761	0.360579	5.0		78.618421
23	-0.733825	0.464017	0.804820	4.394695	0.000316	0.970575	0.021532	0.016770	7.0		77.302632
21	-0.456903	0.117602	0.443185	1.542566	-0.009721	0.783139	0.118309	0.379881	13.0	75.657895
25	-0.590998	0.226403	0.549205	2.197353	-0.000000	0.861946	0.018325	0.350023	13.0	71.381579
5	-0.596534	0.212811	0.571759	2.102986	-0.000000	0.866541	0.005434	0.111372	13.0	64.802632
12	-0.556141	0.276573	0.586795	2.803190	-0.000000	0.876242	0.001964	0.066541	13.0	63.157895
3	-0.708984	0.358486	0.731423	3.822343	0.000289	0.941474	0.001050	0.010554	2.0		54.934211
10	-0.567134	0.246681	0.601831	2.534346	-0.000000	0.868413	-0.000403	0.027388	13.0	52.960526
28	-0.416294	0.089774	0.338698	1.633213	-0.000000	0.736239	0.004665	0.102531	13.0	51.644737
1	-0.415185	0.090539	0.452487	1.582487	-0.000000	0.776908	0.002043	0.303336	9.0		48.190789
24	-0.421465	0.097283	0.406361	1.489784	-0.000000	0.757071	0.001480	0.042062	13.0	45.723684
16	-0.253730	0.116072	0.480775	1.612948	-0.000000	0.781546	0.001675	0.033711	8.0		39.473684
15	-0.292999	0.073890	0.377306	1.399908	-0.000000	0.726313	0.001138	0.024305	13.0	36.184211
11	0.008303	0.000000	0.086267	1.068474	-0.000000	0.461609	-0.002358	0.246608	13.0	36.184211
29	-0.323872	0.068337	0.295373	1.338361	-0.000000	0.687911	0.001587	0.033173	13.0	35.526316
17	-0.408042	0.128856	0.452487	1.590341	-0.000000	0.794611	0.000476	0.008601	6.0		32.401316
8	-0.330499	0.065021	0.303142	1.312762	-0.000000	0.700280	0.000441	0.031893	12.0	26.315789
4	-0.358560	0.078689	0.315377	1.346958	-0.000000	0.724408	0.000347	0.022946	10.0	24.671053
19	-0.077972	0.039776	0.204389	1.146121	-0.000000	0.619818	0.000093	0.002230	1.0		5.592105
9	0.012838	0.008280	0.056577	1.153098	-0.000000	0.464124	-0.000292	0.006238	3.0		4.934211
18	0.006522	0.015024	0.005734	1.066727	-0.000000	0.465287	0.000227	0.007753	4.0		4.605263
14	0.067016	0.014678	0.024338	1.063483	-0.000000	0.531610	0.000052	0.002115	0.0		0.000000
           

III. Feature Selection Methods

The metrics above provide reference points for feature selection; the methods below go further and directly produce a fairly well-optimized feature combination.

1. Forward search

Start by picking the single best-scoring feature and adding it to the selected list; then pair that feature with each remaining feature, again keep the best-scoring combination, and so on, until the desired number of features is reached.

0 select [22],  	the score :0.976979653545736
1 select [22, 21],  	the score :0.9868864998778198
2 select [22, 21, 26],  	the score :0.9897221832285703
3 select [22, 21, 26, 13],  	the score :0.9915044314118994
4 select [22, 21, 26, 13, 0],  	the score :0.992541012656473
5 select [22, 21, 26, 13, 0, 12],  	the score :0.993202973222626
6 select [22, 21, 26, 13, 0, 12, 29],  	the score :0.9932691107887637
7 select [22, 21, 26, 13, 0, 12, 29, 20],  	the score :0.9935280286369379
8 select [22, 21, 26, 13, 0, 12, 29, 20, 4],  	the score :0.9934640790479309
9 select [22, 21, 26, 13, 0, 12, 29, 20, 4, 24],  	the score :0.9937947668786189
           

2. Genetic-algorithm heuristic

A heuristic search over the space of all feature combinations for the subset that maximizes the model score. It is expensive and stochastic; the search can be warm-started by supplying an initial population.

# Feature selection with the genetic algorithm: 100 iterations, at most 10 features
ga = fs.selcet_by_GA(it_num=100, max_feature=10, mutation=0.4)
           
0/100,	current score: 0.9920092042348839
1/100,	current score: 0.9920092042348839
2/100,	current score: 0.9920092042348839
3/100,	current score: 0.9920092042348839
4/100,	current score: 0.9920278778614842
5/100,	current score: 0.9930635275910416
6/100,	current score: 0.9930635275910416
7/100,	current score: 0.9930635275910416
8/100,	current score: 0.9930635275910416
9/100,	current score: 0.9930635275910416
10/100,	current score: 0.9930635275910416
11/100,	current score: 0.9930635275910416
12/100,	current score: 0.9930635275910416
13/100,	current score: 0.9930635275910416
           
# indices of the GA's best feature subset (the 1s in the returned 0/1 vector ga[0])
[i for i, j in enumerate(ga[0]) if j == 1]
           
[0, 2, 13, 15, 17, 19, 21, 23, 24, 26]	
           

The GA's solution scores 0.9944533479949463, higher than forward search.

3. Best-feature check

Replace each feature in the combination with every other feature in turn; whenever a better solution is found, keep the replacement, otherwise keep the original, and return the result.

loc0 is done 
loc1 is done 
loc2 is done 
the col 15 ,	loc3 is replaced by 6
loc3 is done 
the col 17 ,	loc4 is replaced by 15
the col 17 ,	loc4 is replaced by 28
loc4 is done 
loc5 is done 
loc6 is done 
loc7 is done 
loc8 is done 
loc9 is done 
([0, 2, 13, 6, 28, 19, 21, 23, 24, 26], 0.9945843666651069)
           

After the best-combination check, a better solution is found with score 0.9945, above the best solution the genetic algorithm found earlier.

IV. The Feature_select Wrapper Code

import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score

class Feature_select:
    def __init__(self, X, y, model, scoring='roc_auc'):
        '''
        X: np.array, features
        y: np.array, label
        model: estimator used by the model-based metrics
        scoring: sklearn scoring name, defaults to 'roc_auc'
        '''
        self.X = X
        self.y = y
        self.model = model
        self.scoring = scoring
    def corr(self):
        '''
        Pearson correlation: returns the correlation between each feature and the label
        '''
        corr_ = pd.DataFrame(np.hstack([self.X, self.y.reshape(-1, 1)])).corr().iloc[:-1, -1]
        corr_ = np.array(corr_)
        return corr_

    def mic(self):
        '''
        Mutual information: returns the MI between each feature and the label
        '''
        from sklearn.feature_selection import mutual_info_classif as mic
        mic_ = mic(self.X, self.y)
        return mic_

    def ks_iv(self):
        '''
        KS and IV of each feature, computed on 10 quantile bins
        '''
        iv_, ks_ = [], []
        for i in range(self.X.shape[1]):
            cuted = pd.qcut(self.X[:, i], q=10, labels=False, duplicates='drop')
            num_table = pd.crosstab(cuted, self.y)
            good_sum = num_table.sum()[0]
            bad_sum = num_table.sum()[1]
            bad_rate = (num_table.iloc[:, 1] + 0.5) / bad_sum    # smoothed bad distribution per bin
            good_rate = (num_table.iloc[:, 0] + 0.5) / good_sum  # smoothed good distribution per bin
            iv = ((bad_rate - good_rate) * np.log(bad_rate / good_rate)).sum()
            iv_.append(iv)
            ks = max((num_table.iloc[:, 1] / bad_sum).cumsum() - (num_table.iloc[:, 0] / good_sum).cumsum())
            ks_.append(ks)
        return np.array(ks_), np.array(iv_)

    def l1_select(self, alpha=0.15):
        '''
        L1 regularization: features whose returned coefficient is 0 are dropped
        alpha: penalty strength
        '''
        from sklearn.linear_model import Lasso
        lasso = Lasso(alpha=alpha)  # the larger alpha is, the more zero coefficients
        lasso.fit(self.X, self.y)
        return lasso.coef_

    def learn(self):
        '''
        5-fold cross-validated score of the model fitted on each single feature
        model: evaluation model
        scoring: evaluation metric
        (the model is run once per feature, so this can be slow)
        '''
        scores = []
        for i in range(self.X.shape[1]):
            score = cross_val_score(self.model, self.X[:, i].reshape(-1, 1), self.y, scoring=self.scoring, cv=5)
            scores.append(np.mean(score))
        return np.array(scores)

    def select_from_model(self):
        '''
        Wraps sklearn's SelectFromModel
        model: training model
            tree-based models: returns feature_importances_
            linear models: returns the absolute coefficients
        '''
        from sklearn.feature_selection import SelectFromModel
        s_model = SelectFromModel(self.model).fit(self.X, self.y)
        try:
            important_ = s_model.estimator_.feature_importances_
        except AttributeError:
            important_ = abs(s_model.estimator_.coef_[0])
        return important_

    def boruta(self, it_num=40):
        '''
        Boruta-style score: permute each feature to build a shadow copy, then compare
        the importance of the real feature with its shadow to assess its usefulness
        :param it_num: int, number of iterations
        :return: Boruta score of each feature
        '''
        diff_record = []
        for i in range(it_num):
            # shuffle each column independently so every shadow feature is pure noise
            X_shadow = np.apply_along_axis(np.random.permutation, 0, self.X)
            X_boruta = np.hstack([self.X, X_shadow])
            import_ = Feature_select(X_boruta, self.y, self.model).select_from_model()
            import_ = import_.reshape((2, -1))
            diff_ = import_[0, :] - import_[1, :]
            diff_record.append(diff_)
        boruta_score = pd.DataFrame(np.array(diff_record).T).mean(1)
        return np.array(boruta_score)

    def rfecv(self):
        '''
        Wraps sklearn's RFECV and reverses the ranking so that larger means better
        :return: reversed ranking of each feature
        '''
        from sklearn.feature_selection import RFECV
        w = RFECV(self.model, scoring=self.scoring).fit(self.X, self.y)
        return w.ranking_.max() - w.ranking_

    def search_forward(self, max_feature):
        '''
        Forward search: greedily add the feature whose addition yields the best
        cross-validated score, until max_feature features have been selected.
        model: evaluation model
        max_feature: number of features to select
        scoring: evaluation metric
        '''
        best_cols = []    # indices of the selected features
        best_scores = []  # score after each addition
        for i in range(max_feature):
            best_score = 0
            for col in range(self.X.shape[1]):
                if col not in best_cols:
                    col_test = best_cols + [col]
                    # X[:, col_test] is always 2-D because col_test is a list
                    score = cross_val_score(self.model, self.X[:, col_test], self.y, scoring=self.scoring).mean()
                    if score > best_score:
                        best_col = col
                        best_score = score
            best_cols += [best_col]
            best_scores.append(best_score)
            print(f'{i} select {best_cols},  \tthe score :{best_score}')
        return best_cols, best_scores

    def selcet_by_GA(self, num_=50, pop=None, it_num=50, inherit=0.8, mutation=0.2, max_feature=None,
                     want_max=True):
        '''
        Genetic-algorithm search for the feature subset that maximizes the model score
        model: evaluation model
        num_: population size
        pop: initial population (optional)
        it_num: number of iterations
        inherit: crossover probability
        mutation: mutation probability
        max_feature: maximum number of features
        scoring: evaluation metric
        want_max: maximize the score if True, minimize it if False
        returns: best 0/1 vector of length n_features,
                 history of the best score reached by the population
        '''

        def fun_(x):
            '''Treat the 0/1 vector as a feature mask, extract that feature subset and return its cross-validated score'''
            feature_no = [i for i, j in enumerate(x) if j == 1]
            X_ = self.X[:, feature_no]
            score = cross_val_score(self.model, X_, self.y, scoring=self.scoring, cv=5).mean()
            if max_feature is not None and len(feature_no) > max_feature:
                score = score / (len(feature_no) * len(feature_no))  # penalize oversized subsets
            return score if want_max else -score

        len_ = self.X.shape[1]

        if pop is None:
            if max_feature is not None:
                pop = np.random.choice([0, 1], (num_, len_), p=[1 - max_feature / len_, max_feature / len_])
            else:
                pop = np.random.randint(0, 2, (num_, len_))
        best_f = -np.inf
        list_best_f = []
        for _ in range(it_num):
            scores = np.array([fun_(i) for i in pop])
            if scores.max() > best_f:
                best_f = scores.max()
                best_p = pop[np.argmax(scores)].copy()
            list_best_f.append(best_f)
            fitness = scores - scores.min() + 0.01  # shift so every fitness is positive
            idx = np.random.choice(np.arange(num_), size=num_, replace=True,
                                   p=fitness / fitness.sum())
            pop = np.array(pop)[idx]
            new_pop = []
            for father in pop:
                child = father.copy()  # copy so crossover does not overwrite the parent
                if np.random.rand() < inherit:
                    mother_id = np.random.randint(num_)
                    low_point = np.random.randint(len_)
                    high_point = np.random.randint(low_point + 1, len_ + 1)
                    child[low_point:high_point] = pop[mother_id][low_point:high_point]
                    if np.random.rand() < mutation:
                        mutate_point = np.random.randint(0, len_)
                        child[mutate_point] = 1 - child[mutate_point]
                new_pop.append(child)
            pop = new_pop
            print(f'{_}/{it_num},\tcurrent score: {best_f}')
        return best_p, list_best_f

    def feature_eval(self, eval_weight=[0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2]):
        '''
        Combined feature evaluation
        :param eval_weight: list, weights of 'corr','mic','ks','iv','l1','learn','bourta','important','rfecv'
                            in the combined score, defaults to [0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2]
        :return: table_eval, the per-metric values plus the combined score
        '''
        corr_ = self.corr()
        mic_ = self.mic()
        ks_iv_ = self.ks_iv()
        l1_ = self.l1_select()
        learn_ = self.learn()
        bourta_ = self.boruta()
        important_ = self.select_from_model()
        rfecv_ = self.rfecv()
        table_eval = pd.DataFrame(np.array([corr_,mic_,ks_iv_[0],ks_iv_[1],l1_,learn_,bourta_,important_,rfecv_]).T,columns=['corr','mic','ks','iv','l1','learn','bourta','important','rfecv'])
        eval_ = (table_eval.abs().rank() * np.array(eval_weight) / np.array(eval_weight).sum()).sum(1)
        score = 100*(eval_ - eval_.min())/(eval_.max() - eval_.min())
        table_eval['score'] = score
        return  table_eval.sort_values(by = 'score',ascending=False)

    def check_best(self, feature_no):
        '''
        Try replacing each feature in the combination with every other feature in
        turn; keep a replacement whenever it improves the score, then return the result
        :param feature_no: feature combination to check
        :return: checked feature combination, its latest score
        '''
        best_score = cross_val_score(self.model, self.X[:,feature_no], self.y, scoring=self.scoring, cv=5).mean()
        for i,no in enumerate(feature_no):
            feature_no_t = feature_no.copy()
            for j in range(self.X.shape[1]):
                if j not in feature_no:
                    feature_no_t[i] = j
                    score = cross_val_score(self.model, self.X[:,feature_no_t], self.y, scoring=self.scoring, cv=5).mean()
                    if score > best_score:
                        best_score = score
                        feature_no[i]  = j
                        print(f'the col {no} ,\tloc{i} is replaced by {j}')
            print(f'loc{i} is done ')
        return  feature_no,best_score