
Implementing a Genetic Algorithm (GA) + Support Vector Regression (SVR) in Python

This experiment runs in Anaconda3 with Jupyter and uses the scikit-learn (sklearn) package; please have them installed in advance.

1. Import common packages

These mainly include pandas, numpy, plotting, SVR, standardization, and evaluation-metric functions.

from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import explained_variance_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import time
from sklearn import metrics
import csv
from sklearn.svm import SVR
import matplotlib.pyplot as plt  
           

2. Load the data

Import the prepared CSV file and split the data into a target set and a feature set. In this post we predict 'soil moisture' using 'soil temperature', 'air humidity', 'air temperature', and 'light intensity' as the features. pandas is used for a formatted display of the data.

# all the data
data=[]
# feature set
traffic_feature=[]
# target set
traffic_target=[]
# open the data file
csv_file = csv.reader(open('turang.csv'))
# iterate over the rows and cast every value to float
for content in csv_file:
    content=list(map(float,content))
    if len(content)!=0:
        data.append(content)
        traffic_feature.append(content[0:4])
        traffic_target.append(content[-1])
# convert the data to np.array format
data=np.array(data)
traffic_feature=np.array(traffic_feature)
traffic_target=np.array(traffic_target)
# a more readable view of the data; not used much in this post
df=pd.DataFrame(data=data,columns = ['soil temperature','air humidity','air temperature','light intensity','soil moisture'])
           

The final shape of the data: just type df to view it.


Plot of the target values:

(The noise is quite heavy... just make do with it for now. I'll write a post on data denoising later.)

plt.plot(traffic_target)  # plot the target values
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.show()
           

3. Data standardization

Standardize the data with StandardScaler() (zero mean, unit variance).

scaler = StandardScaler()  # standardization transformer
scaler.fit(traffic_feature)  # fit the scaler on the features
traffic_feature= scaler.transform(traffic_feature)   # transform the feature set
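
Note that the scaler above is fitted on the full feature set before the train/test split. A minimal alternative sketch (assuming the feature_train/feature_test split from Section 4 below) fits the scaler on the training portion only, so no information from the test set leaks into the transform:

scaler = StandardScaler()
scaler.fit(feature_train)  # learn mean and std from the training data only
feature_train = scaler.transform(feature_train)
feature_test = scaler.transform(feature_test)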
           

Before standardization:

[figure: feature values before standardization]

After standardization:

[figure: feature values after standardization]

4. Using support vector regression (SVR)

First split the data randomly: 90% for training and the remaining 10% held out for testing; the random seed can be set to anything. This post uses random sampling rather than a time-series split. If you want time-series forecasting instead, hold out the last 10% of the feature set and target set as the validation set. Similarly, for single-step time-series prediction, keep only the last row as the validation set (a sketch of that case follows the example below).

For example:

feature_test=traffic_feature[int(len(traffic_feature)*0.9):int(len(traffic_feature))]
target_test=traffic_target[int(len(traffic_target)*0.9):int(len(traffic_target))]
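
And the single-step case mentioned above holds out only the last row (a minimal sketch):

feature_train=traffic_feature[:-1]
target_train=traffic_target[:-1]
feature_test=traffic_feature[-1:]   # the last sample is the single validation point
target_test=traffic_target[-1:]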
           

Use SVR() with default parameters; R², explained variance score (EVS), and elapsed time serve as the evaluation metrics.

feature_train,feature_test,target_train, target_test = train_test_split(traffic_feature,traffic_target,test_size=0.1,random_state=10)

start1=time.time()
model_svr = SVR()
model_svr.fit(feature_train,target_train)
predict_results1=model_svr.predict(feature_test)
end1=time.time()

plt.plot(target_test)  # true values
plt.plot(predict_results1)  # predicted values
plt.legend(['True','SVR'])
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.title("SVR")  # title
plt.show()
print("EVS:",explained_variance_score(target_test,predict_results1))
print("R2:",metrics.r2_score(target_test,predict_results1))
print("Time:",end1-start1)
           

Results:

As you can see, the accuracy with random sampling is not high; R² is a mere 0.54. (No surprise, with noise that heavy, how could it be high...)


The time-series prediction results are even worse, so I won't show that plot.

5. Tuning SVR with a genetic algorithm (GA)

We tune three SVR parameters: the penalty parameter C, the epsilon of the loss function, and the kernel coefficient gamma, with search ranges of [0,10], [0,2], and [0,100] respectively (you can choose your own). The GA runs for 20 generations of 10 individuals each, which is also configurable.

Put simply, in a genetic algorithm some individuals in every generation mutate; if a mutation works out well and its fitness is high, the rest of the population evolves toward it through selection and crossover.
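
In outline, the GeneticAlgorithm class below runs the following loop (a simplified comment-only sketch, not the full implementation):

# initialize a population of random (C, epsilon, gamma) candidates within the bounds
# for each generation:
#     evaluate fitness: train an SVR with each candidate and score R2 on the test set
#     selection: resample candidates with probability proportional to fitness (roulette wheel)
#     crossover: blend pairs of candidates with probability equal to the crossover rate
#     mutation: randomly nudge single genes with probability equal to the mutation rate
#     keep track of the best candidate found so far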

Here is the code:

# fitness function: here we use R2 as the fitness
def msefunc(predictval,realval):
    print("R2 = ",metrics.r2_score(realval,predictval)) # R2
    return metrics.r2_score(realval,predictval)
# objective function: build the SVR with the candidate parameters and evaluate fitness on the held-out test set
def SVMResult(vardim, x, bound):
    X = feature_train.tolist()
    y = target_train.tolist()
    c=x[0]
    e=x[1]
    g=x[2]
    clf = SVR(C=c,epsilon=e,gamma=g)
    clf.fit(X, y)
    predictval=clf.predict(feature_test.tolist())
    return msefunc(predictval,target_test.tolist())
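
# Each GA individual encodes one candidate SVR configuration as a 3-dimensional
# chromosome: chrom[0] -> C, chrom[1] -> epsilon, chrom[2] -> gamma.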
class GAIndividual:
 
    '''
    individual of genetic algorithm
    '''
 
    def __init__(self,  vardim, bound):
        '''
        vardim: dimension of variables
        bound: boundaries of variables
        '''
        self.vardim = vardim
        self.bound = bound
        self.fitness = 0.
 
    def generate(self):
        '''
        generate a random chromosome for the genetic algorithm
        '''
        dim = self.vardim
        rnd = np.random.random(size=dim)
        self.chrom = np.zeros(dim)
        for i in range(0, dim):
            self.chrom[i] = self.bound[0, i] + \
                (self.bound[1, i] - self.bound[0, i]) * rnd[i]
 
    def calculateFitness(self):
        '''
        calculate the fitness of the chromosome
        '''
        self.fitness = SVMResult(self.vardim, self.chrom, self.bound)
        
import random
import copy

 
class GeneticAlgorithm:
 
    '''
    The class for genetic algorithm
    '''
 
    def __init__(self, sizepop, vardim, bound, MAXGEN, params):
        '''
        sizepop: population size
        vardim: dimension of variables
        bound: boundaries of variables
        MAXGEN: maximum number of generations (termination condition)
        params: required algorithm parameters, a list of [crossover rate, mutation rate, alpha]
        '''
        self.sizepop = sizepop
        self.MAXGEN = MAXGEN
        self.vardim = vardim
        self.bound = bound
        self.population = []
        self.fitness = np.zeros((self.sizepop, 1))
        self.trace = np.zeros((self.MAXGEN, 3))
        self.params = params
 
    def initialize(self):
        '''
        initialize the population
        '''
        for i in range(0, self.sizepop):
            ind = GAIndividual(self.vardim, self.bound)
            ind.generate()
            self.population.append(ind)
 
    def evaluate(self):
        '''
        evaluation of the population fitnesses
        '''
        for i in range(0, self.sizepop):
            self.population[i].calculateFitness()
            self.fitness[i] = self.population[i].fitness
 
    def solve(self):
        '''
        evolution process of genetic algorithm
        '''
        self.t = 0
        self.initialize()
        self.evaluate()
        best = np.max(self.fitness)
        bestIndex = np.argmax(self.fitness)
        self.best = copy.deepcopy(self.population[bestIndex])
        self.avefitness = np.mean(self.fitness)
        self.maxfitness = np.max(self.fitness)
        
        self.trace[self.t, 0] =  self.best.fitness
        self.trace[self.t, 1] =  self.avefitness
        self.trace[self.t, 2] =  self.maxfitness
        print("Generation %d: optimal function value is: %f; average function value is %f;max function value is %f"% (
            self.t, self.trace[self.t, 0], self.trace[self.t, 1],self.trace[self.t, 2]))
        while (self.t < self.MAXGEN - 1):
            self.t += 1
            self.selectionOperation()
            self.crossoverOperation()
            self.mutationOperation()
            self.evaluate()
            best = np.max(self.fitness)
            bestIndex = np.argmax(self.fitness)
            if best > self.best.fitness:
                self.best = copy.deepcopy(self.population[bestIndex])
            self.avefitness = np.mean(self.fitness)
            self.maxfitness = np.max(self.fitness)
            
            self.trace[self.t, 0] =  self.best.fitness
            self.trace[self.t, 1] = self.avefitness
            self.trace[self.t, 2] =  self.maxfitness
            print("Generation %d: optimal function value is: %f; average function value is %f;max function value is %f"% (
            self.t, self.trace[self.t, 0], self.trace[self.t, 1],self.trace[self.t, 2]))
 
        print("Optimal function value is: %f; " %
              self.trace[self.t, 0])
        print ("Optimal solution is:")
        print (self.best.chrom)
        self.printResult()
 
    def selectionOperation(self):
        '''
        selection operation for Genetic Algorithm
        '''
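        # Roulette-wheel selection: each individual is resampled with probability
        # proportional to its share of the total fitness, via the cumulative
        # distribution accuFitness.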
        newpop = []
        totalFitness = np.sum(self.fitness)
        accuFitness = np.zeros((self.sizepop, 1))
 
        sum1 = 0.
        for i in range(0, self.sizepop):
            accuFitness[i] = sum1 + self.fitness[i] / totalFitness
            sum1 = accuFitness[i]
 
        for i in range(0, self.sizepop):
            r = random.random()
            idx = 0
            for j in range(0, self.sizepop - 1):
                if j == 0 and r < accuFitness[j]:
                    idx = 0
                    break
                elif r >= accuFitness[j] and r < accuFitness[j + 1]:
                    idx = j + 1
                    break
            newpop.append(self.population[idx])
        self.population = newpop
 
    def crossoverOperation(self):
        '''
        crossover operation for genetic algorithm
        '''
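        # Arithmetic (blend) crossover: with probability params[0] (crossover rate),
        # genes after a random cut point are mixed as
        # alpha * parent_a + (1 - alpha) * parent_b, with alpha = params[2].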
        newpop = []
        for i in range(0, self.sizepop, 2):
            idx1 = random.randint(0, self.sizepop - 1)
            idx2 = random.randint(0, self.sizepop - 1)
            while idx2 == idx1:
                idx2 = random.randint(0, self.sizepop - 1)
            newpop.append(copy.deepcopy(self.population[idx1]))
            newpop.append(copy.deepcopy(self.population[idx2]))
            r = random.random()
            if r < self.params[0]:
                crossPos = random.randint(1, self.vardim - 1)
                for j in range(crossPos, self.vardim):
                    newpop[i].chrom[j] = newpop[i].chrom[
                        j] * self.params[2] + (1 - self.params[2]) * newpop[i + 1].chrom[j]
                    newpop[i + 1].chrom[j] = newpop[i + 1].chrom[j] * self.params[2] + \
                        (1 - self.params[2]) * newpop[i].chrom[j]
        self.population = newpop
 
    def mutationOperation(self):
        '''
        mutation operation for genetic algorithm
        '''
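        # Non-uniform mutation: with probability params[1] (mutation rate), one gene
        # is nudged toward its lower or upper bound; the step size shrinks as the
        # generation counter t approaches MAXGEN.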
        newpop = []
        for i in range(0, self.sizepop):
            newpop.append(copy.deepcopy(self.population[i]))
            r = random.random()
            if r < self.params[1]:
                mutatePos = random.randint(0, self.vardim - 1)
                theta = random.random()
                if theta > 0.5:
                    newpop[i].chrom[mutatePos] = newpop[i].chrom[
                        mutatePos] - (newpop[i].chrom[mutatePos] - self.bound[0, mutatePos]) * (1 - random.random() ** (1 - self.t / self.MAXGEN))
                else:
                    newpop[i].chrom[mutatePos] = newpop[i].chrom[
                        mutatePos] + (self.bound[1, mutatePos] - newpop[i].chrom[mutatePos]) * (1 - random.random() ** (1 - self.t / self.MAXGEN))
        self.population = newpop
 
    def printResult(self):
        '''
        plot the result of the genetic algorithm
        '''
        x = np.arange(0, self.MAXGEN)
        y1 = self.trace[:, 0]
        y2 = self.trace[:, 1]
        y3 = self.trace[:, 2]
        #plt.plot(x, y1, 'r', label='optimal value')
        plt.plot(x, y2, 'g', label='average value')
        plt.plot(x, y3, 'b', label='max value')
        fig = plt.gcf()
        fig.set_size_inches(18.5, 10.5)
        plt.xlabel("GENS")
        plt.ylabel("R2")
        plt.title("GA")
        plt.legend()
        plt.show()       
           

Let's run it!

if __name__ == "__main__":
    bound = np.array([[0,0,0],[10,2,100]])
    ga = GeneticAlgorithm(10, 3, bound, 20, [0.9, 0.1, 0.5])
    ga.solve()
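
Instead of copying the printed numbers into Section 6 by hand, the best parameters can also be read back from the solver once it finishes (a small convenience sketch):

best_c, best_e, best_g = ga.best.chrom
model_svr = SVR(C=best_c, epsilon=best_e, gamma=best_g)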
           

Results:

The curve isn't pretty; GA really is prone to getting stuck in local optima.

C, epsilon, and gamma come out to [ 9.93055626  0.28102442 24.58580654].


6. The combined GA-SVR model

from sklearn.svm import SVR
import matplotlib.pyplot as plt  
start1=time.time()
model_svr = SVR(C=9.93055626,epsilon=0.28102442,gamma=24.58580654)
model_svr.fit(feature_train,target_train)
predict_results1=model_svr.predict(feature_test)
end1=time.time()

plt.plot(target_test)  # true values
plt.plot(predict_results1)  # predicted values
plt.legend(['True','SVR'])
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.title("SVR")  # title
plt.show()
print("EVS:",explained_variance_score(target_test,predict_results1))
print("R2:",metrics.r2_score(target_test,predict_results1))
print("Time:",end1-start1)
           

The results are as follows:


Although the runtime increases, both R² and EVS improve considerably.