BP (Error Backpropagation) Algorithm
Introduction
BP (Back Propagation) is an important method for training the weight coefficients of a neural network in the supervised setting. Neural networks are an important part of discriminative models, and discriminative models in turn are a major branch of pattern recognition. The relationship is shown in the figure below.
The BP algorithm was proposed by Rumelhart and McClelland in 1986. Its core idea is to use the error at the output to estimate the error of the layer just before the output layer, then use that estimate to estimate the error of the layer before that, and so on, propagating the error backward layer by layer to obtain error estimates for all the other layers.
Model
As a general approach, first build intuition in low dimensions, then generalize to higher dimensions.
Construct a three-layer $d$-$n_H$-$c$ network, as shown in the figure below.
The input to the $j$-th hidden-layer node is
$$net_j=\sum_{i=1}^{d}x_i w_{ji}$$
Assume every hidden-layer node uses the same nonlinear activation function $y=f_H(x)$; then
$$y_j=f_H(net_j)=f_H\Big(\sum_{i=1}^{d}x_i w_{ji}\Big)$$
The input to the $k$-th output-layer node is
$$net_k=\sum_{j=1}^{n_H}y_j w_{kj}$$
Assume every output-layer node uses the nonlinear activation function $y=f_O(x)$; then
$$z_k=f_O(net_k)=f_O\Big(\sum_{j=1}^{n_H}y_j w_{kj}\Big)=f_O\Big(\sum_{j=1}^{n_H}f_H\Big(\sum_{i=1}^{d}x_i w_{ji}\Big)w_{kj}\Big)$$
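For concreteness, the forward pass maps directly onto a few lines of MATLAB. The sketch below assumes tanh hidden units and sigmoid output units (the choices used in the simulation later); the weight matrices W_ji and W_kj are randomly initialized stand-ins, not values from the original post:

d = 3; n_H = 9; c = 3;            % dimensions used in the simulation below
W_ji = 2*rand(n_H, d) - 1;        % input-to-hidden weights (no bias term in this sketch)
W_kj = 2*rand(c, n_H) - 1;        % hidden-to-output weights
x = [1.58, 2.32, -5.8];           % one sample from class 1
net_j = x * W_ji';                % hidden-layer inputs, 1-by-n_H
y_j = tanh(net_j);                % hidden-layer outputs, f_H = tanh
net_k = y_j * W_kj';              % output-layer inputs, 1-by-c
z_k = 1 ./ (1 + exp(-net_k));     % network outputs, f_O = sigmoid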
Introduce the squared-error objective function
$$J(w)=\frac{1}{2}\sum_{k=1}^{c}(t_k-z_k)^2=\frac{1}{2}\|t-z\|^2$$
Then, for the hidden-to-output weights, the chain rule gives
$$\frac{\partial J}{\partial w_{kj}}=\frac{\partial J}{\partial z_k}\frac{\partial z_k}{\partial net_k}\frac{\partial net_k}{\partial w_{kj}}=-(t_k-z_k)\,f_O'(net_k)\,y_j$$
and for the input-to-hidden weights, summing over every output node that $y_j$ feeds,
$$\frac{\partial J}{\partial w_{ji}}=\sum_{k=1}^{c}\frac{\partial J}{\partial z_k}\frac{\partial z_k}{\partial net_k}\frac{\partial net_k}{\partial y_j}\frac{\partial y_j}{\partial net_j}\frac{\partial net_j}{\partial w_{ji}}=-\sum_{k=1}^{c}(t_k-z_k)\,f_O'(net_k)\,w_{kj}\,f_H'(net_j)\,x_i$$
The gradient of an input-to-hidden weight thus depends on the hidden-to-output weights; the error signal flows from the output layer back toward the input layer, which is why the method is called back propagation.
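It is convenient to collect the recurring factors into per-node sensitivities. This $\delta$ notation does not appear in the derivation above, but it matches the variables delta_k and delta_j in the simulation code below:

$$\delta_k=(t_k-z_k)\,f_O'(net_k),\qquad \delta_j=f_H'(net_j)\sum_{k=1}^{c}\delta_k\,w_{kj}$$

so that $\frac{\partial J}{\partial w_{kj}}=-\delta_k\,y_j$ and $\frac{\partial J}{\partial w_{ji}}=-\delta_j\,x_i$.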
From these two sets of partial derivatives, the weight update rule of gradient descent is
$$w_{m+1}=w_m-\eta\,\frac{\partial J}{\partial w_m}$$
where $\eta$ is the learning rate (step size). The weights are updated repeatedly until the error $J$ converges.
Weight update schemes
Batch update
That is, the weights are updated only after an entire batch of samples has been processed; the update applied is the sum of the per-sample weight increments.
In pseudocode, one epoch of the batch update looks roughly like this (a sketch consistent with the MATLAB implementation below):
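initialize w_ji, w_kj with small random values
repeat
    add_w_ji <- 0;  add_w_kj <- 0          % accumulated increments for this epoch
    for each training sample (x, t)
        forward pass:  compute y_j and z_k
        backward pass: compute delta_k and delta_j
        add_w_kj <- add_w_kj + eta * delta_k' * y_j
        add_w_ji <- add_w_ji + eta * delta_j' * x
    end
    w_kj <- w_kj + add_w_kj                % one batch update per epoch
    w_ji <- w_ji + add_w_ji
until J converges or the iteration limit is reached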
MATLAB simulation
Consider a three-class problem with 10 three-dimensional samples per class. Construct a three-layer network whose hidden layer uses the hyperbolic tangent activation and whose output layer uses the sigmoid activation; the objective function is the squared error.
The input layer therefore has 3 nodes, one per feature (plus a constant bias input), the hidden layer is set to 9 nodes (n_H in the code), and the output layer has 3 nodes.
The three classes of data are x_1, x_2, x_3 (see the code).
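With these activation choices, the derivatives needed in the gradient formulas take closed forms, which is what the code exploits:

$$f_H'(net_j)=1-\tanh^2(net_j)=1-y_j^2,\qquad f_O'(net_k)=z_k\,(1-z_k)$$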
The code is as follows.
clc;clear;close all;
x_1=[
1.58,2.32,-5.8;
0.67,1.58,-4.78;
1.04,1.01,-3.63;
-1.49,2.18,-3.39;
-0.41,1.21,-4.73;
1.39,3.16,2.87;
1.20,1.40,-1.89;
-0.92,1.44,-3.22;
0.45,1.33,-4.38;
-0.76,0.84,-1.96];
x_2=[
0.21,0.03,-2.21;
0.37,0.28,-1.8;
0.18,1.22,0.16;
-0.24,0.93,-1.01;
-1.18,0.39,-0.39;
0.74,0.96,-1.16;
-0.38,1.94,-0.48;
0.02,0.72,-0.17;
0.44,1.31,-0.14;
0.46,1.49,0.68];
x_3=[
-1.54,1.17,0.64;
5.41,3.45,-1.33;
1.55,0.99,2.69;
1.86,3.19,1.51;
1.68,1.79,-0.87;
3.51,-0.22,-1.39;
1.40,-0.44,-0.92;
0.44,0.83,1.97;
0.25,0.68,-0.99;
0.66,-0.45,0.08];
num_t=8;  % number of training samples per class
num_d=2;  % number of test samples per class
s_1=randperm(10, num_t); % random training indices for each class
s_2=randperm(10, num_t);
s_3=randperm(10, num_t);
x=[x_1(s_1,:);x_2(s_2,:);x_3(s_3,:)]; % training set, 3*num_t samples
x_d=[x_1;x_2;x_3];                    % full data set, used later for evaluation
x=[x,ones(3*num_t,1)];                % add bias of input layer
x_d=[x_d,ones(30,1)];
t=[ones(num_t,1),zeros(num_t,2);zeros(num_t,1),ones(num_t,1),zeros(num_t,1);zeros(num_t,2),ones(num_t,1)]; % one-hot target rows for the three classes
%% Variables
rr=1000;  % number of training epochs
n_H=9;    % number of hidden-layer nodes
eta=0.1;  % learning rate (step size)
c=3;      % number of output-layer nodes
d=size(x_1,2);                            % input dimension
num=size(x_1,1)+size(x_2,1)+size(x_3,1);  % total number of samples
J_n=zeros(1,3*num_t);                     % per-sample squared errors
J=zeros(1,rr);                            % loss per epoch
w_kj=2*rand(c,n_H)-1;   % hidden-to-output weights, uniform in [-1,1]
w_ji=2*rand(n_H,d+1)-1; % input-to-hidden weights; +1 column for the input-layer bias
net_k=zeros(1,c);
z_k=zeros(1,c);
net_j=zeros(1,n_H);
y_j=zeros(1,n_H);
%% Batch BP
for r=1:rr
    add_w_kj=0; % reset the accumulated increments at the start of each epoch
    add_w_ji=0;
    for n=1:3*num_t
        % forward pass
        net_j=x(n,:)*w_ji';
        y_j=tanh(net_j);
        net_k=y_j*w_kj';
        z_k=1./(1+exp(-net_k));
        % backward pass: sensitivities (sigmoid' = z(1-z), tanh' = 1-y^2)
        delta_k=(t(n,:)-z_k).*(z_k.*(1-z_k));
        delta_w_kj=eta*delta_k'*y_j;
        delta_j=(1-y_j.^2).*(delta_k*w_kj);
        delta_w_ji=eta*delta_j'*x(n,:);
        % accumulate the batch increments
        add_w_kj=add_w_kj+delta_w_kj;
        add_w_ji=add_w_ji+delta_w_ji;
        J_n(n)=(t(n,:)-z_k)*(t(n,:)-z_k)'; % squared error of this sample
    end
    J(r)=sum(J_n); % total training error for this epoch
    % batch update: apply the accumulated increments once per epoch
    w_kj=w_kj+add_w_kj;
    w_ji=w_ji+add_w_ji;
end
plot(J);
xlabel('Iteration');
ylabel('Loss');
result_w_kj=w_kj;
result_w_ji=w_ji;
output=zeros(num,c);
output_binary=zeros(num,c);
for i=1:num
    % forward pass of the trained network over every sample in x_d
    output(i,:)=1./(1+exp(-(tanh(x_d(i,:)*result_w_ji')*result_w_kj')));
    for j=1:c
        if output(i,j)==max(output(i,:))
            output_binary(i,j)=1; % winner-take-all: the largest output marks the predicted class
        else
            output_binary(i,j)=0;
        end
    end
end
%% Plot the sample points of the three classes
figure; % open a new figure so the loss curve is not overwritten
plot3(x_1(:,1),x_1(:,2),x_1(:,3),'*');
hold on;
plot3(x_2(:,1),x_2(:,2),x_2(:,3),'o');
plot3(x_3(:,1),x_3(:,2),x_3(:,3),'+');
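To quantify how well the trained network separates the three classes, output_binary can be compared against one-hot labels. The short check below is an addition to the original script; the labels matrix assumes x_d stacks the three classes in order, 10 samples each:

labels=[ones(10,1),zeros(10,2);zeros(10,1),ones(10,1),zeros(10,1);zeros(10,2),ones(10,1)];
accuracy=mean(all(output_binary==labels,2));   % fraction of exactly matching rows
fprintf('Classification accuracy: %.1f%%\n',100*accuracy);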