
Derivation and Error Analysis of the AdaBoost Algorithm

This article derives the AdaBoost algorithm in detail using the forward stagewise algorithm. We begin with a brief introduction to the forward stagewise algorithm.

The Forward Stagewise Algorithm

The forward stagewise algorithm learns an additive model:

$$f(x)=\sum_{m=1}^{M}\alpha_{m}G(x;\gamma_{m})$$

where $G(x;\gamma_{m})$ is a base learner, $\gamma_{m}$ its parameters, and $\alpha_{m}$ its coefficient.

The forward stagewise algorithm proceeds from front to back, adding one base learner and its coefficient at each step so as to minimize the loss:

$$\min_{\gamma_{m},\alpha_{m}}\ \sum_{i=1}^{N}L\Big(y_{i},\ \sum_{k=1}^{m-1}\alpha_{k}G(x_{i};\gamma_{k})+\alpha_{m}G(x_{i};\gamma_{m})\Big)$$

where $N$ is the number of samples, $(x_{i},y_{i})$ are the samples, and $L(y,f(x))$ is the loss function.

Note that, given a training set and a loss function, the forward stagewise algorithm yields an additive model.

Derivation

Suppose the forward stagewise algorithm uses the exponential loss

$$L(y,f(x))=\exp(-yf(x))$$

The exponential loss is related to the classification error rate $err$ as follows (here the classifier output $f(x_{i})$ takes values in $\{-1,+1\}$):

$$\begin{aligned} \sum_{i=1}^{N}L(y_{i},f(x_{i})) &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))\\ &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))I(y_{i}=f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}e^{-1}I(y_{i}=f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+e^{-1}\Big(N-\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))\Big)\\ &=(e-e^{-1})\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))+e^{-1}N\\ &=N(e-e^{-1})\,err+e^{-1}N \end{aligned}$$
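As a quick numerical sanity check of this identity, the following sketch (the random $\pm1$ labels and classifier outputs are made-up illustration data) compares the exponential loss with $N(e-e^{-1})err+e^{-1}N$:

```python
import math
import random

# illustrative data: labels and classifier outputs, both in {-1, +1}
random.seed(0)
N = 1000
y = [random.choice([-1, 1]) for _ in range(N)]
f = [random.choice([-1, 1]) for _ in range(N)]

# left-hand side: total exponential loss
exp_loss = sum(math.exp(-yi * fi) for yi, fi in zip(y, f))
# classification error rate err
err = sum(yi != fi for yi, fi in zip(y, f)) / N
# right-hand side: N(e - e^{-1}) err + e^{-1} N
rhs = N * (math.e - math.exp(-1)) * err + math.exp(-1) * N

assert abs(exp_loss - rhs) < 1e-6
```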

Suppose the first $m-1$ iterations have produced

$$f_{m-1}(x)=\alpha_{1}G_{1}+\alpha_{2}G_{2}+\cdots+\alpha_{m-1}G_{m-1}$$

At the $m$-th iteration, $(\alpha_{m},G_{m})$ is obtained from

$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\exp\big(-y_{i}(f_{m-1}(x_{i})+\alpha G(x_{i}))\big)$$

Let $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i}))$. Since $\bar{\omega}_{mi}$ does not depend on $(\alpha,G)$ and is updated at each round, we have

$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i}))\qquad(*)$$

For any $\alpha>0$, we have

$$\begin{aligned} \sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i})) &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+\sum_{i=1}^{N}\bar{\omega}_{mi}e^{-\alpha}I(y_{i}=G(x_{i}))\\ &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+e^{-\alpha}\Big(\sum_{i=1}^{N}\bar{\omega}_{mi}-\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\Big)\\ &=e^{-\alpha}\sum_{i=1}^{N}\bar{\omega}_{mi}+(e^{\alpha}-e^{-\alpha})\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\qquad(**) \end{aligned}$$

Since $e^{\alpha}-e^{-\alpha}>0$, the $G(x)$ minimizing $(*)$ is

$$G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$$

Next we solve for $\alpha_{m}$: differentiating $(**)$ with respect to $\alpha$, the derivative is positive exactly when

$$\alpha>\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$$

where $err_{m}$ is the weighted error rate of $G_{m}$ on the data set:

$$err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G_{m}(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G_{m}(x_{i}))$$

That is, $(**)$ attains its minimum at $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$. We assume the weak classifier's error rate $err_{m}$ is less than $1/2$, i.e. better than random guessing, so $\alpha_{m}>0$.

This yields the model update

$$f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$$

A brief summary of the whole algorithm:

Input: training set $\{(x_{i},y_{i})\}_{i=1,2,...,N}$, $x_{i}\in \chi \subseteq R^{n}$, $y_{i}=\pm1$;

loss function $L(y,f(x))=\exp(-yf(x))$.

1. Initialize $f_{0}(x)=0$;

2. Compute the weights $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i}))$, $m\ge1$, $i=1,...,N$;

3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$;

4. Compute the error rate $err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G_{m}(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$;

5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$;

6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
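The steps above can be sketched in Python. This is a minimal illustration, not a production implementation: it assumes one-dimensional inputs and uses exhaustive-search decision stumps as the base classifiers (the stump form, the helper names, and the toy data are illustrative assumptions, not from the text):

```python
import math

def stump_predict(x, thresh, sign):
    # decision stump on a scalar feature: predict `sign` if x > thresh, else -sign
    return sign if x > thresh else -sign

def best_stump(X, y, w):
    # step 3: pick the stump minimizing the weighted error sum_i w_i I(y_i != G(x_i))
    best = None
    for thresh in sorted(set(X)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, thresh, sign) != yi)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best

def adaboost(X, y, M):
    # steps 1-6 of the summary above, using normalized weights
    N = len(X)
    w = [1.0 / N] * N                        # initial weights 1/N
    model = []                               # list of (alpha_m, thresh, sign)
    for _ in range(M):
        err, thresh, sign = best_stump(X, y, w)      # steps 3-4
        err = max(err, 1e-12)                # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)      # step 5
        model.append((alpha, thresh, sign))
        # weight update w_{m+1,i} = w_mi exp(-alpha_m y_i G_m(x_i)) / Z_m
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thresh, sign))
             for wi, xi, yi in zip(w, X, y)]
        Z = sum(w)
        w = [wi / Z for wi in w]
    return model

def predict(model, x):
    # step 6: sign of f_M(x) = sum_m alpha_m G_m(x)
    s = sum(alpha * stump_predict(x, thresh, sign) for alpha, thresh, sign in model)
    return 1 if s >= 0 else -1

# toy data set (one scalar feature per sample); three stumps separate it exactly
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(X, y, M=3)
train_err = sum(predict(model, xi) != yi for xi, yi in zip(X, y)) / len(X)
```

No single stump classifies this toy set correctly, yet the weighted vote of three stumps reaches zero training error, which is exactly the boosting effect the derivation describes.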

Now let

$$\omega_{m+1,i}=\frac{\omega_{mi}}{Z_{m}}\exp(-\alpha_{m}y_{i}G_{m}(x_{i})),\quad i=1,...,N$$

$$\omega_{1i}=\frac{1}{N},\quad i=1,...,N$$

where $Z_{m}$ is the normalization factor

$$Z_{m}=\sum_{i=1}^{N}\omega_{mi}\exp(-\alpha_{m}y_{i}G_{m}(x_{i}))$$

Then for any $m\ge1$,

$$\omega_{mi}=\frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$$

that is, $\omega_{mi}$ and $\bar{\omega}_{mi}$ differ only by a normalization factor:

$$\begin{aligned} \frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}} &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\frac{1}{N}\sum_{i=1}^{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}&(1)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{2i}Z_{1}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}&(2)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{3i}Z_{1}Z_{2}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}&(3)\\ &=\cdots&(4)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{1}Z_{2}\cdots Z_{m-1}}&(5)\\ &=\frac{\omega_{2i}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{2}\cdots Z_{m-1}}\\ &=\frac{\omega_{3i}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{3}\cdots Z_{m-1}}\\ &=\cdots\\ &=\frac{\omega_{m-1,i}\exp(-y_{i}\alpha_{m-1}G_{m-1}(x_{i}))}{Z_{m-1}}\\ &=\omega_{mi} \end{aligned}$$

Since $\bar{\omega}_{mi}$ and $\omega_{mi}$ differ only by a normalization factor, which has no effect on finding $G_{m}$, the algorithm can be restated as follows.

Input: training set $\{(x_{i},y_{i})\}_{i=1,2,...,N}$, $x_{i}\in \chi \subseteq R^{n}$, $y_{i}=\pm1$;

loss function $L(y,f(x))=\exp(-yf(x))$.

1. Initialize $f_{0}(x)=0$ and $\omega_{1i}=\frac{1}{N}$;

2. Update the weights $\omega_{mi}=\frac{\omega_{m-1,i}\exp(-\alpha_{m-1}y_{i}G_{m-1}(x_{i}))}{Z_{m-1}}$, $m\ge2$, $i=1,...,N$;

3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$;

4. Compute the error rate $err_{m}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G_{m}(x_{i}))$;

5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$;

6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
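The equivalence of the two weightings can also be checked numerically. The sketch below (the random $\pm1$ base-classifier outputs $G_k(x_i)$ and coefficients $\alpha_k$ are made-up illustration data) runs the normalized recursion alongside the direct definition of $\bar{\omega}_{mi}$:

```python
import math
import random

# illustrative data: base-classifier outputs in {-1, +1} and coefficients alpha_k
random.seed(1)
N, M = 8, 5
y = [random.choice([-1, 1]) for _ in range(N)]
G = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(M)]
alpha = [random.uniform(0.1, 1.0) for _ in range(M)]

def w_bar(m):
    # direct definition: w_bar_mi = exp(-y_i f_{m-1}(x_i)) = prod_k exp(-y_i alpha_k G_k(x_i))
    return [math.exp(-y[i] * sum(alpha[k] * G[k][i] for k in range(m - 1)))
            for i in range(N)]

# normalized recursion: w_1i = 1/N, then w_{m+1,i} = w_mi exp(-alpha_m y_i G_m(x_i)) / Z_m
w = [1.0 / N] * N
for m in range(1, M + 1):
    wb = w_bar(m)
    total = sum(wb)
    # the recursion reproduces w_bar_mi divided by its normalizing sum
    assert all(abs(w[i] - wb[i] / total) < 1e-12 for i in range(N))
    unnorm = [w[i] * math.exp(-alpha[m - 1] * y[i] * G[m - 1][i]) for i in range(N)]
    Z = sum(unnorm)
    w = [u / Z for u in unnorm]
```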

Error Analysis

The training error of AdaBoost satisfies:

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp\Big(-2\sum_{m=1}^{M}\gamma_{m}^{2}\Big)$$

where $\gamma_{m}=\frac{1}{2}-e_{m}$ (with $e_{m}=err_{m}$ the weighted error rate) and $f(x)=\sum_{k=1}^{M}\alpha_{k}G_{k}(x)$.

Proof:

First we show that $\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \prod_{k=1}^{m}Z_{k}$.

When $y_{i}=f(x_{i})$,

$$I(f(x_{i})\neq y_{i})=0\le \exp(-y_{i}f(x_{i}))$$

When $y_{i}\neq f(x_{i})$, we have $y_{i}f(x_{i})\le0$, so

$$I(f(x_{i})\neq y_{i})=1\le \exp(-y_{i}f(x_{i}))$$

Therefore

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))$$

Moreover, by equations $(1)$–$(5)$ above,

$$\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))=\prod_{k=1}^{m}Z_{k}$$

Next we show that $\prod_{k=1}^{m}Z_{k}\le \exp(-2\sum_{k=1}^{m}\gamma_{k}^{2})$.

For any $k$, $1\le k\le m$, we have

$$\begin{aligned} Z_{k}&=\sum_{i=1}^{N}\omega_{ki}\exp(-\alpha_{k}y_{i}G_{k}(x_{i}))\\ &=\sum_{G_{k}(x_{i})=y_{i}}\omega_{ki}e^{-\alpha_{k}}+\sum_{G_{k}(x_{i})\neq y_{i}}\omega_{ki}e^{\alpha_{k}}\\ &=(1-e_{k})e^{-\alpha_{k}}+e_{k}e^{\alpha_{k}}\\ &=2(e_{k}(1-e_{k}))^{\frac{1}{2}}\\ &=(1-4\gamma_{k}^{2})^{\frac{1}{2}} \end{aligned}$$

Hence

$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}$$

So for any $k$, $1\le k\le m$, it suffices to show

$$(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp(-2\gamma_{k}^{2})\qquad(*)$$

which would give

$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp\Big(-2\sum_{k=1}^{m}\gamma_{k}^{2}\Big)$$

Taking logarithms of both sides of $(*)$, it is equivalent to show

$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$

The Taylor expansion of $\ln(1-4\gamma_{k}^{2})$ is

$$\ln(1-4\gamma_{k}^{2})=-4\gamma_{k}^{2}-\frac{(4\gamma_{k}^{2})^{2}}{2}-\frac{(4\gamma_{k}^{2})^{3}}{3}-\cdots-\frac{(4\gamma_{k}^{2})^{n}}{n}-\cdots$$

Since every term of the expansion is non-positive, $\ln(1-4\gamma_{k}^{2})\le-4\gamma_{k}^{2}$, and therefore

$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$

This completes the proof.
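As a sanity check, the key inequality $(1-4\gamma^{2})^{1/2}\le \exp(-2\gamma^{2})$ can be confirmed numerically on a grid of $\gamma\in[0,\frac{1}{2})$:

```python
import math

# ratio lhs/rhs of (1 - 4g^2)^(1/2) <= exp(-2 g^2), for g = 0, 0.001, ..., 0.499
max_ratio = max(
    math.sqrt(1 - 4 * (i / 1000) ** 2) / math.exp(-2 * (i / 1000) ** 2)
    for i in range(500)
)
assert max_ratio <= 1.0 + 1e-12  # the bound holds everywhere on the grid
```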

Note:

If there exists $\gamma>0$ such that $\gamma_{k}\ge\gamma$ for all $k$ (i.e., every base classifier's error rate is bounded below $\frac{1}{2}$ by a fixed margin), then

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp(-2m\gamma^{2})$$

That is, the training error of AdaBoost decreases exponentially with the number of iterations.

Regularization

To guard against overfitting, a regularization term $v$ is introduced into the weak-classifier update:

$$f_{m}(x)=f_{m-1}(x)+v\alpha_{m}G_{m}(x)$$

Here $v$ is called the step size (learning rate): the smaller $v$ is, the more iterations are required. In practice, the step size and the maximum number of iterations are tuned together to control how closely the model fits.

Pros and Cons

Advantages:

1) High classification accuracy;

2) Not prone to overfitting;

3) Within the AdaBoost framework, a wide variety of classification and regression learners can serve as weak learners; decision trees and neural networks are the most common choices.

Disadvantages:

1) Sensitive to noise: anomalous samples may be assigned very large weights.
