AdaBoost Algorithm (Adaptive Boosting)
Intro
AdaBoost is short for adaptive boosting.
AdaBoost combines many weak classifiers to form one strong classifier.
AdaBoost usually uses a one-level decision tree (a decision stump) as the weak classifier, as in the sketch below.
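As a minimal usage sketch (assuming scikit-learn is available), the library's AdaBoostClassifier already uses a depth-1 decision tree as its default weak classifier, so combining stumps into a strong classifier takes only a few lines; the dataset here is synthetic and purely illustrative:

```python
# Minimal AdaBoost usage sketch with scikit-learn (assumed installed).
# The default base estimator of AdaBoostClassifier is a decision stump.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, just for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Combine 50 weak classifiers (stumps) into one strong classifier.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```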
AdaBoost (for binary classification)
$T = \{(x_i, y_i)\}_{i=1}^{N}$ — training data
$w_i^{(m)}$ — weight of the $i$-th data point at round $m$
$\alpha_m$ — weight of the $m$-th weak classifier
$z_m$ — normalization factor of the weight distribution
$e_m$ — weighted error rate of the $m$-th weak classifier
Initialization (uniform sampling weights):

$$D_1 = \big(w_1^{(1)}, w_2^{(1)}, \dots, w_N^{(1)}\big), \qquad w_i^{(1)} = \frac{1}{N}$$

For $m = 1 \sim M$:

1. Train the weak classifier on data weighted by $D_m$:
$$G_m(x) \in \{-1, +1\}$$
2. Compute the weighted error rate:
$$e_m = P\big(G_m(x_i) \neq y_i\big) = \sum_{i=1}^{N} w_i^{(m)} \, \mathbb{1}\big(G_m(x_i) \neq y_i\big)$$
3. Compute the classifier weight:
$$\alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m}$$
4. Update the weight distribution (a.k.a. resampling):
$$z_m = \sum_{i=1}^{N} w_i^{(m)} \exp\big[-\alpha_m y_i G_m(x_i)\big]$$
$$w_i^{(m+1)} = \frac{w_i^{(m)}}{z_m} \exp\big[-\alpha_m y_i G_m(x_i)\big]$$

Final classifier:

$$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad G(x) = \mathrm{sign}\big(f(x)\big)$$
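The loop above can be written out directly in NumPy. The sketch below is illustrative only (names such as `Stump`, `fit_adaboost`, and `predict_adaboost` are made up for this note, not from any library), and it assumes labels in $\{-1, +1\}$:

```python
# From-scratch sketch of the binary AdaBoost loop described above.
import numpy as np


class Stump:
    """Depth-1 decision tree: threshold one feature, predict +1 or -1."""

    def fit(self, X, y, w):
        best_err = np.inf
        for j in range(X.shape[1]):             # try every feature
            for thr in np.unique(X[:, j]):      # and every observed threshold
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = np.sum(w[pred != y])  # weighted error rate
                    if err < best_err:
                        best_err = err
                        self.j, self.thr, self.sign = j, thr, sign
        return best_err

    def predict(self, X):
        return self.sign * np.where(X[:, self.j] <= self.thr, 1, -1)


def fit_adaboost(X, y, M=20):
    """y must be in {-1, +1}. Returns a list of (alpha_m, G_m) pairs."""
    N = len(y)
    w = np.full(N, 1.0 / N)                     # D_1: uniform weights
    ensemble = []
    for m in range(M):
        stump = Stump()
        e_m = stump.fit(X, y, w)                # weighted error rate e_m
        alpha_m = 0.5 * np.log((1 - e_m) / max(e_m, 1e-12))
        w = w * np.exp(-alpha_m * y * stump.predict(X))
        w /= w.sum()                            # divide by z_m (normalization)
        ensemble.append((alpha_m, stump))
    return ensemble


def predict_adaboost(ensemble, X):
    f = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.sign(f)                           # G(x) = sign(f(x))
```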
It can be shown that AdaBoost drives the error rate on the training set down exponentially fast, as long as each weak classifier does slightly better than random guessing, while the error rate on held-out validation data typically grows only slowly as more weak classifiers are added.
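The standard bound behind the training-error part of that claim (not derived in this note) says the training error of the final classifier is bounded by the product of the normalization factors, which shrinks exponentially when every weak classifier has an edge $\gamma_m$ over random guessing:

$$\frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\big(G(x_i) \neq y_i\big)
\;\le\; \prod_{m=1}^{M} z_m
\;=\; \prod_{m=1}^{M} 2\sqrt{e_m(1-e_m)}
\;\le\; \exp\!\Big(-2\sum_{m=1}^{M}\gamma_m^2\Big),
\qquad \gamma_m = \tfrac12 - e_m$$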