

AdaBoost Algorithm (Adaptive Boosting)

Intro

AdaBoost is short for "adaptive boosting".

AdaBoost combines many weak classifiers to form a single strong classifier.

AdaBoost usually uses one-level decision trees (decision stumps) as the weak classifiers.
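As a quick illustration, here is a minimal sketch using scikit-learn (assuming it is installed; recent versions name the parameter `estimator`, older ones call it `base_estimator`) with a depth-1 decision tree plugged in as the weak classifier:

```python
# Minimal sketch: AdaBoost with a decision stump (one-level tree) as the weak classifier.
# Assumes scikit-learn is available; recent versions use `estimator`,
# older versions call the same parameter `base_estimator`.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump
    n_estimators=50,
    random_state=0,
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```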

AdaBoost (for binary classification)

$T = \{(x_i, y_i)\}_{i=1}^{n}$ — training data

$w_i^{(m)}$ — weight of sample $i$ at round $m$

$\alpha_m$ — weight of classifier $G_m$

$z_m$ — normalization factor of the weight distribution

$e_m$ — weighted error rate of $G_m$

$$
\begin{aligned}
& \textbf{Initialization (sampling):} \\
& \qquad D_1 = \big(w_1^{(1)}, w_2^{(1)}, \dots, w_n^{(1)}\big), \quad w_i^{(1)} = \frac{1}{n} \\
& \textbf{for } m = 1 \sim M: \\
& \qquad \text{train the classifier:} \\
& \qquad \qquad G_m(x) = \pm 1 \\
& \qquad \text{compute the weighted error rate:} \\
& \qquad \qquad e_m = P\big(G_m(x_i) \ne y_i\big) = \sum_{i=1}^{n} w_i^{(m)} \, I\big(G_m(x_i) \ne y_i\big) \\
& \qquad \qquad \alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m} \\
& \qquad \text{update the weight distribution (a.k.a. resampling):} \\
& \qquad \qquad z_m = \sum_{i=1}^{n} w_i^{(m)} \exp\!\big[-\alpha_m y_i G_m(x_i)\big] \\
& \qquad \qquad w_i^{(m+1)} = \frac{w_i^{(m)}}{z_m} \exp\!\big[-\alpha_m y_i G_m(x_i)\big] \\
& \textbf{final classifier:} \\
& \qquad f(x) = \sum_{m=1}^{M} \alpha_m G_m(x) \\
& \qquad G(x) = \operatorname{sign}\big(f(x)\big)
\end{aligned}
$$
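The loop above translates almost line by line into code. Below is a minimal NumPy sketch under the stated assumptions (labels $y_i \in \{-1, +1\}$, decision stumps as the weak classifier $G_m$); names such as `fit_stump` and `adaboost_fit` are just for illustration:

```python
import numpy as np

def fit_stump(X, y, w):
    """Train a decision stump G_m(x) = ±1 that minimizes the weighted error e_m."""
    n, d = X.shape
    best = None
    for j in range(d):                      # feature to split on
        for thr in np.unique(X[:, j]):      # candidate threshold
            for sign in (1, -1):            # direction of the split
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best                             # (e_m, feature, threshold, sign)

def stump_predict(X, stump):
    _, j, thr, sign = stump
    return sign * np.where(X[:, j] <= thr, 1, -1)

def adaboost_fit(X, y, M=20):
    """AdaBoost for binary classification, y in {-1, +1}."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                 # D_1: w_i^(1) = 1/n
    stumps, alphas = [], []
    for m in range(M):
        stump = fit_stump(X, y, w)          # train the classifier G_m
        e_m = max(stump[0], 1e-10)          # weighted error rate
        alpha_m = 0.5 * np.log((1 - e_m) / e_m)
        pred = stump_predict(X, stump)
        # update the weight distribution (resampling step)
        w = w * np.exp(-alpha_m * y * pred)
        w /= w.sum()                        # divide by z_m (normalization factor)
        stumps.append(stump)
        alphas.append(alpha_m)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Final classifier G(x) = sign(sum_m alpha_m * G_m(x))."""
    f = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(f)
```

Calling `adaboost_fit(X, y)` and then `adaboost_predict(X, stumps, alphas)` on a small dataset with ±1 labels reproduces the behaviour of the pseudocode above; a production implementation would vectorize the stump search.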

It can be shown that AdaBoost keeps driving down the error rate on the training set, and that the error rate on a validation set grows only slowly (if at all) as more weak classifiers are added.
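Concretely, a standard bound from the AdaBoost literature (stated here without proof) controls the training error of the final classifier by the product of the normalization factors $z_m$:

$$
\frac{1}{n}\sum_{i=1}^{n} I\big(G(x_i) \ne y_i\big) \;\le\; \prod_{m=1}^{M} z_m \;=\; \prod_{m=1}^{M} 2\sqrt{e_m(1-e_m)},
$$

so whenever each weak classifier is even slightly better than random guessing ($e_m < \frac{1}{2}$), every factor is strictly less than 1 and the training error decreases exponentially in $M$.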
