AdaBoost Algorithm (Adaptive Boosting)
Intro
AdaBoost is short for adaptive boosting.
AdaBoost combines many weak classifiers to form one strong classifier.
AdaBoost usually uses a one-level decision tree (a decision stump) as the weak classifier, as in the sketch below.
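As a minimal usage sketch (assuming scikit-learn is available), the library's AdaBoostClassifier already uses a depth-1 decision tree as its default weak classifier, so combining stumps into a strong classifier takes only a few lines; the dataset here is synthetic and purely illustrative:

```python
# Minimal AdaBoost usage sketch with scikit-learn (assumed installed).
# The default base estimator of AdaBoostClassifier is a decision stump.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, just for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Combine 50 weak classifiers (stumps) into one strong classifier.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```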
AdaBoost (for binary classification)
$T = \{(x_i, y_i)\}_{i=1}^{N}$ — training data
$w_i^{(m)}$ — weight of the $i$-th data point at round $m$
$\alpha_m$ — weight of the $m$-th weak classifier
$z_m$ — normalization factor of the weight distribution
$e_m$ — weighted error rate of the $m$-th weak classifier
Initialization (uniform sampling weights):

$$D_1 = \big(w_1^{(1)}, w_2^{(1)}, \dots, w_N^{(1)}\big), \qquad w_i^{(1)} = \frac{1}{N}$$

For $m = 1 \sim M$:

1. Train the weak classifier on data weighted by $D_m$:
$$G_m(x) \in \{-1, +1\}$$
2. Compute the weighted error rate:
$$e_m = P\big(G_m(x_i) \neq y_i\big) = \sum_{i=1}^{N} w_i^{(m)} \, \mathbb{1}\big(G_m(x_i) \neq y_i\big)$$
3. Compute the classifier weight:
$$\alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m}$$
4. Update the weight distribution (a.k.a. resampling):
$$z_m = \sum_{i=1}^{N} w_i^{(m)} \exp\big[-\alpha_m y_i G_m(x_i)\big]$$
$$w_i^{(m+1)} = \frac{w_i^{(m)}}{z_m} \exp\big[-\alpha_m y_i G_m(x_i)\big]$$

Final classifier:

$$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad G(x) = \mathrm{sign}\big(f(x)\big)$$
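The loop above can be written out directly in NumPy. The sketch below is illustrative only (names such as `Stump`, `fit_adaboost`, and `predict_adaboost` are made up for this note, not from any library), and it assumes labels in $\{-1, +1\}$:

```python
# From-scratch sketch of the binary AdaBoost loop described above.
import numpy as np


class Stump:
    """Depth-1 decision tree: threshold one feature, predict +1 or -1."""

    def fit(self, X, y, w):
        best_err = np.inf
        for j in range(X.shape[1]):             # try every feature
            for thr in np.unique(X[:, j]):      # and every observed threshold
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = np.sum(w[pred != y])  # weighted error rate
                    if err < best_err:
                        best_err = err
                        self.j, self.thr, self.sign = j, thr, sign
        return best_err

    def predict(self, X):
        return self.sign * np.where(X[:, self.j] <= self.thr, 1, -1)


def fit_adaboost(X, y, M=20):
    """y must be in {-1, +1}. Returns a list of (alpha_m, G_m) pairs."""
    N = len(y)
    w = np.full(N, 1.0 / N)                     # D_1: uniform weights
    ensemble = []
    for m in range(M):
        stump = Stump()
        e_m = stump.fit(X, y, w)                # weighted error rate e_m
        alpha_m = 0.5 * np.log((1 - e_m) / max(e_m, 1e-12))
        w = w * np.exp(-alpha_m * y * stump.predict(X))
        w /= w.sum()                            # divide by z_m (normalization)
        ensemble.append((alpha_m, stump))
    return ensemble


def predict_adaboost(ensemble, X):
    f = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.sign(f)                           # G(x) = sign(f(x))
```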
It can be shown that AdaBoost drives the error rate on the training set down exponentially fast, as long as each weak classifier does slightly better than random guessing, while the error rate on held-out validation data typically grows only slowly as more weak classifiers are added.
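The standard bound behind the training-error part of that claim (not derived in this note) says the training error of the final classifier is bounded by the product of the normalization factors, which shrinks exponentially when every weak classifier has an edge $\gamma_m$ over random guessing:

$$\frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\big(G(x_i) \neq y_i\big)
\;\le\; \prod_{m=1}^{M} z_m
\;=\; \prod_{m=1}^{M} 2\sqrt{e_m(1-e_m)}
\;\le\; \exp\!\Big(-2\sum_{m=1}^{M}\gamma_m^2\Big),
\qquad \gamma_m = \tfrac12 - e_m$$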