Boosting
Rong Jin

Inefficiency with Bagging
[Figure: the Bagging pipeline. Bootstrap sampling draws data sets D1, D2, …, Dk from the training set D, a classifier hi is trained on each Di, and predictions are combined by averaging Pr(c | hi, x) over the k classifiers.]
Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between "easy" examples and "difficult" examples
Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers

Improve the Efficiency of Bagging
Better sampling strategy:
• Focus on the examples that are difficult to classify
Better combination strategy:
• Accurate models should be assigned larger weights

Intuition
[Figure: Classifier 1 is trained on (X1, Y1), …, (X4, Y4); the examples it misclassifies are sampled more heavily for the next round, so Classifier 2's training set repeats the difficult examples, e.g. (X1, Y1), (X3, Y3), (X1, Y1), and Classifier 3 continues the process.]

AdaBoost Algorithm
[Figure: pseudocode of the AdaBoost algorithm; a runnable sketch is given at the end of this report.]

AdaBoost Example: αt = ln 2
With αt = ln 2, each misclassified example has its weight multiplied by e^αt = 2, and the weights are then renormalized (this step is checked numerically at the end of this report):
• D0 = (1/5, 1/5, 1/5, 1/5, 1/5) over (x1, y1), …, (x5, y5). Sample {(x1, y1), (x3, y3), (x5, y5)} and train h1.
• D1 = (1/7, 1/7, 2/7, 1/7, 2/7): h1 misclassified x3 and x5, so their weights doubled. Sample {(x1, y1), (x3, y3)} and train h2.
• D2 = (2/9, 1/9, 1/9, 4/9, 1/9): the weights are updated again for h2's mistakes, and the sample-train-reweight cycle repeats.

How To Choose αt in AdaBoost?
How to construct the best distribution Dt+1(i):
1. Dt+1(i) should be significantly different from Dt(i)
2. Dt+1(i) should create a situation in which classifier ht performs poorly

How To Choose αt in AdaBoost?
Optimization view for choosing αt:
• ht(x): X → {+1, -1} is a base (weak) classifier
• HT(x) = Σt αt ht(x) is a linear combination of base classifiers
• Goal: minimize the training error, i.e. the number of examples with yi ≠ sign(HT(xi))
• Approximate the 0-1 error with an exponential upper bound: 1[yi ≠ sign(HT(xi))] ≤ exp(-yi HT(xi))

AdaBoost: Greedy Optimization
Fix HT-1(x), and solve for hT(x) and αT in HT(x) = HT-1(x) + αT hT(x). Minimizing the exponential loss over αT gives the closed form αT = ½ ln((1 - εT)/εT), where εT is the weighted error of hT (see the derivation at the end of this report).

Empirical Study of AdaBoost
AdaBoosting decision trees:
• Generate 50 decision trees by AdaBoost (a scikit-learn sketch of the comparison is given at the end of this report)
• Linearly combine the decision trees using the weights of AdaBoost
In general:
• AdaBoost ≈ Bagging > C4.5: both ensembles outperform a single C4.5 decision tree
• AdaBoost usually needs fewer classifiers than Bagging

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously
[Figure: bias and variance of a single decision tree vs. bagged decision trees vs. AdaBoosted decision trees.]
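
The pseudocode on the "AdaBoost Algorithm" slide did not survive extraction. Below is a minimal runnable sketch of the standard binary AdaBoost loop with decision stumps as weak learners; the names (fit_stump, stump_predict, adaboost, predict) and the smoke-test data are illustrative assumptions, not from the slides. One difference from the worked example above: this sketch uses the standard update Dt+1(i) ∝ Dt(i) exp(-αt yi ht(xi)), which also down-weights correct examples, whereas the slide's example only up-weights mistakes; both shift mass toward the difficult examples.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively fit a decision stump (one-feature threshold rule)
    minimizing the weighted 0-1 error under example weights w."""
    best_err, best_rule = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best_rule = err, (j, thr, pol)
    return best_err, best_rule

def stump_predict(rule, X):
    j, thr, pol = rule
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=10):
    """AdaBoost for labels y in {-1, +1}; returns [(alpha_t, rule_t), ...]."""
    D = np.full(len(y), 1 / len(y))           # D_0: uniform, as in the example
    ensemble = []
    for _ in range(T):
        err, rule = fit_stump(X, y, D)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against degenerate stumps
        alpha = 0.5 * np.log((1 - err) / err) # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        pred = stump_predict(rule, X)
        D *= np.exp(-alpha * y * pred)        # up-weight mistakes, down-weight hits
        D /= D.sum()                          # renormalize to a distribution
        ensemble.append((alpha, rule))
    return ensemble

def predict(ensemble, X):
    """Sign of H_T(x) = sum_t alpha_t h_t(x)."""
    H = sum(alpha * stump_predict(rule, X) for alpha, rule in ensemble)
    return np.where(H >= 0, 1, -1)

# Tiny smoke test on separable 2-D data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost(X, y, T=20)
print("training accuracy:", (predict(model, X) == y).mean())
```

Each stump alone barely beats chance on this data, yet the weighted vote drives the training error down, which is the point of the slides' "better sampling" and "better combination" strategies.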
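
As a sanity check on the worked example, this snippet reproduces the D0 → D1 step under the reading above: αt = ln 2, misclassified weights multiplied by e^αt = 2, then renormalized. That h1 misclassifies exactly x3 and x5 is inferred from the slide's D1 values, not stated explicitly on the slide.

```python
import numpy as np

# D0 -> D1 step of the slide's example: alpha_t = ln 2, five examples with
# uniform initial weights; h1 misclassifies x3 and x5 (inferred from D1).
alpha = np.log(2)
D0 = np.full(5, 1 / 5)
wrong = np.array([False, False, True, False, True])
D1 = np.where(wrong, D0 * np.exp(alpha), D0)  # only mistakes are up-weighted
D1 /= D1.sum()                                # renormalize to a distribution
print(D1)  # [1/7, 1/7, 2/7, 1/7, 2/7] ~ [0.143, 0.143, 0.286, 0.143, 0.286]
```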
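
The algebra behind the "Greedy Optimization" slide also did not survive extraction. The following is a standard reconstruction of the argument, using the definitions above; it may differ cosmetically from the slide's own steps.

```latex
\begin{align*}
\sum_i \mathbf{1}\big[y_i \neq \operatorname{sign}(H_T(x_i))\big]
  &\le \sum_i e^{-y_i H_T(x_i)}
   = \sum_i e^{-y_i H_{T-1}(x_i)}\, e^{-\alpha_T y_i h_T(x_i)} \\
  &= \sum_i w_i\, e^{-\alpha_T y_i h_T(x_i)}
   \qquad \text{with } w_i = e^{-y_i H_{T-1}(x_i)} \text{ fixed} \\
  &= e^{-\alpha_T} \sum_{i:\, h_T(x_i) = y_i} w_i
   \;+\; e^{\alpha_T} \sum_{i:\, h_T(x_i) \neq y_i} w_i \\
  &\propto (1 - \varepsilon_T)\, e^{-\alpha_T} + \varepsilon_T\, e^{\alpha_T},
\end{align*}
```

where εT is the weighted error of hT under the (normalized) weights wi, which are exactly the AdaBoost weights DT(i) up to a constant. Setting the derivative with respect to αT to zero:

```latex
-(1-\varepsilon_T)\, e^{-\alpha_T} + \varepsilon_T\, e^{\alpha_T} = 0
\quad\Longrightarrow\quad
\alpha_T = \frac{1}{2} \ln \frac{1-\varepsilon_T}{\varepsilon_T}.
```

Since αT > 0 whenever εT < 1/2 and grows as εT shrinks, more accurate base classifiers receive larger weights in HT, which is precisely the "better combination strategy" the earlier slide asks for.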
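
The following is a minimal sketch of the comparison described in the empirical study, assuming scikit-learn: the 50-tree ensemble size comes from the slides, while the synthetic data set and the CART-style DecisionTreeClassifier (scikit-learn has no C4.5 implementation) are stand-ins for the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the benchmark data sets used in the original study.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging (50 trees)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # AdaBoost both reweights the training examples and weights each
    # tree by its alpha_t in the final vote.
    "AdaBoost (50 trees)": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```

Bagging full trees mainly reduces variance; AdaBoost's reweighting of hard examples can also attack bias, which is the tradeoff summarized in the final figure of the slides.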