R for Classification

Jennifer Broughton
Shimadzu Research Laboratory, Manchester, UK
2nd May 2013

Classification?

Automatic Identification of Type (Class) of Object from Measured Variables (Features)

Object Type   Label 1    Label 2    …   Label m
Feature 1     val[1,1]   val[2,1]   …   val[m,1]
Feature 2     val[1,2]   val[2,2]   …   val[m,2]
Feature 3     val[1,3]   val[2,3]   …   val[m,3]
…             …          …          …   …
Feature n     val[1,n]   val[2,n]   …   val[m,n]

Example Data

Data Preparation & Investigation

Training Set -> EDA Technique: Box Plots, PCA, Decision Trees, Clustering

• Best features to distinguish between classes
• Relationships between features
• Feature reduction

Box Plots

PCA & Multivariate Analysis: ade4, FactoMineR

Example Classifier

Classification Algorithms in R

Rattle: R Analytical Tool to Learn Easily
(Rattle: A Data Mining GUI for R, Graham J Williams, The R Journal, 1(2):45-55)

SVM

Ensemble Algorithm

Training and Testing

Training Set (labelled) -> Classification Algorithm -> Trained Classifier
Test Set (unlabelled) -> Trained Classifier -> Classification Results
Classification Results + Labels -> Assess Predictions -> Prediction Results

Classification Algorithm: Neural Network, Support Vector Machine, Random Forest, ….
Assess Predictions: Confusion Matrix, ROC Curve (2 categories)

Using Classifiers in R

Select Training Data

Build Classifier
    classifier <- algorithm(formula, data, options)
    (boosting and nnet)

Run Classifier
    classifier.pred <- predict(classifier, newdata, options)

SVM & Neural Net Tuning

Classifier Feedback

print(classifier)
plot(classifier)
high Gini Coefficient = high dispersion

Classifier Prediction Results

predict(type = "class")
predict(type = "prob")
confusion matrix

Binary Classification Results

                      Class Present?
                      Y                 N
Class Detected?   Y   True Positive     False Positive
                  N   False Negative    True Negative

True Positive Rate = True Positives / (True Positives + False Negatives) = Sensitivity
False Positive Rate = False Positives / (False Positives + True Negatives) = 1 − Specificity

ROC Curves in R

ROCR package

Example Results
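
The build, run and assess steps from "Using Classifiers in R" and "Binary Classification Results" can be sketched roughly as follows. This is a minimal illustration rather than the code behind the slides. It assumes the built-in iris data, an arbitrary 70/30 train/test split, and an rpart decision tree standing in for the boosting, nnet, SVM and random forest classifiers named above. The variable names (train.idx, pred.class, tpr and so on) and the choice of "virginica" as the class of interest are hypothetical.

```r
# Minimal sketch: build, run and assess a classifier on the built-in iris data.
# Assumption: rpart (a recommended package shipped with standard R installs)
# stands in for the classifiers named in the slides.
library(rpart)

set.seed(42)  # arbitrary seed so the split is reproducible

# Select training data: hold out roughly 30% of the rows as a test set
n <- nrow(iris)
train.idx <- sample(n, size = round(0.7 * n))
train <- iris[train.idx, ]
test  <- iris[-train.idx, ]

# Build classifier: classifier <- algorithm(formula, data, options)
classifier <- rpart(Species ~ ., data = train, method = "class")

# Run classifier: classifier.pred <- predict(classifier, newdata, options)
pred.class <- predict(classifier, newdata = test, type = "class")  # predicted labels
pred.prob  <- predict(classifier, newdata = test, type = "prob")   # class probabilities

# Confusion matrix: detected class (rows) against true class (columns)
confusion <- table(Detected = pred.class, Present = test$Species)
print(confusion)

# Binary view for one class of interest (hypothetical choice: "virginica")
present  <- test$Species == "virginica"
detected <- pred.class == "virginica"
TP <- sum(detected & present)    # true positives
FP <- sum(detected & !present)   # false positives
FN <- sum(!detected & present)   # false negatives
TN <- sum(!detected & !present)  # true negatives

tpr <- TP / (TP + FN)  # true positive rate (sensitivity)
fpr <- FP / (FP + TN)  # false positive rate (1 - specificity)
cat("TPR:", tpr, " FPR:", fpr, "\n")
```

A single (fpr, tpr) pair like this corresponds to one point on a ROC curve. Sweeping a threshold over the "virginica" column of pred.prob would trace out the rest of the curve.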