### R for Classification

Jennifer Broughton
Manchester, UK
2nd May 2013
Classification?
Automatic identification of the type (class) of an object from measured variables (features)

| Object Type | Feature 1 | Feature 2 | Feature 3 | … | Feature n |
|---|---|---|---|---|---|
| Label 1 | val[1,1] | val[1,2] | val[1,3] | … | val[1,n] |
| Label 2 | val[2,1] | val[2,2] | val[2,3] | … | val[2,n] |
| … | … | … | … | … | … |
| Label m | val[m,1] | val[m,2] | val[m,3] | … | val[m,n] |

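In R this layout maps naturally onto a data frame with one row per object, one column per feature and a factor column for the label. A minimal sketch, assuming the built-in `iris` data set as a stand-in (the slides do not say which data they use):

```r
# iris ships with base R: 150 objects, 4 numeric features, 1 class label.
data(iris)

features <- iris[, 1:4]      # the val[i, j] block: m rows by n feature columns
labels   <- iris$Species     # the "Object Type" column, stored as a factor

dim(features)                # m objects, n features
levels(labels)               # the distinct classes
table(labels)                # number of objects per class
```
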
Example Data
Data Preparation & Investigation
Training Set → EDA Technique (Box Plots, PCA, Decision Trees, Clustering) →
• Best features to distinguish between classes
• Relationships between features
• Feature reduction
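
Three of these checks can be sketched in a few lines of R. This is an illustration only: it assumes the built-in `iris` data as the training set and uses `rpart` (a recommended package bundled with standard R installations) for the decision tree; neither choice comes from the slides.

```r
library(rpart)   # recommended package shipped with standard R installations

data(iris)       # stand-in training set (assumption: not the slides' data)

# Box plot of one feature per class: well-separated boxes suggest a useful feature.
bp <- boxplot(Petal.Length ~ Species, data = iris,
              main = "Petal.Length by class")

# Correlation matrix: strongly correlated features carry overlapping information.
feature_cor <- cor(iris[, 1:4])
round(feature_cor, 2)

# A quick decision tree ranks features by how much they improve its splits.
tree <- rpart(Species ~ ., data = iris, method = "class")
tree$variable.importance
```

Features that rank low in the tree and correlate strongly with a higher-ranked feature are natural candidates to drop during feature reduction.
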
Box Plots
PCA & Multivariate Analysis: FactoMineR
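
A PCA of the feature columns might look like the sketch below. Base R's `prcomp` is swapped in for FactoMineR here so the example runs without extra packages, and `iris` is again an assumed stand-in data set.

```r
data(iris)  # stand-in data set (assumption)

# Centre and scale the numeric features, then rotate onto principal components.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Scores on the first two components, coloured by class.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "First two principal components")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

If the classes separate cleanly on the first two components, a classifier is likely to do well with a reduced feature set. FactoMineR's `PCA()` function offers the same analysis with richer built-in plots.
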
Example Classifier
Classification Algorithms in R
Rattle: R Analytical Tool to Learn Easily (Graham J. Williams, "Rattle: A Data Mining GUI for R", The R Journal, 1(2):45-55)
SVM
Ensemble Algorithm
Training and Testing
Training Set (labelled) → Classification Algorithm (Neural Network, Support Vector Machine, Random Forest) → Trained Classifier
Test Set (unlabelled) → Trained Classifier → Prediction Results
Prediction Results + Labels → Assess Predictions (Confusion Matrix, ROC Curve (2 categories), …) → Classification Results
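
The first step of this workflow, splitting labelled data into a training set and a test set whose labels are held back, could be sketched as follows. The `iris` data, the 70/30 ratio and the seed are all assumptions made for illustration.

```r
data(iris)       # stand-in data (assumption)
set.seed(42)     # arbitrary seed so the split is reproducible

# Draw 70% of the row indices within each class (the ratio is an assumption).
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}))

train_set   <- iris[train_idx, ]                           # labelled training set
test_set    <- iris[-train_idx, names(iris) != "Species"]  # unlabelled test set
test_labels <- iris$Species[-train_idx]                    # held back for assessment

table(train_set$Species)
```

Sampling within each class keeps the class proportions the same in both sets, which matters when some classes are rare.
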
Using Classifiers in R
Select Training Data
Build Classifier
classifier <- algorithm(formula, data, options)
(boosting and nnet)
Run Classifier
classifier.pred <- predict(classifier, newdata, options)
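
A concrete instance of this build-then-run pattern, using `nnet` as the algorithm. The data set, the split size and the option values (`size`, `decay`, `maxit`) are illustrative assumptions, not settings taken from the slides.

```r
library(nnet)    # recommended package shipped with standard R installations

data(iris)       # stand-in data (assumption)
set.seed(1)      # arbitrary seed: sampling and nnet's initial weights are random

train_idx <- sample(nrow(iris), 100)   # select training data
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Build: classifier <- algorithm(formula, data, options)
classifier <- nnet(Species ~ ., data = train_set,
                   size = 4, decay = 0.01, maxit = 200, trace = FALSE)

# Run: classifier.pred <- predict(classifier, newdata, options)
classifier.pred <- predict(classifier, newdata = test_set, type = "class")

mean(classifier.pred == test_set$Species)   # fraction classified correctly
```

The formula `Species ~ .` means "predict `Species` from every other column", so the same call works unchanged when features are added or removed.
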
SVM & Neural Net Tuning
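
Tuning amounts to trying several option values and keeping the setting that predicts best on data not used for fitting. The sketch below does this by hand for a neural network; the candidate values for `size` and `decay` are hypothetical, and a manual grid search is used here rather than whatever tuning tool the slide showed.

```r
library(nnet)

data(iris)       # stand-in data (assumption)
set.seed(7)      # arbitrary seed for reproducibility

train_idx <- sample(nrow(iris), 100)
train_set <- iris[train_idx, ]
valid_set <- iris[-train_idx, ]

# Candidate settings (hypothetical values chosen for illustration).
grid <- expand.grid(size = c(2, 4, 8), decay = c(0, 0.01, 0.1))

# Fit one network per setting and record its accuracy on the validation set.
grid$accuracy <- mapply(function(size, decay) {
  fit  <- nnet(Species ~ ., data = train_set, size = size,
               decay = decay, maxit = 200, trace = FALSE)
  pred <- predict(fit, newdata = valid_set, type = "class")
  mean(pred == valid_set$Species)
}, grid$size, grid$decay)

best <- grid[which.max(grid$accuracy), ]
best
```

The `e1071` package provides `tune.svm()` and `tune.nnet()`, which wrap the same idea with cross-validation; the same loop applies to an SVM's `cost` and `gamma` options.
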
Classifier Feedback
print(classifier)
plot(classifier)
high Gini Coefficient = high dispersion
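
As an illustration, the sketch below prints and plots an `rpart` decision tree and computes a Gini value by hand. It assumes that "Gini coefficient" here refers to the Gini impurity index used by tree-based classifiers, which the slide does not confirm; the data set is also an assumption.

```r
library(rpart)

data(iris)       # stand-in data (assumption)
classifier <- rpart(Species ~ ., data = iris, method = "class")

print(classifier)                  # text summary of the fitted splits
plot(classifier, margin = 0.1)     # draw the tree skeleton
text(classifier, use.n = TRUE)     # add split rules and class counts

# Gini impurity of a set of labels: 1 - sum(p_k^2).
# 0 means a single class; larger values mean the classes are more mixed.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini(iris$Species)                            # three equally sized classes
gini(iris$Species[iris$Petal.Length < 2.5])   # a subset containing one class
```

A split is useful when it turns one high-impurity group into subgroups with much lower impurity, which is what a tree-building algorithm searches for.
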
Classifier Prediction Results
predict(type = "class")
predict(type = "prob")
confusion matrix
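
The two prediction types and the confusion matrix can be seen side by side in the sketch below. It assumes an `rpart` tree and the `iris` data; the accepted `type` values differ between packages (for example, `nnet` uses `"raw"` for class probabilities), so the relevant `predict` help page is worth checking.

```r
library(rpart)

data(iris)       # stand-in data (assumption)
set.seed(3)      # arbitrary seed for reproducibility

train_idx  <- sample(nrow(iris), 100)
classifier <- rpart(Species ~ ., data = iris[train_idx, ], method = "class")
test_set   <- iris[-train_idx, ]

pred_class <- predict(classifier, newdata = test_set, type = "class")  # factor of labels
pred_prob  <- predict(classifier, newdata = test_set, type = "prob")   # one column per class

head(pred_prob)

# Confusion matrix: rows are predicted classes, columns are true classes.
confusion <- table(predicted = pred_class, actual = test_set$Species)
confusion

sum(diag(confusion)) / sum(confusion)   # overall accuracy
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.
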
Binary Classification Results

| | Class Present? Y | Class Present? N |
|---|---|---|
| Class Detected? Y | True Positive | False Positive |
| Class Detected? N | False Negative | True Negative |

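
The four outcomes combine into the usual summary rates. The sketch below counts them from two hypothetical vectors and applies the standard definitions of true positive rate, false positive rate and accuracy; these are assumed to be the quantities of interest here, and the data values are invented for illustration.

```r
# Hypothetical outcomes for ten objects (illustrative values only).
present  <- c("Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N")
detected <- c("Y", "Y", "Y", "N", "Y", "N", "N", "N", "N", "N")

TP <- sum(detected == "Y" & present == "Y")   # true positives
FN <- sum(detected == "N" & present == "Y")   # false negatives
FP <- sum(detected == "Y" & present == "N")   # false positives
TN <- sum(detected == "N" & present == "N")   # true negatives

tpr      <- TP / (TP + FN)                    # true positive rate (sensitivity)
fpr      <- FP / (FP + TN)                    # false positive rate (1 - specificity)
accuracy <- (TP + TN) / (TP + FN + FP + TN)

c(TP = TP, FN = FN, FP = FP, TN = TN, TPR = tpr, FPR = fpr, accuracy = accuracy)
```
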
ROC Curves in R
ROCR package
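
To show what an ROC curve is made of, the sketch below builds one by hand in base R, swapped in for ROCR so it runs without extra packages. The scores and labels are hypothetical, and tied scores are not treated specially.

```r
# Hypothetical classifier scores and true labels (1 = class present).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Sweep the threshold from high to low: each step admits one more object.
ord <- order(scores, decreasing = TRUE)
tpr <- c(0, cumsum(labels[ord] == 1) / sum(labels == 1))
fpr <- c(0, cumsum(labels[ord] == 0) / sum(labels == 0))

plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # chance line

# Area under the curve by the trapezoid rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

With ROCR the equivalent steps are `pred <- prediction(scores, labels)` followed by `performance(pred, "tpr", "fpr")` for the curve and `performance(pred, "auc")` for the area.
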
Example Results