Nonparametric Regression

Prelude of Machine Learning 202
Statistical Data Analysis in the Computer Age (1991)
Bradley Efron and Robert Tibshirani
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Overview
• Classical statistical methods, 1920-1950:
  – Linear regression, hypothesis testing, standard errors, confidence intervals, etc.
• New statistical methods, post-1980:
  – Based on the power of electronic computation
  – Require fewer distributional assumptions than their predecessors
• How to spend this computational wealth wisely?
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Bootstrap
• A random sample of 164 data points (figure)
• t(x) = 28.58
• How accurate is t(x)?
• The bootstrap is a device for extending SE (standard error) calculations to estimators other than the mean
• Suppose t(x) is the 25% trimmed mean (a sketch follows below)
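A minimal sketch of the bootstrap SE computation for a 25% trimmed mean. The data array here is a synthetic stand-in for the 164-point sample on the slide, not the real data.

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
# Synthetic stand-in for the slide's 164-point sample (long-tailed on purpose).
x = rng.standard_t(df=3, size=164) * 10 + 28

def bootstrap_se(data, stat, B=1000):
    """Bootstrap estimate of the standard error of stat(data)."""
    n = len(data)
    reps = [stat(rng.choice(data, size=n, replace=True)) for _ in range(B)]
    return np.std(reps, ddof=1)

t_hat = trim_mean(x, proportiontocut=0.25)          # 25% trimmed mean
se_hat = bootstrap_se(x, lambda s: trim_mean(s, 0.25))
print(f"t(x) = {t_hat:.2f}, bootstrap SE = {se_hat:.2f}")
```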
Bootstrap
• Why use a trimmed mean rather than mean(x)?
• If the data come from a long-tailed probability distribution, then the trimmed mean can be substantially more accurate than mean(x)
• In practice, one does not know a priori whether the true probability distribution is long-tailed; the bootstrap can help answer this question
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Nonparametric Regression
• Quadratic regression curve evaluated at 60% compliance (figure)
• Estimate: 27.72 +/- 3.08 (a quadratic-fit sketch follows below)
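To make the "estimate at 60% compliance" concrete, here is a small sketch that fits a quadratic regression by least squares and reads the curve off at 60%. The (compliance, response) pairs are hypothetical stand-ins, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical (compliance %, response) pairs standing in for the slide's data.
compliance = rng.uniform(0, 100, size=164)
response = 0.005 * compliance**2 + rng.normal(0, 8, size=164)

# Fit a quadratic by least squares and evaluate the fitted curve at 60%.
coeffs = np.polyfit(compliance, response, deg=2)
print(f"quadratic estimate at 60% compliance: {np.polyval(coeffs, 60.0):.2f}")
```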
Nonparametric Regression
• Nonparametric regression with loess at 60% compliance
• Estimate: 32.38 +/- ?
• Loess, i.e.:
  – Windowing with the nearest 20% of data points
  – A smooth weight function
  – Weighted linear regression within each window
• How to find the SE? (see the loess sketch below)
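A minimal loess sketch using the lowess smoother from statsmodels, which matches the recipe on the slide: each local fit uses the nearest fraction of points (frac=0.2), weighted by a smooth tricube function, followed by a weighted linear regression. It reuses the hypothetical compliance/response arrays from the quadratic sketch above.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# frac=0.2 mirrors the slide: each local fit uses the nearest 20% of points.
fitted = lowess(response, compliance, frac=0.2, return_sorted=True)

# Interpolate the smoothed curve to read off the value at 60% compliance.
npr_at_60 = np.interp(60.0, fitted[:, 0], fitted[:, 1])
print(f"loess estimate at 60% compliance: {npr_at_60:.2f}")
```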
Nonparametric Regression
• How to find the SE? Bootstrap (see the sketch below)
• 32.38 +/- 5.71 with B = 50 bootstrap replications
• At 60% compliance:
  – QR (quadratic regression): 27.72 +/- 3.08
  – NPR (nonparametric regression): 32.38 +/- 5.71
• On balance, the quadratic estimate should probably be preferred in this case; it would have to have an unusually large bias to undo its superiority in SE
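A sketch of the bootstrap SE for the loess estimate, resampling (x, y) pairs and re-running the smoother B = 50 times as on the slide. It reuses the hypothetical compliance/response arrays from the previous sketches.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def loess_at(x0, x, y, frac=0.2):
    """Loess fit to (x, y), interpolated at the point x0."""
    fitted = lowess(y, x, frac=frac, return_sorted=True)
    return np.interp(x0, fitted[:, 0], fitted[:, 1])

rng = np.random.default_rng(2)
n, B = len(compliance), 50               # B = 50 replications, as on the slide
reps = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)     # resample (x, y) pairs with replacement
    reps[b] = loess_at(60.0, compliance[idx], response[idx])

print(f"bootstrap SE of the loess estimate at 60%: {reps.std(ddof=1):.2f}")
```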
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Generalized Additive Models
• Generalized Linear Model (GLM):
  – Generalizes linear regression
  – The linear model is related to the response variable through a link function:
    Y = g(b0 + b1*X1 + ... + bm*Xm)
• Additive Model:
  – A nonparametric regression method
  – Estimates a nonparametric function for each predictor
  – Combines all predictor functions to predict the dependent variable (see the backfitting sketch below)
• Generalized Additive Model (GAM):
  – Blends properties of additive models with the generalized linear model (GLM)
  – Each predictor function fi(xi) is fit by parametric or nonparametric means
  – Provides good fits to training data at the expense of interpretability
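A minimal sketch of an additive model fit by the standard backfitting algorithm, with lowess as the per-predictor smoother. The toy data and all names are illustrative, not from the paper.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def smooth(y, x, frac=0.5):
    """Lowess smooth of y on x, returned in the original order of x."""
    fitted = lowess(y, x, frac=frac, return_sorted=True)
    return np.interp(x, fitted[:, 0], fitted[:, 1])

def backfit(y, X, frac=0.5, n_iter=10):
    """Fit y ~ alpha + f1(x1) + ... + fm(xm) by backfitting."""
    n, m = X.shape
    alpha = y.mean()
    f = np.zeros((m, n))                 # one fitted function per predictor
    for _ in range(n_iter):
        for j in range(m):
            # Smooth the partial residuals against predictor j.
            partial = y - alpha - f.sum(axis=0) + f[j]
            f[j] = smooth(partial, X[:, j], frac)
            f[j] -= f[j].mean()          # keep each fj centered
    return alpha, f

# Toy data with an additive but nonlinear signal.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.3, size=300)
alpha, f = backfit(y, X)
```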
GAM Case Study
• Analyze survival of infants after cardiac surgery for heart defects
• Dataset: 497 infant records
• Explanatory variables:
  – Age (days)
  – Weight (kg)
  – Whether warm-blood cardioplegia (WBC) was applied
• WBC support data:
  – Of 57 infants who received the WBC procedure, 7 died
  – Of 440 infants who received the standard procedure, 133 died
GAM Case Study: Logistic Regression Results
• Logistic regression model with three predictors
  – Age, weight: continuous variables
  – WBC applied: binary variable
• Results (a fitting sketch follows below):
  – WBC has a strong beneficial effect: odds ratio of 3.8:1
  – Higher weight => lower risk of death
  – Age has no significant effect
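A sketch of the linear logistic regression step, assuming hypothetical column names (died, age, weight, wbc) and synthetic data in place of the study's 497 records.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the 497 infant records; column names are assumptions.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "died":   rng.integers(0, 2, size=497),
    "age":    rng.uniform(1, 365, size=497),   # days
    "weight": rng.uniform(2, 7, size=497),     # kg
    "wbc":    rng.integers(0, 2, size=497),    # warm-blood cardioplegia applied
})

# Linear logistic regression: log-odds of death is a straight line in each predictor.
fit = smf.logit("died ~ age + weight + wbc", data=df).fit(disp=0)
print(fit.summary())
print(np.exp(fit.params))                      # coefficients as odds ratios
```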
GAM Case Study: GAM Analysis
• Add three individual smooth functions
  – Fit using the locally weighted scatterplot smoothing (loess) method
• Results (see the GAM sketch below):
  – WBC has a strong beneficial effect: odds ratio of 4.2:1
  – Lighter infants are 55 times more likely to die than heavier infants
  – Surprising findings from the log-odds curve for age!
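A sketch of the GAM fit using the third-party pygam package (pip install pygam); note that pygam fits its smooths with penalized splines rather than the loess used in the paper. It reuses the synthetic df from the logistic-regression sketch above.

```python
from pygam import LogisticGAM, s, f   # third-party package, an assumption here

# Smooth terms for age and weight, a factor term for the binary WBC flag.
X = df[["age", "weight", "wbc"]].to_numpy()
y = df["died"].to_numpy()
gam = LogisticGAM(s(0) + s(1) + f(2)).fit(X, y)

# The log-odds curve for age: the plot that revealed the surprising
# nonlinear age effect in the case study.
XX = gam.generate_X_grid(term=0)
log_odds_age = gam.partial_dependence(term=0, X=XX)
```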
GAM Case Study: Conclusion
• Traditional regression models may lead to oversimplification
  – Linear logistic regression forces the fitted curves to be straight lines
  – Vital information about the effect of age is lost in a linear model
  – The problem is more acute with a large number of explanatory variables
• GAM analysis exploits computational power to achieve a new level of analysis flexibility
  – A personal computer can now do what required a mainframe 10 years ago
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Classification and Regression Trees
• A nonparametric technique
• An ideal analysis method for computer algorithms
• Splits the data based upon how well each candidate split explains variability in the response
• Once a node is split, the procedure is applied recursively to each resulting subset (see the sketch below)
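A minimal sketch of the core splitting step for a regression tree: scan all thresholds on one predictor and pick the split that most reduces the total sum of squares, i.e. best explains the variability. Growing the tree means applying this recursively to each side of the chosen split.

```python
import numpy as np

def best_split(x, y):
    """Threshold on x that maximizes the reduction in sum of squares of y."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    total_ss = ((ys - ys.mean()) ** 2).sum()
    best_threshold, best_gain = None, -np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                     # cannot split between equal values
        left, right = ys[:i], ys[i:]
        child_ss = (((left - left.mean()) ** 2).sum()
                    + ((right - right.mean()) ** 2).sum())
        gain = total_ss - child_ss       # variability explained by the split
        if gain > best_gain:
            best_threshold, best_gain = (xs[i - 1] + xs[i]) / 2, gain
    return best_threshold, best_gain

# Growing a tree = apply best_split at a node, partition the data at the
# threshold, then repeat the same procedure on each child node.
```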
CART Case Study
• Gain insight into the causes of duodenal ulcers
  – Sample of 745 rats
  – One of 56 different alkyl nucleophiles was administered to each rat
  – Response: one of three severity levels (1, 2, 3), with 3 the highest severity
• Skewed misclassification costs
  – Misclassifying a severe ulcer is more expensive than misclassifying a mild one
• Analysis tree construction (see the sketch below):
  – Use the 745 observations as the training data
  – Compute "apparent" misclassification rates
  – The training-data misclassification rate has a downward bias
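A sketch of a cost-skewed tree fit and its apparent (training-data) error rate, using scikit-learn. The data and the class weights {1: 1, 2: 2, 3: 5} are illustrative assumptions, not the study's actual costs.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the rat ulcer data: 745 rows, severity in {1, 2, 3}.
rng = np.random.default_rng(5)
X = rng.normal(size=(745, 5))
y = rng.integers(1, 4, size=745)

# Skewed costs: weight severe ulcers (class 3) more heavily, so misclassifying
# them is penalized more than misclassifying mild ones.
tree = DecisionTreeClassifier(class_weight={1: 1.0, 2: 2.0, 3: 5.0}, random_state=0)
tree.fit(X, y)

# "Apparent" misclassification rate, computed on the training data itself;
# as the slide notes, this estimate is biased downward.
print(f"apparent misclassification rate: {1.0 - tree.score(X, y):.3f}")
```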
CART Case Study
• Classification tree for the ulcer data (figure)
CART Case Study: Observations
• The optimal size of a classification tree is a tradeoff
  – Higher training error versus overfitting
• It is usually better to construct a large tree and prune it from the bottom
• How to choose the optimal-size classification tree?
  – Apply test data to the different tree models to estimate the misclassification rate of each tree
  – In the absence of test data, use a cross-validation approach
CART: Cross-Validation
• Mimics the use of a test sample
• Standard cross-validation approach (see the sketch below):
  – Divide the dataset into 10 equal partitions
  – Use 90% of the data as the training set and the remaining 10% as test data
  – Repeat so that each partition serves once as the test set
• Cross-validation misclassification errors were found to be 10% higher than the original (apparent) rates
• Cross-validation and bootstrapping are closely related
  – Research on hybrid approaches is in progress
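A minimal 10-fold cross-validation sketch with scikit-learn, reusing the hypothetical X and y from the cost-weighted tree sketch above.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

kf = KFold(n_splits=10, shuffle=True, random_state=0)
errors = []
for train_idx, test_idx in kf.split(X):
    # 90% of the data trains the tree; the held-out 10% estimates its error.
    tree = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(1.0 - tree.score(X[test_idx], y[test_idx]))

# Averaging over the 10 folds gives a far less biased misclassification rate
# than the apparent (training-data) rate.
print(f"10-fold CV misclassification rate: {np.mean(errors):.3f}")
```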
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Classification and Regression Trees
• Conclusion
Conclusion
• Computers have enabled a new generation of statistical methods and tools
• Traditional mathematical derivations are replaced with computer algorithms
• Freedom from the bell-shaped-curve assumptions of classical methods