
Lecture 9. Model Inference and Averaging
Instructed by Jinzhu Jia
Bootstrap and ML methods
Bayesian methods
EM algorithm
MCMC (Gibbs sampler)
Model averaging
The Bootstrap and ML
One example with one-dimensional data: observations (x_i, y_i), i = 1, …, N, with N = 50.
Cubic spline model: Y = μ(X) + ε, where μ(x) = Σ_{j=1}^{7} β_j h_j(x) and the h_j(x) are B-spline basis functions.
Let H be the N × 7 basis matrix with elements H_{ij} = h_j(x_i); least squares gives β̂ = (HᵀH)⁻¹Hᵀy.
Prediction and its standard error: μ̂(x) = h(x)ᵀβ̂, with ŝe[μ̂(x)] = [h(x)ᵀ(HᵀH)⁻¹h(x)]^{1/2} · σ̂.
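A minimal sketch of the fit and the analytic standard errors. The lecture's dataset is not given, so the data below are simulated, and a truncated-power basis stands in for the B-spline basis (both give a 7-column basis matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

def spline_basis(x, knots):
    """Truncated-power cubic basis: 1, x, x^2, x^3, (x - xi_k)^3_+ (7 columns for 3 knots)."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Simulated data (assumption: the lecture's actual data are not shown here).
N = 50
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

knots = np.quantile(x, [0.25, 0.5, 0.75])
H = spline_basis(x, knots)                         # N x 7 basis matrix

beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)   # least squares fit
mu_hat = H @ beta_hat
sigma2_hat = np.sum((y - mu_hat) ** 2) / (N - H.shape[1])

# Pointwise standard error: se[mu_hat(x_i)] = sqrt(h(x_i)' (H'H)^{-1} h(x_i)) * sigma_hat
G = np.linalg.inv(H.T @ H)
se = np.sqrt(np.einsum('ij,jk,ik->i', H, G, H) * sigma2_hat)
```

The band μ̂(x_i) ± 1.96·se[i] is the standard least-squares confidence band that the bootstrap bands below are compared against.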
Bootstrap for the above example:
1. Draw B bootstrap datasets, each of size N = 50, by sampling the pairs (x_i, y_i) with replacement.
2. For each bootstrap dataset Z*, fit a cubic spline μ̂*(x).
3. From B = 200 bootstrap fits, form 95% pointwise confidence bands: at each x_i, take the 2.5% and 97.5% percentiles of the values μ̂*(x_i).
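The three steps above can be sketched as follows (simulated data, and a truncated-power basis assumed in place of the B-spline basis):

```python
import numpy as np

rng = np.random.default_rng(0)

def spline_basis(x, knots):
    """Truncated-power cubic basis: 1, x, x^2, x^3, (x - xi_k)^3_+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Simulated data (assumption: the lecture's dataset is not reproduced here).
N = 50
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
knots = np.quantile(x, [0.25, 0.5, 0.75])
H = spline_basis(x, knots)

B = 200
fits = np.empty((B, N))
for b in range(B):
    idx = rng.integers(0, N, N)               # step 1: resample pairs with replacement
    beta_b, *_ = np.linalg.lstsq(H[idx], y[idx], rcond=None)  # step 2: refit the spline
    fits[b] = H @ beta_b                      # evaluate the refit at the original x's

lo, hi = np.percentile(fits, [2.5, 97.5], axis=0)  # step 3: 95% pointwise band
```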
Non-parametric bootstrap: resample the raw pairs (x_i, y_i) with replacement; model-free.
Parametric bootstrap: simulate new responses from the fitted model, y_i* = μ̂(x_i) + ε_i*, with ε_i* ~ N(0, σ̂²).
The process is repeated B times, say B = 200, and each bootstrap dataset is refit exactly like the original data.
Conclusion: as B → ∞, the bands from the parametric bootstrap agree with the least squares bands!
In general, the parametric bootstrap agrees not with least squares but with maximum likelihood.
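The parametric bootstrap can be sketched the same way, simulating Gaussian noise around the fitted curve instead of resampling pairs (simulated data; a plain polynomial basis assumed as a stand-in for the spline basis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data and an assumed 7-column polynomial basis (stand-in for B-splines).
N = 50
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
H = np.vander(x, 7, increasing=True)

beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
mu_hat = H @ beta_hat
sigma_hat = np.sqrt(np.sum((y - mu_hat) ** 2) / (N - 7))

# Parametric bootstrap: y* = mu_hat(x) + N(0, sigma_hat^2) noise, then refit.
B = 200
fits = np.empty((B, N))
for b in range(B):
    y_star = mu_hat + rng.normal(scale=sigma_hat, size=N)
    beta_b, *_ = np.linalg.lstsq(H, y_star, rcond=None)
    fits[b] = H @ beta_b

band = np.percentile(fits, [2.5, 97.5], axis=0)  # approaches the LS band as B grows
```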
ML Inference
Density function or probability mass function: z_i ~ g_θ(z).
Likelihood function: L(θ; Z) = ∏_{i=1}^N g_θ(z_i).
Log-likelihood function: ℓ(θ; Z) = Σ_{i=1}^N log g_θ(z_i).
Score function: ℓ̇(θ; Z) = Σ_{i=1}^N ∂ℓ(θ; z_i)/∂θ; at the MLE, ℓ̇(θ̂; Z) = 0.
Information matrix: I(θ) = −Σ_{i=1}^N ∂²ℓ(θ; z_i)/∂θ ∂θᵀ.
Observed information matrix: I(θ̂), the information evaluated at the MLE θ̂.
Fisher information matrix: i(θ) = E[I(θ)].
Asymptotic result: θ̂ → N(θ₀, i(θ₀)⁻¹), where θ₀ is the true parameter.
Estimate for the standard error of θ̂_j: ŝe(θ̂_j) = [I(θ̂)⁻¹]_{jj}^{1/2} (or use i(θ̂) in place of I(θ̂)).
Confidence interval: θ̂_j ± z^{(1−α)} · ŝe(θ̂_j).
ML Inference
Confidence region: {θ : 2[ℓ(θ̂; Z) − ℓ(θ; Z)] ≤ the (1 − α) quantile of χ²_p}, based on the chi-squared approximation to the likelihood-ratio statistic.
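As an illustration of the score/information machinery (an exponential model chosen here for simplicity, not taken from the lecture), the MLE, observed information, and Wald interval can be computed in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative example: z_1, ..., z_N i.i.d. Exponential with rate lambda.
lam_true = 2.0
z = rng.exponential(scale=1 / lam_true, size=500)
N = z.size

# log-likelihood: l(lambda) = N log(lambda) - lambda * sum(z)
lam_hat = 1 / z.mean()            # MLE: solves the score equation N/lambda - sum(z) = 0
info = N / lam_hat**2             # observed information: -d^2 l / d lambda^2 at the MLE
se = np.sqrt(1 / info)            # se(lam_hat) = I(lam_hat)^{-1/2}
ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)   # 95% Wald confidence interval
```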
Example: revisit the previous smoothing example
Bootstrap vs. ML
The advantage of the bootstrap: it allows us to compute maximum-likelihood-style estimates of standard errors and confidence intervals even when no closed-form formulas are available.
Bayesian Methods
Two parts:
1. A sampling model for our data given the parameters: Pr(Z | θ).
2. A prior distribution for the parameters: Pr(θ).
Finally, we have the posterior distribution: Pr(θ | Z) = Pr(Z | θ) · Pr(θ) / ∫ Pr(Z | θ) · Pr(θ) dθ.
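A minimal conjugate example (Bernoulli data with a Beta prior; chosen for illustration, not from the lecture) shows the posterior as prior × likelihood, renormalized:

```python
import numpy as np

# Sampling model: z_i ~ Bernoulli(theta).  Prior: theta ~ Beta(a, b).
a, b = 2.0, 2.0
z = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])    # made-up observed data

# Conjugacy: the posterior is Beta(a + #successes, b + #failures).
a_post = a + z.sum()
b_post = b + (z.size - z.sum())
post_mean = a_post / (a_post + b_post)          # shrinks the MLE toward the prior mean
```

Here the normalizing integral ∫ Pr(Z | θ)Pr(θ) dθ never has to be computed explicitly, which is exactly why conjugate priors are convenient.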
Bayesian methods
Differences between Bayesian methods and standard ('frequentist') methods:
Bayesian methods use a prior distribution to express the uncertainty present before seeing the data.
They allow the uncertainty remaining after seeing the data to be expressed in the form of a posterior distribution.
Bayesian methods: prediction
The posterior predictive distribution averages over parameter uncertainty: Pr(z_new | Z) = ∫ Pr(z_new | θ) · Pr(θ | Z) dθ.
In contrast, the ML method uses the plug-in density Pr(z_new | θ̂) to predict future data, ignoring the uncertainty in θ̂.
Bayesian methods: Example
Revisit the previous smoothing example.
We first assume the noise variance σ² is known, and place a Gaussian prior on the spline coefficients: β ~ N(0, τΣ).
Bayesian methods: Example
How to choose a prior?
Difficult in general
Sensitivity analysis is needed
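A minimal sensitivity analysis, sketched on a normal-normal model (assumed for illustration: z_i ~ N(θ, σ²) with σ known, prior θ ~ N(0, τ)); varying the prior variance τ shows how strongly the prior pulls the posterior mean away from the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.0
z = rng.normal(1.5, sigma, size=20)    # simulated data (true mean 1.5 is an assumption)
N, zbar = z.size, z.mean()

# Posterior mean under theta ~ N(0, tau): precision-weighted average of 0 and zbar.
for tau in [0.01, 0.1, 1.0, 100.0]:
    post_mean = (N / sigma**2) * zbar / (N / sigma**2 + 1 / tau)
    print(f"tau = {tau:6.2f}  posterior mean = {post_mean:.3f}")
```

A tiny τ (dogmatic prior) drags the posterior mean toward 0; a huge τ (vague prior) recovers essentially the sample mean. If the answer changes a lot across plausible priors, the prior choice matters and needs justification.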
EM algorithm
It is used to simplify difficult maximum likelihood
problems, especially when there are missing data.
Gaussian Mixture Model
Y₁ ~ N(μ₁, σ₁²), Y₂ ~ N(μ₂, σ₂²), and Y = (1 − Δ)·Y₁ + Δ·Y₂, where Δ ∈ {0, 1} with Pr(Δ = 1) = π.
Introduce the missing (latent) variables Δ_i, indicating which component generated y_i.
The parameters θ = (π, μ₁, σ₁², μ₂, σ₂²) are unknown.
Iterative method:
1. E-step: given the current parameters, compute the expectation of the complete-data log-likelihood, i.e., the responsibilities γ_i = E[Δ_i | θ, Z] = Pr(Δ_i = 1 | θ, y_i).
2. M-step: maximize this expected log-likelihood over θ (responsibility-weighted means, variances, and mixing proportion).
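The E- and M-steps above can be sketched for a two-component mixture in one dimension (simulated data; the lecture's dataset is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated two-component data (assumption: components near 0 and 4).
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(4, 1, 40)])

def norm_pdf(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Initialization: extreme points as means, overall variance for both components.
pi, mu1, mu2, v1, v2 = 0.5, y.min(), y.max(), y.var(), y.var()

for _ in range(200):
    # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, theta).
    p1 = (1 - pi) * norm_pdf(y, mu1, v1)
    p2 = pi * norm_pdf(y, mu2, v2)
    g = p2 / (p1 + p2)
    # M-step: responsibility-weighted means, variances, and mixing proportion.
    mu1 = np.sum((1 - g) * y) / np.sum(1 - g)
    mu2 = np.sum(g * y) / np.sum(g)
    v1 = np.sum((1 - g) * (y - mu1) ** 2) / np.sum(1 - g)
    v2 = np.sum(g * (y - mu2) ** 2) / np.sum(g)
    pi = g.mean()
```

Each iteration increases the observed-data log-likelihood, which is the key property of EM.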
MCMC for sampling from the posterior
MCMC is used to draw samples from some (posterior) distribution.
Gibbs sampling, basic idea:
To sample from the joint distribution of (U₁, U₂, …, U_K), cycle through the full conditionals:
Draw U₁ ~ Pr(U₁ | U₂, …, U_K);
…
Draw U_k ~ Pr(U_k | U₁, …, U_{k−1}, U_{k+1}, …, U_K), k = 1, …, K;
repeat until the joint distribution of the draws stabilizes.
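A minimal Gibbs sampler for a case where the full conditionals are known exactly: a standard bivariate normal with correlation ρ (an illustrative target, not from the lecture), where U₁ | U₂ ~ N(ρU₂, 1 − ρ²) and symmetrically for U₂:

```python
import numpy as np

rng = np.random.default_rng(2)

rho = 0.8                       # target: bivariate normal, zero means, unit variances
n_iter, burn = 5000, 500
u1, u2 = 0.0, 0.0
samples = np.empty((n_iter, 2))

for t in range(n_iter):
    # Alternate draws from the two full conditionals.
    u1 = rng.normal(rho * u2, np.sqrt(1 - rho**2))
    u2 = rng.normal(rho * u1, np.sqrt(1 - rho**2))
    samples[t] = (u1, u2)

samples = samples[burn:]        # discard burn-in before using the draws
```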
Gibbs sampler: Example
Gibbs sampling for mixtures
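A sketch of Gibbs sampling for the two-component Gaussian mixture, simplified by holding the variances and mixing proportion fixed and assuming a flat prior on the means (simulated data): alternate between sampling the latent labels Δ_i and sampling the component means.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated two-component data; sigma^2 and pi are treated as known for simplicity.
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(4, 1, 40)])
N = y.size
sigma2, pi = 1.0, 0.4

mu1, mu2 = y.min(), y.max()           # initial component means
draws = []
for t in range(2000):
    # Step 1: sample the labels Delta_i ~ Bernoulli(responsibility).
    p1 = (1 - pi) * np.exp(-(y - mu1) ** 2 / (2 * sigma2))
    p2 = pi * np.exp(-(y - mu2) ** 2 / (2 * sigma2))
    delta = rng.random(N) < p2 / (p1 + p2)
    # Step 2: sample each mean from its full conditional (flat prior on mu):
    # mu_k | Delta, y ~ N(mean of its assigned points, sigma2 / n_k).
    n2 = delta.sum(); n1 = N - n2
    mu1 = rng.normal(y[~delta].mean(), np.sqrt(sigma2 / n1))
    mu2 = rng.normal(y[delta].mean(), np.sqrt(sigma2 / n2))
    draws.append((mu1, mu2))

draws = np.array(draws[500:])         # discard burn-in
```

Compare with EM: Gibbs *samples* the labels and parameters rather than averaging and maximizing, so the retained draws approximate the posterior of (μ₁, μ₂) instead of a point estimate.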
Bagging
The bootstrap can be used to assess the accuracy of a prediction or parameter estimate.
It can also be used to improve the estimate or prediction itself.
Bagging (bootstrap aggregation) averages the prediction over bootstrap samples, f̂_bag(x) = (1/B) Σ_{b=1}^B f̂*ᵇ(x), to reduce the variance of the prediction.
If f̂(x) is linear in the data, then bagging just reproduces f̂(x) itself as B → ∞. Take the cubic smoothing spline as an example: for fixed x, the bagged estimate converges to the original fit.
Bagging is not good for 0-1 loss: averaging can make a bad classifier worse.
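A minimal bagging sketch for regression (simulated data; a flexible polynomial as the base learner, since the lecture's learner is not specified): fit on each bootstrap sample, then average the predictions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated regression data (assumption, for illustration only).
N = 50
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

B = 200
preds = np.empty((B, N))
for b in range(B):
    idx = rng.integers(0, N, N)                  # bootstrap sample of the pairs
    coef = np.polyfit(x[idx], y[idx], deg=5)     # base learner: degree-5 polynomial
    preds[b] = np.polyval(coef, x)               # predict at the original x's

f_bag = preds.mean(axis=0)   # bagged prediction: average over the B bootstrap fits
```

Averaging helps most when the base learner is unstable (high variance), e.g., deep trees; for a learner that is linear in the data, f_bag simply reproduces the original fit.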
Model Averaging and Stacking
A Bayesian viewpoint: average the predictions of candidate models M_m, m = 1, …, M, weighted by their posterior probabilities: E[ζ | Z] = Σ_m E[ζ | M_m, Z] · Pr(M_m | Z).
Model weights: the posterior model probabilities Pr(M_m | Z) can be approximated from each model's BIC, with weight proportional to exp(−BIC_m / 2).
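Turning BIC values into model weights is a one-liner; the BIC numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical BIC values for three candidate models (made-up numbers).
bic = np.array([210.3, 208.1, 215.7])

# Weight proportional to exp(-BIC/2); subtracting the minimum avoids underflow.
w = np.exp(-(bic - bic.min()) / 2)
w /= w.sum()
```

The model with the smallest BIC gets the largest weight, and weights decay exponentially in the BIC gap.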
Model Averaging
Frequentist viewpoint: stacking chooses the weights ŵ by minimizing the leave-one-out prediction error Σ_i [y_i − Σ_m w_m f̂_m^{−i}(x_i)]².
Model averaging gives better prediction but less interpretability.
Bumping instead uses the bootstrap to find a better single model: fit the model on each bootstrap sample and keep the fit with the smallest error on the original training data.
Example: Bumping
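A minimal bumping sketch (simulated data; a polynomial base learner assumed): fit on bootstrap samples, score every fit on the original data, keep the best.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data (assumption, for illustration only).
N = 50
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

best_err, best_coef = np.inf, None
for b in range(20):
    # Convention: b = 0 uses the original sample, so bumping never does worse
    # than the original fit on the training data.
    idx = np.arange(N) if b == 0 else rng.integers(0, N, N)
    coef = np.polyfit(x[idx], y[idx], deg=5)
    err = np.mean((y - np.polyval(coef, x)) ** 2)   # score on the ORIGINAL data
    if err < best_err:
        best_err, best_coef = err, coef
```

Unlike bagging, bumping returns a single model, so interpretability is preserved; the bootstrap is used only to perturb the fitting and escape poor local solutions.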
Homework (due May 23):
1. Reproduce Figure 8.2.
2. Reproduce Figures 8.5 and 8.6.
3. Exercise 8.6, p. 293 in ESLII (print 5).
