### Recitation 2 Slides

```
RECITATION 2
APRIL 28
Spline and Kernel Methods
Gaussian Processes
Mixture Modeling for Density Estimation
Penalized Cubic Regression Splines
• gam() in library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(...)), data=dataset)
• By default, the optimal smoothing parameter is selected by GCV
• R Demo 1 (see the gam() sketch after the slides)
Kernel Method
• Locally linear polynomial model
• How to define "local"?
• Via a kernel function, e.g., the Gaussian kernel
• R Demo 1 (see the locfit() sketch after the slides)
• R package: "locfit"
• Function: locfit(y ~ x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg= , alpha= bandwidth range)
Gaussian Processes
• Distribution on functions
• f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• p(f(x_1), ..., f(x_n)) ~ N_n(μ, K)
• μ = [m(x_1), ..., m(x_n)]
• K_ij = κ(x_i, x_j)
• Idea: if x_i and x_j are similar according to the kernel, then f(x_i) and f(x_j) are similar
Gaussian Processes
– Noise-free observations
• Learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as an infinite-dimensional random variable
• GP provides a distribution over functions.
Gaussian Processes
– Noise-free observations
• Model
• (x, f) are the observed locations and values (training data)
• (x*, f*) are the test or prediction data locations and values.
• After observing noise-free data (x, f), the predictive distribution f* | x, x*, f is again Gaussian (see the conditioning sketch after the slides)
• Length-scale: controls how quickly the correlation between f(x_i) and f(x_j) decays as |x_i - x_j| grows
• R Demo 2
Gaussian Processes – Noisy observations (GP for Regression)
• Model
• (x, y) are the observed locations and values (training data)
• (x*, f*) are the test or prediction data locations and values.
• After observing noisy data (x, y), the predictive distribution f* | x, x*, y is again Gaussian, with K(x, x) replaced by K(x, x) + σ_n^2 I (see the sketch after the slides)
• R Demo 3
References
• Rasmussen, C. E. and Williams, C. K. I., Gaussian Processes for Machine Learning, Chapter 2
• 527 lecture notes by Emily Fox
Mixture Models – Density Estimation
• EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
• Remember:
• EM algorithm = iterative algorithm that MAXIMIZES the LIKELIHOOD
• MCMC DRAWS FROM the POSTERIOR (i.e., likelihood × prior, up to normalization)
EM algorithm
• Iterative procedure that attempts to maximize the log-likelihood → MLE estimates of the mixture model parameters
• I.e., one final density estimate (see the EM sketch after the slides)
Bayesian Mixture Modeling (MCMC)
• Uses an iterative procedure to DRAW SAMPLES from the posterior (then you can average draws, etc.)
• You don't need to understand the fine details, but know that at every iteration you get a set of parameter estimates drawn from your posterior distribution
```
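
A minimal sketch of the penalized cubic regression spline fit referenced on the slides. The simulated data, n.knots = 10, and the knot locations are assumptions for illustration; the gam() call itself mirrors the slide, with the smoothing parameter chosen by GCV by default.

```r
library(mgcv)

set.seed(1)
dataset <- data.frame(x = seq(0, 1, length.out = 200))
dataset$y <- sin(2 * pi * dataset$x) + rnorm(200, sd = 0.3)  # toy data (assumed)

n.knots <- 10
fit <- gam(y ~ s(x, bs = "cr", k = n.knots),
           knots = list(x = seq(0, 1, length.out = n.knots)),  # explicit knot locations
           data = dataset)   # smoothing parameter selected by GCV by default

summary(fit)
plot(fit, residuals = TRUE)  # fitted smooth with partial residuals
```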
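
For the kernel-method demo, a sketch of a local linear fit with locfit and GCV bandwidth selection; the toy data, deg = 1, and the bandwidth grid are assumptions. One detail worth noting: locfit's alpha is a (nearest-neighbor fraction, fixed bandwidth) pair, so a fixed bandwidth goes in the second slot.

```r
library(locfit)

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)  # toy data (assumed)

## alpha = c(nn, h): nearest-neighbor fraction 0, fixed Gaussian bandwidth h = 0.1
fit <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = c(0, 0.1))

plot(x, y, col = "grey")
xx <- seq(0, 1, length.out = 200)
lines(xx, predict(fit, newdata = xx), col = "red")

## GCV over a grid of fixed bandwidths; each row of alpha is one candidate
g <- gcvplot(y ~ x, kern = "gauss", deg = 1,
             alpha = cbind(0, seq(0.05, 0.5, by = 0.05)))
plot(g)  # pick the bandwidth minimizing GCV
```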
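
The "distribution on functions" idea can be made concrete by drawing sample paths from a GP prior. This sketch assumes a zero mean function and a squared-exponential kernel with length-scale 0.3 (both assumptions; the slides do not fix a kernel).

```r
## Squared-exponential kernel: correlation between f(x_i) and f(x_j)
## decays with |x_i - x_j| at a rate set by the length-scale ell
sq_exp <- function(x1, x2, ell = 0.3) exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)

x <- seq(0, 1, length.out = 100)
K <- sq_exp(x, x)

## If z ~ N(0, I) and K = L L', then L z ~ N(0, K)
L <- t(chol(K + 1e-8 * diag(length(x))))  # jitter for numerical stability
set.seed(2)
f <- L %*% matrix(rnorm(length(x) * 3), ncol = 3)

matplot(x, f, type = "l", lty = 1, ylab = "f(x)")  # three prior draws
```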
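
For the noise-free slide, the model is the standard one from Rasmussen and Williams, Chapter 2: f and f* are jointly Gaussian, so conditioning gives f* | x, x*, f ~ N(K(x*, x) K(x, x)^{-1} f, K(x*, x*) - K(x*, x) K(x, x)^{-1} K(x, x*)). A sketch, where the training points and kernel are assumptions:

```r
sq_exp <- function(x1, x2, ell = 0.3) exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)

x  <- c(0.1, 0.4, 0.7)             # training locations (assumed)
f  <- sin(2 * pi * x)              # noise-free observations
xs <- seq(0, 1, length.out = 100)  # test locations x*

K   <- sq_exp(x, x)
Ks  <- sq_exp(x, xs)               # K(x, x*)
Kss <- sq_exp(xs, xs)

Kinv <- solve(K + 1e-8 * diag(length(x)))  # jitter for numerical stability
mu   <- t(Ks) %*% Kinv %*% f               # posterior mean
V    <- Kss - t(Ks) %*% Kinv %*% Ks        # posterior covariance
s    <- sqrt(pmax(diag(V), 0))

plot(xs, mu, type = "l", ylim = range(c(mu - 2 * s, mu + 2 * s)))
lines(xs, mu + 2 * s, lty = 2)
lines(xs, mu - 2 * s, lty = 2)
points(x, f, pch = 19)  # the noise-free posterior interpolates the data
```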
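
With noisy observations y = f(x) + ε, ε ~ N(0, σ_n^2), the same conditioning applies with K(x, x) replaced by K(x, x) + σ_n^2 I (Rasmussen and Williams, Chapter 2). A sketch, assuming the noise level is known:

```r
sq_exp <- function(x1, x2, ell = 0.3) exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)

set.seed(3)
sigma_n <- 0.2                     # observation noise sd (assumed known)
x  <- runif(20)
y  <- sin(2 * pi * x) + rnorm(20, sd = sigma_n)
xs <- seq(0, 1, length.out = 100)

K   <- sq_exp(x, x)
Ks  <- sq_exp(x, xs)
Kss <- sq_exp(xs, xs)

A  <- solve(K + sigma_n^2 * diag(length(x)))  # (K + sigma_n^2 I)^{-1}
mu <- t(Ks) %*% A %*% y                       # posterior mean
V  <- Kss - t(Ks) %*% A %*% Ks                # posterior covariance

plot(x, y)
lines(xs, mu, col = "red")  # posterior mean no longer interpolates the data
```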
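
To make the EM-vs-MCMC contrast concrete, here is a hand-rolled EM sketch for a two-component Gaussian mixture; the data, starting values, and component count are assumptions. Unlike MCMC, which would draw a new set of parameters from the posterior at every iteration, EM iterates to one set of maximum-likelihood point estimates, i.e., one final density estimate.

```r
set.seed(4)
y <- c(rnorm(150, 0, 1), rnorm(100, 4, 0.7))  # toy two-component data (assumed)

p <- 0.5; mu <- c(-1, 1); s <- c(1, 1)        # crude starting values (assumed)
for (iter in 1:200) {
  ## E-step: responsibility of component 1 for each observation
  d1 <- p * dnorm(y, mu[1], s[1])
  d2 <- (1 - p) * dnorm(y, mu[2], s[2])
  r  <- d1 / (d1 + d2)
  ## M-step: weighted updates of the mixing weight, means, and sds
  p     <- mean(r)
  mu[1] <- sum(r * y) / sum(r)
  mu[2] <- sum((1 - r) * y) / sum(1 - r)
  s[1]  <- sqrt(sum(r * (y - mu[1])^2) / sum(r))
  s[2]  <- sqrt(sum((1 - r) * (y - mu[2])^2) / sum(1 - r))
}

## One final density estimate, as the slide says
hist(y, breaks = 30, freq = FALSE)
curve(p * dnorm(x, mu[1], s[1]) + (1 - p) * dnorm(x, mu[2], s[2]),
      add = TRUE, col = "red")
```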