The General Linear Model for fMRI analyses

```Overview of SPM
Image time-series
Realignment
Kernel
Design matrix
Smoothing
General linear model
Statistical parametric map (SPM)
Statistical
inference
Normalisation
Gaussian
field theory
p <0.05
Template
Parameter estimates
The General Linear Model (GLM)
Frederike Petzschner
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering, University of Zurich & ETH Zurich
With many thanks for slides & images to:
FIL Methods group, Virginia Flanagin and Klaas Enno Stephan
Image a very simple experiment…
• One session
• 7 cycles of rest and listening
• Blocks of 6 scans with 7 sec TR
time
Image a very simple experiment…
What we measure.
single voxel
time series
What we
know.
time
Question: Is there a change in the BOLD response between listening and rest?
Image a very simple experiment…
What we measure.
single voxel
time series
What we
know.
time
Question: Is there a change in the BOLD response between listening and rest?
You need a model of your data…
linear model

effects
estimate
error
estimate
statistic
Question: Is there a change in the BOLD response between listening and rest?
Time
=1
BOLD signal
Single voxel regression
model:
+ 2
x1
+
x2
y = x1b1 + x2 b2 + e
error
as a combination of experimental manipulation,confounds and errors
e
regresso
r
Time
=1
BOLD signal
Single voxel regression
model:
+ 2
x1
+
x2
y =yx
+x2b2e+ e
1b1X
error
as a combination of experimental manipulation,confounds and errors
e
n
2
+
p
1
+
n
n
1
n: number of scans
p: number of regressors
1
error
error
=
Designmatrix
The black and white version in SPM
p
y  X  e
e
1
Model assumptions
Designmatrix
error
The design matrix embodies all available knowledge
about experimentally controlled factors and potential
confounds.  Talk: Experimental Design Wed 9:45 – 10:45
You want to estimate your parameters  such that you
minimize:
N
åe
2
t
t=1
This can be done using an Ordinary least squares
estimation (OLS) assuming an i.i.d. error:
e » N(0, s 2 I)
error
GLM assumes identical and
independently distributed
errors
i.i.d. = error covariance is a scalar multiple of the identity matrix: Cov(e) = 2I
non-identity
t1
t2
1 0
Cov(e)  

0 1 
t1
t2
 4 0
Cov(e)  

0
1


non-independence
2 1
Cov(e)  

1
2


How to fit the model and estimate the parameters?
=
y
X
+
error
 1 
 
 2
„Option 1“: Per hand
e
How to fit the model and estimate the parameters?
OLS (Ordinary Least Squares)
 1 
 
 2
=
y
X
+
error
yˆ = X bˆ
Data predicted by our model
e = y - yˆ
Error between predicted and
actual data
e = y - X bˆ
Goal is to determine the
betas such that we minimize
min(eT e) = min((y - X bˆ )T (y - X bˆ ))
e
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
T
T
ˆ
e e = (y - b X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
T
T
ˆ
e e = (y - b X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
e e = (y - bˆ X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
This is a scalar and the
transpose of a scalar is
a scalar 
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
e e = (y - bˆ X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
This is a scalar and the
transpose of a scalar is
a scalar 
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
e e = (y - bˆ X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
This is a scalar and the
transpose of a scalar is
a scalar 
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
You find the extremum
of a function by taking
its derivative and
setting it to zero
OLS (Ordinary Least Squares)
T
T
ˆ
e e = (y - X b ) (y - X bˆ )
The goal is to minimize
between data and model
e e = (y - bˆ X )(y - X bˆ )
T
T
T
T
T
T
T
ˆ
ˆ
ˆ
e e = y y - y X b - b X y + b X X bˆ
T
T
T
T
This is a scalar and the
transpose of a scalar is
a scalar 
T
T
T
T
ˆ
ˆ
e e = y y - 2 b X y + b X X bˆ
T
T
¶e e
T
T
= -2X y + 2X X bˆ
¶bˆ
T
T
0 = -2X y + 2X X bˆ
T
T
-1 T
ˆ
b = (X X) X y
You find the extremum
of a function by taking
its derivative and
setting it to zero
A geometric perspective on the GLM
OLS estimates
y
ˆ  ( X T X ) 1 X T y
e
x2
yˆ  Xˆ
x1
Design space
defined by X
Correlated and orthogonal regressors
y
Design space
defined by X
x2*
x2
x1
y  x11  x2  2  e
y  x11  x2* 2*  e
1   2  1
1  1;  2*  1
Correlated regressors =
explained variance is shared
between regressors
When x2 is orthogonalized with
regard to x1, only the parameter
estimate for x1 changes, not that
for x2!
We are nearly there…
linear model
effects
estimate

=
error
estimate
+
statistic
What are the problems?
Design
Error
1.
BOLD responses have a delayed and dispersed form.
2.
The BOLD signal includes substantial amounts of lowfrequency noise.
3.
The data are serially correlated (temporally autocorrelated)

this violates the assumptions of the noise model in
the GLM
Problem 1: Shape of BOLD response
t
f  g (t ) 

f ( ) g ( t   ) d 
0
The response of a linear time-invariant (LTI) system is the convolution of the input
with the system's response to an impulse (delta function).
Solution: Convolution model of the BOLD response
expected BOLD response
= input function impulse
response function (HRF)
t
f  g (t ) 

f ( ) g ( t   ) d 
0
 HRF
blue =
data
green = predicted response, taking convolved with HRF
red =
predicted response, NOT taking into account the HRF
Problem 2: Low frequency noise
blue =
data
black = mean + low-frequency drift
green = predicted response, taking into account
low-frequency drift
red =
predicted response, NOT taking into
account low-frequency drift
Problem 2: Low frequency noise
Linear model
blue =
data
black = mean + low-frequency drift
green = predicted response, taking into account
low-frequency drift
red =
predicted response, NOT taking into
account low-frequency drift
Solution 2: High pass filtering
discrete cosine
transform (DCT) set
Problem 3: Serial correlations
i.i.d
t1
t2
é 1 0 ùt1
Cov(e) = ê
ú
ë 0 1 ût2
non-independence
non-identity
é 2 1 ù
é 4 0 ù
ú
Cov(e) = ê
ú Cov(e) = ê
ë 1 2 û
ë 0 1 û
n
Cov(e)
n
n: number of scans
Problem 3: Serial correlations
et  aet 1   t
with
 t ~ N (0,  )
2
1st order autoregressive process: AR(1)
autocovariance
function
n
Cov(e)
n
n: number of scans
Problem 3: Serial correlations
• Pre-whitening:
1. Use an enhanced noise model with multiple error covariance
components, i.e. e ~ N(0, 2V) instead of e ~ N(0, 2I).
2. Use estimated serial correlation to specify filter matrix W for whitening the
data.
This is i.i.d
Wy  WX   We
How do we define W ?
• Enhanced noise model
• Remember linear transform
for Gaussians
• Choose W such that error
covariance becomes spherical
• Conclusion: W is a simple function of V
 so how do we estimate V ?
e ~ N (0,  V )
2
x ~ N (  ,  ), y  ax
2
 y ~ N (a , a 2 2 )
We ~ N (0,  W V )
2
 W 2V  I
W V
Wy  WX   We
1 / 2
2
Find V: Multiple covariance components
V  Cov( e)
e ~ N (0,  V )
2
enhanced noise model
V
= 1
V   iQi
error covariance components Q
and hyperparameters 
Q1
+ 2
Estimation of hyperparameters  with EM (expectation
maximisation) or ReML (restricted maximum likelihood).
Q2
We are there…
linear model
effects
estimate

error
estimate
statistic
• GLM includes all known experimental effects and confounds
• Convolution with a canonical HRF
=
+
• High-pass filtering to account for low-frequency drifts
• Estimation of multiple variance components (e.g. to account for serial correlations)
We are there…
c=10000000000
linear
model
effects
estimate

error
estimate
Null hypothesis:
1  0
statistic
c ˆ
t
Std (cT ˆ )
T
 Talk: Statistical Inference and design efficiency. Next Talk
=
+
We are there…
• Mass-univariate
approach:
GLM applied to >
100,000 voxels
single voxel
time series
• Threshold of
p<0.05  more
than 5000 voxels
significant by
chance!
• Massive problem with
multiple comparisons!
• Solution: Gaussian
random field theory
Outlook: further challenges
•correction for multiple comparisons
 Talk: Multiple Comparisons Wed 8:30
– 9:30
•variability in the HRF across voxels
 Talk: Experimental Design Wed 9:45
– 10:45
•limitations of frequentist statistics
 Talk: entire Friday
•GLM ignores interactions among voxels  Talk: Multivariate Analysis Thu 12:30 – 13:30
Thank you!
• Friston, Ashburner, Kiebel, Nichols, Penny (2007)
Statistical Parametric Mapping: The Analysis of Functional Brain
Images. Elsevier.
• Christensen R (1996) Plane Answers to Complex Questions: The Theory of
Linear Models. Springer.
• Friston KJ et al. (1995) Statistical parametric maps in functional imaging: a
general linear approach. Human Brain Mapping 2: 189-210.
```