Hierarchical models, variance components and group analyses

Group analyses of fMRI data
Klaas Enno Stephan
Laboratory for Social and Neural Systems Research
Institute for Empirical Research in Economics
University of Zurich
Functional Imaging Laboratory (FIL)
Wellcome Trust Centre for Neuroimaging
University College London
With many thanks for slides & images to:
FIL Methods group,
particularly Will Penny & Tom Nichols
Methods & models for fMRI data analysis in neuroeconomics
November 2010
Overview of SPM
[Figure: SPM analysis pipeline. Image time-series → realignment → normalisation (to a template) → smoothing (with a kernel) → general linear model (design matrix, parameter estimates) → statistical parametric map (SPM) → statistical inference (Gaussian field theory, p < 0.05).]
Reminder: voxel-wise time series analysis!
[Figure: the BOLD signal time series of a single voxel passes through model specification, parameter estimation, hypothesis and statistic, yielding the SPM.]
The model: voxel-wise GLM
$y = X\beta + e$, $e \sim N(0, \sigma^2 I)$
Dimensions: $y$ is $N \times 1$, $X$ is $N \times p$, $\beta$ is $p \times 1$, $e$ is $N \times 1$
N: number of scans
p: number of regressors
Model is specified by
1. Design matrix X
2. Assumptions about e
The design matrix embodies all available knowledge about
experimentally controlled factors and potential confounds.
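As a minimal sketch of the voxel-wise GLM, the following simulates one voxel's time series and recovers the parameters by ordinary least squares. The boxcar design, the "true" effects and the noise level are assumptions chosen for illustration only; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 100, 2                                   # scans, regressors (illustrative sizes)
# Hypothetical design: one boxcar regressor plus a constant term
boxcar = (np.arange(N) // 10) % 2
X = np.column_stack([boxcar, np.ones(N)]).astype(float)

beta_true = np.array([2.0, 10.0])               # assumed effects for the simulation
sigma = 1.0                                     # assumed noise standard deviation
y = X @ beta_true + sigma * rng.standard_normal(N)   # y = X beta + e, e ~ N(0, sigma^2 I)

# Ordinary least squares estimate: beta_hat = pinv(X) @ y
beta_hat = np.linalg.pinv(X) @ y
residuals = y - X @ beta_hat
# Unbiased error variance estimate: residual sum of squares / (N - rank(X))
sigma2_hat = residuals @ residuals / (N - np.linalg.matrix_rank(X))
```

The residuals are orthogonal to every column of X, which is what makes this the least-squares solution under the spherical error assumption.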
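GLM assumes Gaussian “spherical” (i.i.d.) errors
sphericity = i.i.d.: the error covariance is a scalar multiple of the identity matrix:
$\mathrm{Cov}(e) = \sigma^2 I$, e.g. $\mathrm{Cov}(e) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
Examples for non-sphericity:
$\mathrm{Cov}(e) = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$ (non-identity)
$\mathrm{Cov}(e) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ (non-independence)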
Multiple covariance components at 1st level
Enhanced noise model:
$e \sim N(0, \sigma^2 V)$, $V \propto \mathrm{Cov}(e)$
$V = \sum_i \lambda_i Q_i$, e.g. $V = \lambda_1 Q_1 + \lambda_2 Q_2$
with error covariance components $Q_i$ and hyperparameters $\lambda_i$
Estimation of hyperparameters $\lambda$ with ReML (restricted maximum likelihood).
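To make the enhanced noise model concrete, here is a small sketch that builds V from two covariance components. The choice of Q2 (first off-diagonals, as a crude stand-in for short-range serial correlations) and the hyperparameter values are assumptions for illustration; in practice the hyperparameters would be estimated with ReML.

```python
import numpy as np

N = 6                                     # number of scans (kept small for display)
Q1 = np.eye(N)                            # i.i.d. component
# Assumed second component: first off-diagonals only
Q2 = np.eye(N, k=1) + np.eye(N, k=-1)

lam = np.array([1.0, 0.3])                # hypothetical hyperparameters
V = lam[0] * Q1 + lam[1] * Q2             # V = lambda_1 Q_1 + lambda_2 Q_2

# V is spherical only if it is a scalar multiple of the identity
is_spherical = np.allclose(V, V[0, 0] * np.eye(N))
```

With a non-zero second hyperparameter the off-diagonal entries are non-zero, so the errors are no longer independent and V is not spherical.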
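t-statistic based on ML estimates
$Wy = WX\beta + We$, with $W = V^{-1/2}$
$\hat{\beta} = (WX)^+ Wy$
$c = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$
$t = \dfrac{c^T \hat{\beta}}{\widehat{\mathrm{std}}(c^T \hat{\beta})}$
$\widehat{\mathrm{std}}(c^T \hat{\beta}) = \sqrt{\hat{\sigma}^2 \, c^T (WX)^+ (WX)^{+T} c}$
$\hat{\sigma}^2 = \dfrac{(Wy - WX\hat{\beta})^T (Wy - WX\hat{\beta})}{\mathrm{tr}(R)}$, $R = I - WX(WX)^+$
$\sigma^2 V = \mathrm{Cov}(e)$, $V = \sum_i \lambda_i Q_i$ (ReML estimates)
For brevity: $(WX)^+ = (X^T W^T W X)^{-1} X^T W^T$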
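The whitening recipe on this slide can be sketched as follows. The function name `whitened_t`, the AR(1)-like form of V and the simulated data are illustrative assumptions; V is taken as known here, whereas the slide obtains it from ReML estimates.

```python
import numpy as np

def whitened_t(y, X, V, c):
    """t-statistic for contrast c when Cov(e) = sigma^2 * V (V symmetric, positive definite)."""
    # W = V^(-1/2) via the eigendecomposition of V
    evals, evecs = np.linalg.eigh(V)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    WX, Wy = W @ X, W @ y
    WX_pinv = np.linalg.pinv(WX)
    beta_hat = WX_pinv @ Wy
    R = np.eye(len(y)) - WX @ WX_pinv            # residual-forming matrix
    res = Wy - WX @ beta_hat
    sigma2_hat = res @ res / np.trace(R)
    std = np.sqrt(sigma2_hat * (c @ WX_pinv @ WX_pinv.T @ c))
    return c @ beta_hat / std, beta_hat

# Hypothetical usage with simulated data
rng = np.random.default_rng(1)
N = 40
X = np.column_stack([np.tile([0.0, 1.0], N // 2), np.ones(N)])
# Assumed AR(1)-like correlation structure, for illustration only
V = 0.4 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
y = X @ np.array([1.0, 5.0]) + rng.standard_normal(N)
c = np.array([1.0, 0.0])
t, beta_hat = whitened_t(y, X, V, c)
```

When V is the identity, W is the identity as well and the result reduces to the ordinary least-squares t-statistic.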
Fixed vs. random effects analysis
• Fixed Effects
– Intra-subject variation suggests that most subjects differ from zero
• Random Effects
– Inter-subject variation suggests that the population is not very different from zero
[Figure: distribution of each subject's estimated effect (Subj. 1 to Subj. 6, within-subject variance $\sigma^2_{FFX}$) and distribution of the population effect (between-subject variance $\sigma^2_{RFX}$), both plotted around 0.]
Fixed Effects
• Assumption: variation (over subjects) is only
due to measurement error
• parameters are fixed properties of the population
(i.e., they are the same in each subject)
Random/Mixed Effects
• Two sources of variation (over subjects)
– Measurement error
– Response magnitude: parameters are
probabilistically distributed in the population
• Response magnitude is random
– effect (parameter) in each subject has random
magnitude
– variation around population mean
Group level inference: fixed effects (FFX)
• assumes that parameters are “fixed properties of the
population”
• all variability is only intra-subject variability, e.g. due to
measurement errors
• Laird & Ware (1982): the probability distribution of the data
has the same form for each individual and the same
parameters
• In SPM: simply concatenate the data and the design
matrices
→ lots of power (proportional to number of scans),
but results are only valid for the group studied and
cannot be generalized to the population
Group level inference: random effects (RFX)
• assumes that model parameters are probabilistically
distributed in the population
• variance is due to inter-subject variability
• Laird & Ware (1982): the probability distribution of the data
has the same form for each individual, but the parameters
vary across individuals
• hierarchical model
→ much less power (proportional to number of
subjects), but results can be generalized to the
population
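The contrast between the two kinds of inference can be illustrated with a toy calculation. The per-subject effect estimates and within-subject variances below are invented numbers, chosen so that most subjects differ from zero while the group mean is small relative to the between-subject spread.

```python
import numpy as np

# Hypothetical per-subject effect estimates and within-subject variances
b = np.array([1.2, -0.3, 0.8, 2.1, -0.9, 0.6])
within_var = np.full(6, 0.04)            # variance of each estimate (assumed)

n = len(b)
mean_effect = b.mean()

# FFX: only measurement error counts as noise
se_ffx = np.sqrt(within_var.sum()) / n
t_ffx = mean_effect / se_ffx

# RFX: the spread of the subject effects around the population mean is the noise
se_rfx = b.std(ddof=1) / np.sqrt(n)
t_rfx = mean_effect / se_rfx
```

With these numbers the fixed-effects statistic is much larger than the random-effects statistic, because the former ignores the variability of the effect across subjects.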
Hierarchical models
fMRI, single subject
fMRI, multi-subject
EEG/MEG, single subject
ERP/ERF, multi-subject
Hierarchical models for all imaging
data!
Linear hierarchical model
Hierarchical model:
$y = X^{(1)}\theta^{(1)} + \varepsilon^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + \varepsilon^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + \varepsilon^{(n)}$
Multiple variance components at each level:
$C_\varepsilon^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
At each level, distribution of parameters is given by level above.
What we don’t know: distribution of parameters and variance parameters (hyperparameters) $\lambda_k^{(i)}$.
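Example: Two-level model
First level: $y = X^{(1)}\theta^{(1)} + \varepsilon^{(1)}$
Second level: $\theta^{(1)} = X^{(2)}\theta^{(2)} + \varepsilon^{(2)}$
[Figure: the first-level design matrix $X^{(1)}$ is block-diagonal with blocks $X_1^{(1)}$, $X_2^{(1)}$, $X_3^{(1)}$.]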
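Two-level model
$y = X^{(1)}\theta^{(1)} + \varepsilon^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + \varepsilon^{(2)}$
Substituting the second level into the first:
$y = X^{(1)}X^{(2)}\theta^{(2)} + X^{(1)}\varepsilon^{(2)} + \varepsilon^{(1)}$
fixed effects: $X^{(1)}X^{(2)}\theta^{(2)}$
random effects: $X^{(1)}\varepsilon^{(2)} + \varepsilon^{(1)}$
Friston et al. 2002, NeuroImage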
Mixed effects analysis
Non-hierarchical model:
$y = X^{(1)}X^{(2)}\theta^{(2)} + X^{(1)}\varepsilon^{(2)} + \varepsilon^{(1)}$
Estimating 2nd level effects:
$\hat{\theta}^{(1)} = X^{(1)+} y = X^{(2)}\theta^{(2)} + \varepsilon^{(2)} + X^{(1)+}\varepsilon^{(1)}$
Variance components at 2nd level:
$\mathrm{Cov}(\hat{\theta}^{(1)}) = C_\varepsilon^{(2)} + X^{(1)+} C_\varepsilon^{(1)} X^{(1)+T}$
(first term: within-level non-sphericity; second term: between-level non-sphericity)
Within-level non-sphericity at both levels: multiple covariance components
$C_\varepsilon^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
Friston et al. 2005, NeuroImage
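As a small numerical sketch of the second-level variance components, the following computes the covariance of the first-level estimates for a balanced toy design. The design (three subjects, one regressor each, identical across subjects) and the two variance values are assumptions for illustration.

```python
import numpy as np

n_subj, n_scans = 3, 20
x = np.tile([0.0, 1.0], n_scans // 2)          # assumed per-subject regressor
X1 = np.kron(np.eye(n_subj), x[:, None])       # block-diagonal first-level design
X1_pinv = np.linalg.pinv(X1)

C1 = 1.0 * np.eye(n_subj * n_scans)            # first-level error covariance (assumed i.i.d.)
C2 = 0.5 * np.eye(n_subj)                      # second-level error covariance (assumed i.i.d.)

# Covariance of the first-level estimates:
# second-level error plus first-level error projected through pinv(X1)
cov_theta1 = C2 + X1_pinv @ C1 @ X1_pinv.T
```

Because every subject has the same design and the same error variance in this toy case, the result is a scalar multiple of the identity, so the second-level errors remain spherical.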
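Estimation
$y = X\theta + \varepsilon$, with $y$: $N \times 1$, $X$: $N \times p$, $\theta$: $p \times 1$, $\varepsilon$: $N \times 1$
$C_\varepsilon = \sum_k \lambda_k Q_k$
EM-algorithm
E-step:
$C_{\theta|y} = (X^T C_\varepsilon^{-1} X)^{-1}$
$\eta_{\theta|y} = C_{\theta|y} X^T C_\varepsilon^{-1} y$
M-step: maximise $L = \ln p(y|\lambda)$
$g = \dfrac{dL}{d\lambda}$, $J = \dfrac{d^2 L}{d\lambda^2}$
$\lambda \leftarrow \lambda - J^{-1} g$
Assume, at voxel j: $\lambda_{jk} = \lambda_j \lambda_k$
Friston et al. 2002, NeuroImage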
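The hyperparameter update in the M-step can be sketched with a Fisher scoring scheme for ReML. This is a minimal illustration under stated assumptions, not SPM's implementation: the function name `reml_fisher_scoring` is hypothetical, the curvature is replaced by the expected information (so the update adds J⁻¹g), and nothing keeps the hyperparameters positive.

```python
import numpy as np

def reml_fisher_scoring(y, X, Q, lam0, n_iter=20):
    """Estimate lam in Cov(e) = sum_k lam[k] * Q[k] by ReML (Fisher scoring)."""
    lam = np.asarray(lam0, dtype=float).copy()
    for _ in range(n_iter):
        C = sum(l * q for l, q in zip(lam, Q))
        Ci = np.linalg.inv(C)
        # Projector that removes the effects of X in the metric of C
        P = Ci - Ci @ X @ np.linalg.solve(X.T @ Ci @ X, X.T @ Ci)
        PQ = [P @ q for q in Q]
        Py = P @ y
        # Gradient of the restricted log-likelihood for each hyperparameter
        g = np.array([-0.5 * np.trace(pq) + 0.5 * (Py @ q @ Py)
                      for pq, q in zip(PQ, Q)])
        # Expected information (negative expected curvature)
        J = np.array([[0.5 * np.trace(a @ b) for b in PQ] for a in PQ])
        lam = lam + np.linalg.solve(J, g)
    return lam

# Hypothetical usage: with a single i.i.d. component, ReML should reproduce
# the usual unbiased variance estimate RSS / (N - p)
rng = np.random.default_rng(2)
N = 50
X = np.column_stack([np.ones(N), np.arange(N) / N])
y = X @ np.array([1.0, 0.5]) + 2.0 * rng.standard_normal(N)
lam_hat = reml_fisher_scoring(y, X, [np.eye(N)], lam0=[1.0])
res = y - X @ np.linalg.pinv(X) @ y
sigma2_ols = res @ res / (N - X.shape[1])
```

The single-component case is a useful sanity check because the answer is known in closed form; with several components the same loop applies, but convergence and positivity would need extra care.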
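Algorithmic equivalence
Hierarchical model, estimated with Parametric Empirical Bayes (PEB):
$y = X^{(1)}\theta^{(1)} + \varepsilon^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + \varepsilon^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + \varepsilon^{(n)}$
Single-level model, estimated with Restricted Maximum Likelihood (ReML):
$y = \varepsilon^{(1)} + X^{(1)}\varepsilon^{(2)} + \ldots + X^{(1)} \cdots X^{(n-1)}\varepsilon^{(n)} + X^{(1)} \cdots X^{(n)}\theta^{(n)}$
EM = PEB = ReML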
Practical problems
Most 2-level models are just too big to
compute.
And even when they can be computed, estimation takes a long time!
Moreover, sometimes we are only
interested in one specific effect and do
not want to model all the data.
Is there a fast approximation?
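Summary statistics approach
[Figure: at the first level, each subject's data and design matrix yield estimates $\hat{\beta}_1, \hat{\sigma}_1^2$; $\hat{\beta}_2, \hat{\sigma}_2^2$; ...; $\hat{\beta}_{11}, \hat{\sigma}_{11}^2$; $\hat{\beta}_{12}, \hat{\sigma}_{12}^2$ and a contrast image per subject. At the second level, the contrast images enter a one-sample t-test, giving SPM(t).]
$t = \dfrac{c^T \hat{\beta}}{\sqrt{\widehat{\mathrm{Var}}(c^T \hat{\beta})}}$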
Validity of the summary statistics approach
The summary stats approach is exact if for each session/subject:
• Within-session covariance the same
• First-level design the same
• One contrast per session
But: the summary stats approach is fairly robust against violations of these conditions.
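The second-level step of the summary statistics approach can be sketched as a one-sample t-test on one contrast estimate per subject. The twelve contrast values below are invented for illustration; in a real analysis this test is run at every voxel of the contrast images.

```python
import numpy as np

# Hypothetical first-level contrast estimates c^T beta_hat, one per subject
con = np.array([0.8, 1.1, 0.2, 1.5, 0.9, 0.4, 1.3, 0.7, 1.0, 0.6, 1.2, 0.5])

n = con.size
X2 = np.ones((n, 1))                       # second-level design: group mean only
theta2 = np.linalg.pinv(X2) @ con          # equals the sample mean
res = con - X2 @ theta2
sigma2 = res @ res / (n - 1)               # between-subject variance estimate
t = theta2[0] / np.sqrt(sigma2 / n)        # one-sample t with n - 1 degrees of freedom
```

Only the between-subject spread of the contrast estimates enters the denominator, which is why the degrees of freedom depend on the number of subjects rather than the number of scans.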
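Mixed effects analysis
[Schematic comparing the summary statistics approach with the EM approach; Friston et al. 2005, NeuroImage]
Summary statistics:
Step 1 (first level, data $y$, $V = I$):
$\hat{\theta}^{(1)} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$
Step 2 (second level, data $Y = \hat{\theta}^{(1)}$):
$\hat{\theta}^{(2)} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$
EM approach (non-hierarchical model):
Step 1 (ReML estimation of the hyperparameters, pooling over voxels):
$Q = \{Q_1^{(1)}, \ldots, X^{(1)} Q_1^{(2)} X^{(1)T}, \ldots\}$
$\lambda = \mathrm{REML}\{yy^T / n, X, Q\}$
$V = \sum_i \lambda_i^{(1)} Q_i^{(1)} + \sum_j \lambda_j^{(2)} X^{(1)} Q_j^{(2)} X^{(1)T}$
(first sum: 1st level non-sphericity; second sum: 2nd level non-sphericity)
Step 2 ($X = X^{(1)} X^{(2)}$):
$\hat{\theta}^{(2)} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$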
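Reminder: sphericity
$y = X\theta + \varepsilon$
$C_\varepsilon = \mathrm{Cov}(\varepsilon) = E(\varepsilon\varepsilon^T)$
“sphericity” means:
$\mathrm{Cov}(\varepsilon) = \sigma^2 I$, i.e. $\mathrm{Var}(\varepsilon_i) = \sigma^2$
e.g. $\mathrm{Cov}(\varepsilon) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
[Figure: error covariance matrix, scans × scans]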
2nd level: non-sphericity
[Figure: error covariance matrices]
• Errors are independent but not identical:
e.g. different groups (patients, controls)
• Errors are not independent and not identical:
e.g. repeated measures for each subject (multiple basis functions, multiple conditions etc.)
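2nd level: non-sphericity
$y = X\theta + \varepsilon$, with $y$: $N \times 1$, $X$: $N \times p$, $\theta$: $p \times 1$, $\varepsilon$: $N \times 1$
$\mathrm{Cov}(\varepsilon) = \sum_k \lambda_k Q_k$
Error covariance ($N \times N$):
• 12 subjects, 4 conditions
• Measurements between subjects uncorrelated
• Measurements within subjects correlated
Errors can now have different variances and there can be correlations.
Allows for ‘nonsphericity’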
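One way to picture the covariance components for 12 subjects and 4 conditions is sketched below. The particular basis set (one variance component per condition, one covariance component per pair of conditions), the condition-major ordering of the observations and the hyperparameter values are all assumptions for illustration.

```python
import numpy as np
from itertools import combinations

n_subj, n_cond = 12, 4
I_subj = np.eye(n_subj)

Q = []
# One variance component per condition (non-identical errors)
for k in range(n_cond):
    E = np.zeros((n_cond, n_cond))
    E[k, k] = 1.0
    Q.append(np.kron(E, I_subj))
# One covariance component per pair of conditions, linking measurements
# from the same subject (non-independent errors)
for a, b in combinations(range(n_cond), 2):
    E = np.zeros((n_cond, n_cond))
    E[a, b] = E[b, a] = 1.0
    Q.append(np.kron(E, I_subj))

# Hypothetical hyperparameters: unit variances, covariance 0.5 within subject
lam = np.concatenate([np.ones(n_cond), 0.5 * np.ones(len(Q) - n_cond)])
C = sum(l * q for l, q in zip(lam, Q))     # Cov(eps) = sum_k lam_k Q_k
```

In the resulting 48 × 48 matrix, entries linking different subjects are zero, while entries linking two conditions of the same subject are non-zero.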
Example 1: non-identical & independent errors
Stimuli: Auditory presentation (SOA = 4 secs) of (i) words and (ii) words spoken backwards, e.g. “Book” and “Koob”
Subjects: (i) 12 control subjects, (ii) 11 blind subjects
Scanning: fMRI, 250 scans per subject, block design
Noppeney et al.
1st level: [Figure: first-level results for Controls and Blinds]
2nd level: $c^T = [1\ \ {-1}]$
[Figure: second-level design matrix X and error covariance V]
Example 2: non-identical & non-independent errors
Stimuli: Auditory presentation (SOA = 4 secs) of words
1. Motion (“jump”): Words referred to body motion. Subjects decided if the body movement was slow.
2. Sound (“click”): Words referred to auditory features. Subjects decided if the sound was usually loud.
3. Visual (“pink”): Words referred to visual features. Subjects decided if the visual form was curved.
4. Action (“turn”): Words referred to hand actions. Subjects decided if the hand action involved a tool.
Subjects: (i) 12 control subjects
Scanning: fMRI, 250 scans per subject, block design
Question: What regions are generally affected by the semantic content of the words?
Contrast: semantic decisions > auditory decisions on reversed words
Noppeney et al. 2003, Brain
Repeated measures ANOVA
1st level: 1. Motion, 2. Sound, 3. Visual, 4. Action
Question tested: Motion = Sound = Visual = Action?
2nd level:
$c^T = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$
[Figure: second-level design matrix X and error covariance V]
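A short sketch of what the three-row contrast tests: each row takes the difference between two neighbouring conditions, so the whole contrast is zero exactly when all four condition effects are equal. The effect values below are made up for illustration.

```python
import numpy as np

# Contrast matrix from the slide: successive differences between conditions
C = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [0, 0, 1, -1]], dtype=float)

theta_equal = np.full(4, 2.0)                    # hypothetical: no condition differences
theta_unequal = np.array([2.0, 2.0, 3.0, 2.0])   # hypothetical: "Visual" differs

diff_equal = C @ theta_equal
diff_unequal = C @ theta_unequal
rank = np.linalg.matrix_rank(C)                  # number of independent differences
```

The rank of the contrast matrix is 3, one less than the number of conditions, matching the number of independent differences among four condition means.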
Practical conclusions
• Linear hierarchical models are used for group analyses of multi-subject imaging data.
• The main challenge is to model non-sphericity (i.e. non-identity
and non-independence of errors) within and between levels of
the hierarchy.
• This is done by estimating hyperparameters using EM or ReML
(which are equivalent for linear models).
• The summary statistics approach is a robust approximation to a
full mixed-effects analysis.
– Use a mixed-effects model only if seriously in doubt about the validity of
the summary statistics approach.