2nd Level GLM

Emily Falk, Ph.D.
[Figure: fMRI analysis pipeline: determine scanning parameters; acquire functionals (Y) and structurals (T1); slice timing correct; realign; (de-noise); coregister; normalize (template); smooth; 1st-level (subject) GLM, y = Xβ + ε, with predictors X and estimates β; contrast (βhow - βwhy); 2nd-level (group) GLM; threshold]
Groups of Subjects
• So far: analyzing each individual voxel from one person
• How do we combine data from groups of subjects?
– Often referred to as 2nd-level random-effects analysis
• Basic approach:
– Normalize SPMs from each subject into a standard space
– Test whether the statistic at a given voxel is significantly different from 0 across subjects
– Correct for multiple comparisons
An Example
Region A: β1 − β2 (repeat for all regions)
Subject 1: 32
Subject 2: 18
Subject 3: −4
Subject 4: 45
Subject 5: 23
Mean: 22.8 *
(Aron et al., 2005)
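The group test on these numbers is a one-sample t-test on the five subject-level contrast values. A minimal pure-Python sketch (the function name is illustrative; `scipy.stats.ttest_1samp` would give the same t):

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Group-level test: is the mean of the subject contrast values different from mu0?"""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # SD uses the n - 1 denominator
    return mean, (mean - mu0) / se, n - 1          # df = N - 1 subjects

region_a = [32, 18, -4, 45, 23]   # beta1 - beta2 for Subjects 1-5
mean, t, df = one_sample_t(region_a)
print(f"mean = {mean:.1f}, t({df}) = {t:.2f}")   # mean = 22.8, t(4) = 2.81
```

With df = 4, the two-tailed critical t at α = .05 is about 2.78, so t ≈ 2.81 is just significant, consistent with the asterisk on the mean.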
Fixed and Random Effects
• Fixed effect
– Always the same, from experiment to experiment, levels
are not draws from a random variable
– Sex (M/F)
– Drug type (Prozac)
• Random effect
– Levels are randomly sampled from a population
– Subject
– Day, in a longitudinal design
• If an effect is treated as fixed, error terms in the model do not include variability across its levels
– Cannot generalize to unobserved levels
– e.g., if subject is fixed, cannot generalize to new subjects
Courtesy of Tor Wager
Fixed vs. Random Effects: Bottom Line
• If I treat subject as a fixed effect, the
error term reflects only scan-to-scan
variability, and the degrees of freedom
are determined by the number of
observations (scans).
• If I treat subject as a random effect, the
error term reflects the variability across
subjects, which includes two parts:
– Error due to scan-to-scan variability
– Error due to subject-to-subject variability
and degrees of freedom are determined by the
number of subjects.
Courtesy of Tor Wager
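A toy contrast of the two error terms, using made-up numbers (5 subjects × 3 scans of one contrast value, with scans scattered −1, 0, +1 around each subject's own effect). The fixed-effects version here uses the pooled within-subject residual variance, a simplification of what a full first-level fixed-effects model would estimate:

```python
import math
import statistics

# Hypothetical toy data: 5 subjects x 3 scans of one contrast value.
# Each subject has its own true effect; its scans scatter -1, 0, +1 around it.
subject_effects = [2, -1, 3, 0, 1]
scans = [[b - 1, b, b + 1] for b in subject_effects]

def fixed_effects_t(scans):
    """Subject as fixed: error = scan-to-scan (within-subject) variability only."""
    n_scans = sum(len(s) for s in scans)
    grand_mean = sum(sum(s) for s in scans) / n_scans
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in scans for x in s)
    df = n_scans - len(scans)                   # scans minus one mean per subject
    se = math.sqrt(ss_within / df / n_scans)    # SE of the grand mean
    return grand_mean / se, df

def random_effects_t(scans):
    """Subject as random: error = spread of subject means (scan + subject variability)."""
    means = [statistics.mean(s) for s in scans]
    se = statistics.stdev(means) / math.sqrt(len(means))
    return statistics.mean(means) / se, len(means) - 1

t_fx, df_fx = fixed_effects_t(scans)
t_rx, df_rx = random_effects_t(scans)
print(f"fixed effects:  t({df_fx}) = {t_fx:.2f}")   # t(10) = 3.87
print(f"random effects: t({df_rx}) = {t_rx:.2f}")   # t(4) = 1.41
```

Same data and the same group mean of 1.0, but treating subject as fixed yields a larger t on more degrees of freedom, because subject-to-subject spread never enters the error term.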
Random Effects Analysis
• Subjects treated as “random” effect
– Randomly sampled from population of interest
• Sample is used to make estimates of
population effects
• Results lead to inferences on the
population
Random vs. Fixed Effects
• Whereas some early studies used
fixed effects models, virtually all
current studies use random effects
models
• Use random effects
• All analyses that follow treat subject as a random effect
More specifically…
Voxel-Wise 2nd-Level Analysis
[Figure: at a single voxel, the subject series goes through model specification, parameter estimation, and a hypothesis test to give a statistic at that voxel; repeating this at every voxel yields the SPM]
Model Specification:
Building the Design Matrix
Y = Xb + e
$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\times [\, b_0 \,]
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$
Y: one stat value per subject; X: design matrix (a single intercept column); b0: model parameter; e: residuals
Parameter Estimation/Model Fitting
• Find the β values that produce the best fit to the observed data
y = β0 + ERROR
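A minimal least-squares sketch in pure Python, assuming the design matrix is a list of rows (in practice SPM, or `numpy.linalg.lstsq`, does this step). It solves the normal equations (XᵀX)b = Xᵀy; with the intercept-only design from the previous slide, b0 comes out as the group mean (22.8 for the Region A values):

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [xr - f * xc for xr, xc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_glm(X, y):
    """OLS fit of y = X b + e via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    residuals = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    return b, residuals

y = [32, 18, -4, 45, 23]          # Region A contrast values, one per subject
X = [[1] for _ in y]              # design matrix: a single intercept column of 1s
b, e = fit_glm(X, y)
print(b)                          # [22.8]: b0 is the group mean
```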
The SPM Way of Plotting the Variables
[Figure: the model drawn as images: y = X × [b0] + e]
Group Analysis Using Summary Statistics:
A simple kind of ‘random effects’ model
The “Holmes and Friston” approach (HF)
[Figure: at the first level, each subject's data and design matrix yield a contrast image; at the second level, the contrast images enter a one-sample t-test, producing an SPM(t)]
Courtesy of Tor Wager
Summary Statistic Approach: 2 Sample t-test
from Mumford & Nichols, 2006
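For two groups, the summary-statistic approach becomes a pooled-variance two-sample t-test on the subject-level contrast values; this is equivalent to a GLM with one indicator column per group and the contrast [1 −1]. A pure-Python sketch in which group A reuses the Region A numbers and group B is made up:

```python
import math
import statistics

def two_sample_t(group1, group2):
    """Pooled-variance two-sample t-test on subject-level contrast values."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, df

group_a = [32, 18, -4, 45, 23]   # reuses the Region A values
group_b = [5, -3, 10, 2, 1]      # hypothetical second group
t, df = two_sample_t(group_a, group_b)
print(f"t({df}) = {t:.2f}")      # t(8) = 2.36
```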
Summary Statistic Approach: Inference
• In a 1-sample t-test, the contrast C = 1 derives the group mean
– If the images taken to the second level represent the contrast A – B, then
• C = 1 is the mean difference (A > B)
• C = -1 is the mean difference (B > A)
• Dividing by the standard error of the mean yields a t-statistic
– Degrees of freedom are N – 1, where N is the number of subjects
• Comparison of the t-statistic with the t-distribution yields a p-value
– P(Data | Null)
Tech Note: Sufficiency of Summary Statistic Approach
• With simple t-tests under the summary statistic approach, within-subject variance is assumed to be homogeneous (within a group)
– SPM's approach, but other packages can act differently
• If all subjects (within a group) have equal within-subject variance (homoscedastic), this is OK
• If within-subject variance differs among subjects (heteroscedastic), this may lead to a loss of precision
– May want to weight individuals as a function of within-subject variability
• Practically speaking, the simple approach is good enough (Mumford & Nichols, 2009, NeuroImage)
– Inferences are valid under heteroscedasticity
– Slightly conservative under heteroscedasticity
– Near-optimal sensitivity under heteroscedasticity
– Computationally efficient
• For an extended example of ways that you could do this wrong, see Derek Nee's second-level GLM lecture from last year
The GLM Family
All of the analyses below are special cases of the General Linear Model.
DV: one continuous measure
• Continuous predictor, one predictor: Regression
• Continuous predictors, two or more: Multiple Regression
• Categorical predictor, 1 predictor with 2 levels: 2-sample t-test
• Categorical predictor, 1 predictor with 3+ levels: One-way ANOVA
• Categorical predictors, 2 or more: Factorial ANOVA
DV: repeated measures
• Two measures, one factor: Paired t-test
• More than two measures: Repeated measures ANOVA
Correlations
• To perform mass bivariate correlations, use SPM's “Multiple Regression” option with a single covariate
– Can also specify multiple covariates and perform true multiple regression
• Be cautious of multicollinearity!
• Correlations are done voxel-wise
• The % of explained variance necessary to reach significance with appropriate correction for multiple comparisons may be very high
– Interpret location, not effect size (more later)
• May be more realistic to perform correlations on a small set of regions-of-interest (more later)
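What a single-covariate regression computes at each voxel is equivalent to a Pearson correlation between the subjects' contrast values and the covariate, with t = r·√((N − 2)/(1 − r²)). A pure-Python sketch in which the trait scores and the three "voxels" are made-up numbers:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t-statistic for a correlation with n subjects (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical: 5 subjects' trait scores and their contrast values at 3 voxels.
trait = [1, 2, 3, 4, 5]
voxels = [[2, 4, 5, 4, 5], [5, 4, 3, 1, 2], [3, 1, 4, 1, 5]]

# "Mass bivariate": the same correlation repeated independently at every voxel.
for v, series in enumerate(voxels):
    r = pearson_r(series, trait)
    print(f"voxel {v}: r = {r:.2f}, r^2 = {r * r:.2f}, t({len(trait) - 2}) = {r_to_t(r, len(trait)):.2f}")
```

Each voxel's test is then subject to the multiple comparisons correction discussed below, which is why the r² needed for significance can be so high.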
Examples
• First level: Why > How
– Regression with…
• trait empathy
• trait narcissism
• scan on weekday or weekend
• friends on Facebook
• First level: Loved one > Other
– Regression with…
• relationship closeness
• relationship satisfaction
• age
Example
• Costly exclusion predicts susceptibility to peer influence
Falk et al., 2013, JAH
Correlations and Outliers
[Figure: null-hypothesis data, N = 50, and the same data with one outlier]
Courtesy of Tor Wager
Robust Regression
• Outliers can be problematic, especially for
correlations
• Robust regression reduces the impact of outliers
– 1) Weight data by inverse of leverage
– 2) Fit weighted least squares model
– 3) Scale and weight residuals
– 4) Re-fit model
– 5) Iterate steps 2-4 until convergence
– 6) Adjust variances or degrees of freedom
for p-values
• Can be applied to simple group results or
correlations
– Whole brain: http://wagerlab.colorado.edu/
– ROI: whatever software you prefer (more later)
[Figure: null-hypothesis data, N = 50; the same data with one outlier; the robust IRLS solution]
Courtesy of Tor Wager
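A simplified IRLS sketch for a single-predictor regression, in pure Python. Assumptions not taken from the slides: Tukey bisquare weights with the common tuning constant 4.685, a residual scale of median|residual|/0.6745, and no leverage weighting (step 1) or variance/df adjustment (step 6). The data are made up, with one gross outlier:

```python
import statistics

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def robust_line(x, y, c=4.685, max_iter=50, tol=1e-10):
    """IRLS with Tukey bisquare weights, starting from the ordinary least-squares fit."""
    a, b = wls_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = max(statistics.median([abs(r) for r in resid]) / 0.6745, 1e-12)
        # Bisquare: weight shrinks smoothly with |residual| and is 0 beyond c * scale
        w = [(1 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0 for r in resid]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 * xi + e for xi, e in zip(x, noise)]
y[9] = 60.0                                  # one gross outlier (true value near 18)

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
a_rob, b_rob = robust_line(x, y)
print(f"OLS slope {b_ols:.2f}, robust slope {b_rob:.2f} (true slope 2)")
```

The outlier roughly doubles the OLS slope; after a few reweighting passes its weight drops to zero and the robust slope returns close to 2.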
Case Study: Visual Activation
[Figure: visual responses]
Courtesy of Tor Wager
Up Next…
• Hypothesis Testing
• Levels of Inference
• Multiple Comparisons
– Family-wise Error Correction
– False-Discovery Rate Correction
– Non-parametric Correction
Hypothesis Testing
• Null Hypothesis H0
– No effect
• T-test: No difference from zero
• F-test: No variance explained
• α level
– Set to an acceptable false positive rate
– Level α = P(T > uα | H0)
– Threshold uα controls the false positive rate at level α
• P-value
– Test statistics are compared with the appropriate distributions
• These change as a function of degrees of freedom
• T-distribution: bell-shaped
• F-distribution: skewed
– Assessment of the probability of the test statistic assuming H0
– P(Data | Null)
• But not P(Null | Data)!
Information for Making Inferences on Activation
• Where? Signal location
– Local maximum – no inference in SPM
• Could extract peak coordinates and test
(e.g., Woods lab, Ploghaus, 1999)
• How strong? Signal magnitude
– Local contrast intensity – Main thing tested in SPM
• How large? Spatial extent
– Cluster volume – Can get p-values from SPM
• Sensitive to the blob-defining threshold
• When? Signal timing
– No inference in SPM; but see Aguirre 1998; Bellgowan 2003; Miezin et al. 2000; Lindquist & Wager, 2007
Unit of Analysis
• Fundamental unit of analysis is voxel
– GLM is run voxel-by-voxel
– Statistical parametric maps (SPMs) are calculated voxel-by-voxel
• Unit of interest may instead be a “region”
– Functional unit
– Pool data across voxels
• May also be broadly interested in the brain as a whole
– Considering the brain as a whole, do these 2 conditions differ?
Levels of Inference
• Inferences can be made at any “level”
depending upon your unit of interest
• Voxel-level
– This/these particular voxels are
significant
– Most spatially specific, least sensitive
• Cluster-level
– These contiguous voxels together are
significant
– Less spatially specific, more sensitive
• Set-level
– The brain shows an effect
– No spatial specificity, but can be most
sensitive
SPM's results table shows p-values for voxel-level, cluster-level, and set-level tests.
Voxel-Level Inference
• Retain voxels above α-level threshold uα
• Gives best spatial specificity
– The null hyp. at a single voxel can be rejected
[Figure: statistic plotted over space with threshold uα; voxels above uα are significant, voxels below are not]
Courtesy of Tor Wager
Cluster-Level Inference
• Two-step process
– Define clusters by an arbitrary threshold uclus
– Retain clusters larger than the α-level threshold kα
[Figure: statistic plotted over space, thresholded at uclus; a cluster whose extent exceeds kα is significant, a smaller one is not]
Courtesy of Tor Wager
Cluster-Level Inference
• Typically better sensitivity
• Worse spatial specificity
– The null hyp. of entire cluster is rejected
– Only means that one or more voxels in the cluster are active
[Figure: statistic plotted over space, thresholded at uclus; a cluster whose extent exceeds kα is significant, a smaller one is not]
Courtesy of Tor Wager
Multiple Comparisons Problem
• Often over 100,000 voxels in the brain
– Voxel-level tests are repeated over 100,000
times
– If α = 0.05 (i.e. p < 0.05), over 5,000 false
positive voxels!
• Need to control false positive rate at α
across all tests
– Otherwise, difficult to know if result is
believable
Multiple Comparisons
• We perform statistical tests at every voxel, tens and tens of thousands of them
• Quite likely that some would pass threshold by chance even if there were absolutely no effect
• Need to correct for multiple comparisons
Some Approaches
• Bonferroni correction: Insist on p<.05/#voxels
– Severely reduces sensitivity, but works with small ROIs
• Gaussian random field theory: Suppose there is no effect
but data is spatially smooth. What’s the chance of seeing
a blob of X contiguous voxels all of which are above a
threshold V?
– Default approach to controlling familywise error (FWE) in SPM
• False Discovery Rate (FDR): Set threshold so that, on average, no more than 5% of the voxels above threshold are false positives
Family-Wise Error (FWE)
• FWE rate is the probability of finding one or more false positives among all hypothesis tests
– If FWEα = 0.05, the probability of finding one or more false positives is 5%
• Based on the maximum distribution
– If no true positives are present, the most significant voxel will exceed the threshold 5% of the time
• Several approaches to control FWE
Bonferroni Correction
• Simplest method for
controlling FWE
• αcorrected = α/V
– α is the desired alpha level
– αcorrected is the alpha level
corrected for FWE
– V is the number of
voxels/tests
When examining results in SPM,
you can find the # of voxels in the
statistics table (bottom of the table
under Volume). Divide α by the #
of voxels to determine a Bonferroni
corrected threshold.
• 0.05/100,000 = 0.0000005
– E.g. t(20) = 6.93
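The arithmetic behind the multiple comparisons problem and the Bonferroni threshold, as a short standard-library Python sketch. The z conversion assumes a one-tailed test on a normally distributed statistic; for a t-statistic the threshold also depends on the degrees of freedom, as in the t(20) example:

```python
from statistics import NormalDist

V = 100_000      # number of voxels / tests
alpha = 0.05

# Uncorrected: each voxel tested at alpha
expected_false_positives = alpha * V          # about 5,000 voxels under the null
p_at_least_one = 1 - (1 - alpha) ** V         # FWE if tests were independent: essentially 1

# Bonferroni: test each voxel at alpha / V
alpha_corrected = alpha / V                   # 5e-07
z_corrected = NormalDist().inv_cdf(1 - alpha_corrected)  # one-tailed z threshold, about 4.9

print(expected_false_positives, p_at_least_one, alpha_corrected, round(z_corrected, 2))
```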
Bonferroni Correction: Limitations
• The correction treats every test as independent
• Data are actually spatially smooth, so not independent!
• Correction tends to be overly conservative
– False positives appropriately controlled
– But threshold is too high to detect many true positives
Gaussian Random Fields
• SPM’s default method of FWE correction takes into account
smoothness of data
• Intuition
– Smooth data → lower resolution of the search space → fewer effective comparisons → less stringent correction
• Assumes that an image of residuals can be described by Gaussian noise convolved with a 3D kernel
– Forms a Gaussian Random Field
– FWHM of the kernel describes the smoothness of the data
Tech Note – Estimating Smoothness: RESELS
• RESELS = RESolution Elements
– 1 RESEL = FWHMx × FWHMy × FWHMz
[Figure: a row of 10 voxels spanning 4 RESELs]
Note: when examining results in SPM, you can find the # of RESELs and the FWHM in the statistics table (bottom of the table, under Volume).
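A rough sketch of the RESEL bookkeeping in pure Python, assuming the smoothness has already been estimated and converted to voxel units (the numbers are hypothetical; the actual FWE threshold then comes from random field theory, not from this ratio alone):

```python
def fwhm_in_voxels(fwhm_mm, voxel_mm):
    """Convert per-axis smoothness from mm to voxel units."""
    return tuple(f / v for f, v in zip(fwhm_mm, voxel_mm))

def resel_count(n_voxels, fwhm_vox):
    """Search volume in RESELs: voxels divided by the voxels in one RESEL."""
    fx, fy, fz = fwhm_vox
    return n_voxels / (fx * fy * fz)

fwhm_vox = fwhm_in_voxels((8.0, 8.0, 8.0), (2.0, 2.0, 2.0))  # hypothetical: 4 voxels per axis
print(resel_count(100_000, fwhm_vox))  # 100,000 / 64 = 1562.5 RESELs
```

The same 100,000 voxels behave like far fewer independent "resolution elements", which is the intuition for why smooth data need a less stringent correction than Bonferroni.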
• Threshold needed to correct
– Increases with greater search volume
• Need more stringent correction
– Decreases with greater smoothness (larger RESELs)
• Greater smoothness leads to less stringent correction
Gaussian Random Fields: Clusters
1) Threshold at the voxel level
2) Estimate the chance of clusters of size ≥ k, taking into account the mean expected cluster size, search volume, smoothness, and threshold -> puncorrected of a cluster of size ≥ k
3) Apply the previously described correction -> pcorrected
[Figure: thresholded fields at 5 mm, 10 mm, and 15 mm FWHM]
Courtesy of Tor Wager
Gaussian Random Fields: Limitations
• Requires sufficient smoothness of data
– FWHM 3-4× voxel size
• Performs poorly with low df
– Better with df > 20 and sufficient smoothness
• Tends to be conservative, especially with rough data (FWHM < 6)
• Based on approximations
– Approximations can be thrown off by “roughness spikes”
– Approximations will vary on a contrast-by-contrast basis
• Different contrasts in the same data will have different thresholds
• Typically regarded as better at the individual level, where df are high
Select “FWE” in SPM results to threshold using Gaussian Random Field Theory. Expect a conservative threshold.
False-Discovery Rate
• Correction of FWE ensures that false positives will be controlled per family of tests
– αFWE-corrected = 0.05: 5% of contrasts (across all voxels) will have one or more false positives
• False-Discovery Rate (FDR) controls the expected proportion of false positives among the tests declared significant
– αFDR-corrected = 0.05: on average, 5% of reportedly active voxels in a contrast will be false positives
• Upside: will find more true signal
• Downside: will have a few false positives
False-Discovery Rate: Method
1. Establish a rate, q, of acceptable proportion of false positives (e.g. 0.05)
2. Sort observed p-values from smallest to largest
3. Find max(i) such that p(i) ≤ i·(q/V)
– V is the # of voxels
– Highest such i gives the threshold
– If no p-value passes, the threshold cannot be determined (SPM will say the threshold is t = infinity)
• In other words
– The smallest p-value must pass Bonferroni, the second smallest Bonferroni × 2, the third smallest Bonferroni × 3, etc.
[Figure: sorted p-values p(i) plotted against i/V, with the line (i/V)q]
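The three steps of the FDR method translate almost line for line into code. A pure-Python sketch of this step-up rule (the Benjamini-Hochberg procedure), with ten made-up p-values:

```python
def fdr_threshold(p_values, q=0.05):
    """Return the largest p(i) with p(i) <= i * q / V, or None if no p-value passes."""
    V = len(p_values)
    cutoff = None
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * q / V:
            cutoff = p  # keep going: the HIGHEST passing i sets the threshold
    return cutoff

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cut = fdr_threshold(p_vals, q=0.05)
print(cut)                                 # 0.008
print([p for p in p_vals if p <= cut])     # declared active: [0.001, 0.008]
print(fdr_threshold([0.5, 0.9]))           # None, i.e. SPM's "t = infinity" case
```

For comparison, Bonferroni at 0.05/10 = 0.005 would keep only the first p-value.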
False-Discovery Rate: Limitations
• Limits inference
– Cannot say which activated voxels
are true positives or false positives
• Adaptive
– Good in some cases
– Maps with lots of activations (many
voxels with low p) will have low
thresholds
– Maps with little activation (few
voxels with low p) will have high thresholds or no determinable threshold
• Hard to find signal in small areas
Select “FDR” in SPM results to threshold using False-Discovery Rate. Threshold may be “infinity” if effects are weak, or it may be very low if results are strong.
Simulations
[Figure: simulated noise, signal, and signal + noise images]
Courtesy of Tor Wager
[Figure: the simulated images thresholded three ways]
Control of Per-Comparison Rate at 10%
– Percentage of null pixels that are false positives: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%
Control of Familywise Error Rate at 10%
– Occurrence of familywise error marked “FWE”
Control of False Discovery Rate at 10%
– Percentage of activated pixels that are false positives: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%
Nonparametric Inference
• Parametric methods
– Assume distribution of
statistic under null
hypothesis
– Needed to find p-values and thresholds, uα
[Figure: parametric null distribution, 5% tail marked]
• Nonparametric methods
– Use data to find the distribution of the statistic under the null hypothesis
– Any statistic!
[Figure: nonparametric null distribution, 5% tail marked]
Courtesy of Tor Wager
Permutation Test: Toy Example
• Under H0
– Consider all equivalent relabelings
– Compute all possible statistic values
– Find 95%ile of permutation distribution
[Figure: permutation distribution of the statistic, x-axis from −8 to 8]
Courtesy of Tor Wager
Permutation Test: Details
• Requires only assumption of exchangeability:
– Under H0, distribution unperturbed by permutation
• Subjects are exchangeable (good for group
analysis)
– Under H0, each subject’s A/B labels can be flipped
• fMRI scans not exchangeable under H0 (bad for
time series analysis/single subject)
– Due to temporal autocorrelation
On the SPM website, click on
Extensions and search under
Toolboxes to download
SnPM
Courtesy of Tom Nichols
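For a group one-sample test, "relabeling" means flipping the sign of each subject's A − B value. With only 5 subjects every one of the 2⁵ = 32 flips can be enumerated, so the permutation distribution is exact. A pure-Python sketch that uses the sum of the values (equivalent to the mean) as the statistic; tools such as SnPM may use a t or pseudo-t statistic instead:

```python
from itertools import product

def sign_flip_pvalue(values):
    """Exact one-sided permutation p-value for H0: mean = 0, via all sign flips."""
    observed = sum(values)
    null_stats = [sum(s * v for s, v in zip(signs, values))
                  for signs in product((1, -1), repeat=len(values))]
    # The identity flip is included, so p can never be 0
    return sum(stat >= observed for stat in null_stats) / len(null_stats)

region_a = [32, 18, -4, 45, 23]    # the Region A contrast values from the earlier example
print(sign_flip_pvalue(region_a))  # 2 / 32 = 0.0625
```

Only the observed labeling and the flip that turns −4 into +4 reach a sum of at least 114, so p = 2/32; with 5 subjects the smallest attainable p is 1/32.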
Other Approaches
• AlphaSim (AFNI): create random fields based on
smoothness and observe rate of false positive clusters
– Provides cluster-level correction
– Can input the appropriate voxel-level threshold and cluster extent to use with SPM
• Once estimated, threshold can be used for all contrasts
• Threshold-Free Cluster Enhancement (FSL): combines signal strength and spatial extent into a single measure
– Provides cluster-level correction without the need to first specify a voxel-level threshold
– Avoids ambiguities that can arise from getting different results at different voxel-level thresholds with other cluster methods
AlphaSim
• Cluster-extent correction based on Monte Carlo simulation; example write-ups:
– “All whole-brain analyses were thresholded using an uncorrected
p-value of .001 combined with an extent threshold of 21
contiguous voxels, corresponding to a false-positive discovery
rate of 5% across the whole brain as estimated by Monte Carlo
simluation implemented using AlphaSim in the software package
AFNI (http://afni.nimh.gov/afni/doc/manual/AlphaSim)”.
– “Whole-brain analyses were conducted using a statistical
criterion of at least 21 contiguous voxels exceeding a voxel-wise
threshold of p <.001. A Monte Carlo simulation
(http://afni.nimh.gov/afni/doc/manual/AlphaSim) of our brain
volume demonstrated that this cluster extent cutoff provided an
experiment-wise threshold of p <.05, corrected for multiple
comparisons.”
– Cox, R. W. (1996). AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29, 162-173.
AlphaSim: Which volume?
[Figure: simulation volume defined by the analysis mask vs. a cube]
AlphaSim: Usage
> AlphaSim -nxyz [nx] [ny] [nz] -dxyz [dx] [dy] [dz] …
-fwhm 8 -pthr .001 -iter 10000 -quiet
Where:
nx, ny, nz = number of voxels in each dimension (e.g. 64 64 34)
dx, dy, dz = voxel size in mm (e.g. 3 3 3)
fwhm = size of smoothing kernel (mm)
pthr = voxel-level threshold
iter = number of simulations to run (usually 10k)
quiet = suppress output
AlphaSim: Usage
> AlphaSim -mask [name.hdr] -fwhm 8 -pthr .001 …
-iter 10000 -quiet
Where:
mask = name of normalized anatomical mask (2nd level)
fwhm = size of smoothing kernel (mm)
pthr = voxel-level threshold
iter = number of simulations to run (usually 10k)
quiet = suppress output
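The idea behind AlphaSim can be sketched in a few dozen lines. This is a deliberately simplified 1D version in pure Python, not AlphaSim's code: it smooths white noise with a Gaussian kernel of the given FWHM, thresholds at the voxel-level p, records the largest cluster in each simulated null image, and returns the smallest extent k reached in at most α of the simulations. All parameter values are hypothetical:

```python
import math
import random
from statistics import NormalDist

def smooth(noise, fwhm_vox):
    """Convolve 1D white noise with a Gaussian kernel; rescale to unit variance (away from edges)."""
    sigma = fwhm_vox / (2 * math.sqrt(2 * math.log(2)))  # FWHM = 2*sqrt(2 ln 2) * sigma
    half = max(1, int(3 * sigma))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    norm = math.sqrt(sum(w * w for w in kernel))
    n = len(noise)
    return [sum(w * noise[i + k] for k, w in zip(range(-half, half + 1), kernel)
                if 0 <= i + k < n) / norm
            for i in range(n)]

def largest_cluster(field, z_thresh):
    """Length of the longest run of contiguous supra-threshold voxels."""
    best = run = 0
    for value in field:
        run = run + 1 if value > z_thresh else 0
        best = max(best, run)
    return best

def cluster_extent_threshold(n_vox=300, fwhm_vox=4.0, p_voxel=0.001,
                             alpha=0.05, n_iter=500, seed=0):
    rng = random.Random(seed)
    z_thresh = NormalDist().inv_cdf(1 - p_voxel)  # one-tailed voxel-level threshold
    maxima = [largest_cluster(smooth([rng.gauss(0, 1) for _ in range(n_vox)], fwhm_vox), z_thresh)
              for _ in range(n_iter)]
    # Smallest k such that a null image has a cluster >= k in at most alpha of the runs
    for k in range(1, max(maxima) + 2):
        if sum(m >= k for m in maxima) / n_iter <= alpha:
            return z_thresh, k

z, k = cluster_extent_threshold(p_voxel=0.01)
print(f"voxel threshold z > {z:.2f}, cluster extent k >= {k} voxels")
```

Because the extent depends on the simulated volume and smoothness, this also illustrates why the choice of volume (analysis mask vs. cube) matters.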
Thresholding Summary
• Most thresholding is done at either the voxel or cluster level
– Depends on the level of inference that is of interest
– Spatial precision: voxel
– Regional inference: cluster
• Methods differ in control over false positives and sensitivity
– Bonferroni: strong control over false positives, somewhat conservative
– False-Discovery Rate: admits false positives, more sensitive
– Non-parametric: most adaptive, reputedly most accurate
• Next up: Whole-brain search isn't your only option
• Avoid the problem!
– Can use SVCs and ROIs to reduce/eliminate multiple comparisons issues and improve power in areas with a prior hypothesis
– But conclusions and inferences are different
Some notes on inference in
brain mapping studies
What Brain Mapping is Good For
• Making inferences on the presence of activity,
to either a) test a theory, or b) characterize
the pattern of brain responses to a task
• Limiting the false positive rate to a specified
level
• Leverage hypothesis testing to provide
evidence on a variety of theories: Is Area r
involved in Task x?
What Brain Mapping is Not Good For
• Estimating effect sizes (effect strength, or
predictive power)
• Testing the assumptions involved in the analysis
– ‘Neural’ timing and temporal profile of neural
response
– Link between neural activity and observed signal:
Hemodynamic response profile
– Appropriateness of additive linear model
– Normality and homogeneity of variance (needed for
valid p-values)
• Building a cumulative knowledge base
Kinds of Inferences
[Figure: three kinds of inference]
• Hypothesis tests: Is there an effect?
• Effect size estimation (cross-validation): How well can I predict…?
• Spatial statistics, mixture models: Where are the effects?
Key: There are tradeoffs among these goals
With current analysis options, they cannot all be maximized at once
Effect Size Estimation is Important for
Development of Applications
• Medicine
– Predicting treatment response, diagnosis,
‘personalized medicine’, neuro-rehabilitation and
prosthetics
– Sensitivity and specificity of tests: relate to effect size
• Law
– Lie detection, guilty knowledge, tort cases; evidence
on pain and cognitive deficits
• Psychology and neuroscience
– Testing for meaningfully large effects
• Marketing, military, homeland security
Spatial Pattern Estimation is Important for Theory
Building
• Need to know both which areas are ‘active’
and which are not
• Balance of false positives and false
negatives important for building a
cumulative science
• Often ignored because of bias towards
hypothesis testing and ‘strong inference’
[Figure: true signal plus noise gives the observed signal; a hypothesis test with a threshold sorts results into positives and negatives]
Courtesy of Tor Wager
The Problem with Estimating Effect Sizes
Conditioning on Significant Test Results
[Figure: observed signal (signal + noise) passed through a hypothesis test with multiple comparisons correction]
• Conditioning on significance selects for high noise values (red/purple)
• Equally true for all effect size measures: Pearson's r, t-values, Z-scores, p-values, etc.
Courtesy of Tor Wager
The ‘File Drawer Problem’
[Figure, left: sample of ‘significant’ voxels; observed effect size (d) per voxel against a threshold, with true positives, false positives, and the true effect size marked]
[Figure, right: sample of published studies 1-4 against a threshold; the average observed effect size exceeds the true effect size (blue)]
Conditioning on publication causes bias in the effect size estimate
Courtesy of Tor Wager
The ‘File Drawer Problem’: Meta-analysis of 5 Antidepressants
Melander et al., 2003, BMJ
Regression to the Mean
• Average height of a
Chinese male is 5’7”
• Yao Ming is 7’6” tall
• If Yao had a son, is he
likely to be shorter or
taller than Yao?
Tradeoff #1: Is there… vs. How big…
[Figure: hypothesis tests (inference): Is there an effect?; effect size estimation: How well can I predict…?; spatial statistics and mixture models: Where are the effects?]
Stringent multiple comparisons correction is good for inference, but bad for effect size estimation
The Problem with Estimating Effect Sizes Conditioning
on Significant Test Results
• More stringent multiple
comparisons: less
accurate estimation of
effect size
• Increased power:
reduced selection bias
– Larger effects
– Larger sample sizes
– Less noise
Yarkoni, 2009
Courtesy of Tor Wager
Problems with Estimating Effect Sizes
[Figure: observed signal passed through a hypothesis test with multiple comparisons correction, N = 20, p < .001]
• If the true correlation looks like this… r = .5
• A typical ‘significant’ voxel looks like this… r = .78
Why? In looking at ‘significant’ tests, we are conditioning on having a high observed effect size: r must be at least 0.67 in order to consider it!
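A small simulation makes the selection effect concrete. Every simulated "voxel" below has a true correlation of .5 with N = 20; keeping only those that clear the r ≥ 0.67 cut-off inflates the average. A pure-Python sketch: the cut-off is taken from the slide rather than derived, and the other settings are arbitrary:

```python
import math
import random
import statistics

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_selection(true_r=0.5, n=20, r_cut=0.67, n_voxels=2000, seed=1):
    """Simulate voxels that all share the same true brain-behavior correlation."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_voxels):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has correlation true_r with x by construction
        y = [true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for a in x]
        observed.append(pearson_r(x, y))
    selected = [r for r in observed if r >= r_cut]   # the "significant" voxels
    return statistics.mean(observed), statistics.mean(selected), len(selected)

mean_all, mean_sel, n_sel = simulate_selection()
print(f"all voxels: mean r = {mean_all:.2f}")
print(f"{n_sel} 'significant' voxels: mean r = {mean_sel:.2f}")
```

Across all voxels the average r sits near the true .5, but the "significant" subset can never average below the cut-off, so reporting its r overstates the effect.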
So why would you ever show this?
Descriptive reporting and plotting of results: Checking statistical assumptions
[Figure: a ‘significant’ voxel with r = .78; plots of Brain [Task – Control] results (event-related) and of Brain [Task – Control] vs. Behavior, each showing a pathological and an OK pattern]
Non-Solutions
• Omitting display of scatterplots
• Using cross-validation for everything, even
if theory calls for a hypothesis test
• Using regions of interest only and ignoring
the information in much of the brain
Solutions: Effect Size Estimation
1. When performing a hypothesis test, interpret the
results literally:
“Given the model assumptions, this brain
area shows a non-zero effect.”
(…Not as an estimate of effect size)
2. Increase power!
3. To estimate effect size, use ‘hold out’ data, i.e., cross-validation (unbiased estimates of true effect size)
4. Select a small number of a priori ROIs
Avoiding the Problem (Preview of What is Up
Next)
• Small Volume Correction (SVC)
– If you know the region(s) you are interested in a
priori, you can limit examination to just those voxels
– Reduces number of voxels and thus reduces
multiple comparisons correction
• Region-of-interest (ROI) Analysis
– If you have a strong prior and reason to believe that areas of interest are homogeneous, can simply average signal across an ROI
– Single comparison (per ROI)
• WFUPickAtlas toolbox for SPM provides a good means to create anatomical masks for SVCs or ROIs
• Downside: can only confirm prior hypotheses
– May miss new discoveries!
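Averaging within an ROI collapses each subject's contrast image to one number, so the group test becomes a single one-sample t-test per ROI. A pure-Python sketch with a hypothetical 4-voxel "image" per subject and a boolean mask:

```python
import math
import statistics

def roi_means(subject_images, mask):
    """Average each subject's contrast values over the voxels inside the ROI mask."""
    return [statistics.mean([v for v, inside in zip(img, mask) if inside])
            for img in subject_images]

def one_sample_t(values):
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values) / se, len(values) - 1

mask = [False, True, True, False]       # hypothetical ROI: voxels 1 and 2
images = [[0.1, 2.0, 4.0, -1.0],        # one flattened contrast image per subject
          [0.3, 1.0, 1.0, 0.5],
          [-0.2, 3.0, 1.0, 0.0]]
means = roi_means(images, mask)         # [3.0, 1.0, 2.0]: one value per subject
t, df = one_sample_t(means)             # a single test for the whole ROI
print(means, f"t({df}) = {t:.2f}")
```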
Region of Interest and Other
Analysis Methods
[Figure: three analysis types, each taking the observed signal (true signal + noise) through a hypothesis test with multiple comparisons correction to positive or negative results]
• Contrast: task comparison, Brain [Task – Control]; effect size: d, Z, p
• Brain-behavior correlation, Brain [Task – Control] vs. Behavior; effect size: r, p
• Information-based mapping, predictive accuracy vs. chance for an individual subject; effect size: accuracy, p
Whole Brain vs. ROI
Precision of prior spatial information, from none to lots:
– Test each voxel in the whole brain
– Test each voxel in a set of regions
– Test each voxel in a single region
– Test the average in a single region
Multiple comparisons correction required, correspondingly: from stringent (need very strong evidence), through some (need strong evidence), to none (need less evidence)
What is the Question?
[Figure: observed data]
Does this brain area respond to my task?
Does this one?
Does this one?
Does this one?
What is the Answer?
[Figure: observed data]
Does this one?
“Given the model assumptions, this brain area shows a non-zero effect.”
(Not necessarily informative about how big the effect is.)
Up Next…
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
Masking
Masking
• In 1st level analysis, all
voxels with a nonzero
value for every subject
are estimated
– gray matter, white
matter, csf, skull,
eyeballs
• Estimating a 1st level
model yields a mask
representing all voxels
included in the model
Masking
• Masking at the 1st level can eliminate unnecessary tests from your analysis, enabling better correction for multiple comparisons
Masking
• Masking can also be used
to test more targeted
hypotheses
– E.g. are the areas activated in one contrast also activated in another (independent) contrast?
• Create a mask from the first spmT*.img file using ImCalc, then use the mask in the analysis of the second
Process Comparison
• One use of fMRI is to compare brain
processes
• Two main interests:
– Conjunction: tasks recruit common brain
areas and thus common processes
– Disjunction: tasks recruit distinct brain areas
and thus distinct processes
Conjunctions
• Often interested in processes
shared across tasks
– Task 1 recruits regions A,B,C,
and D
– Task 2 recruits regions A,D,E,
and F
• Task 1 and Task 2 share common
process(es) instantiated by
regions A and D
• Conjunction analysis aims to
demonstrate neural overlap
Select 2 or more contrasts
in SPM’s results menu
(ctrl+click to select
multiple) to perform
conjunctions
Different null hypotheses
can be tested.
Conjunction null is the
accepted standard.
Conjunction Analysis
• SPM8 gives 2 flavors, which differ in the null hypothesis that they test
• Global null: assesses whether the contrasts are likely to be sampled from the null distribution
– Essentially a meta-analysis
– Contrasts may not be individually significant
• Consider three contrasts with t-scores of 0.5, 1.1, 1.3
– None is significant individually
– But together, they reject the global null
• Conjunction null: logical AND, i.e., all contrasts are individually significant
– Typically what researchers are interested in
– Joint test can be conservative
• Each contrast must be individually significant at the corrected threshold
• Conjunction null is the generally accepted approach
– SVC or ROIs may be appropriate to overcome conservativeness at the whole-brain level
Nichols et al., 2005, NeuroImage
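Under the conjunction null, a voxel survives only if every contrast is individually significant, which is the same as asking whether the minimum statistic across contrasts clears the threshold. A pure-Python sketch with hypothetical t-values and a hypothetical corrected threshold of 4.5:

```python
def conjunction_mask(t_maps, threshold):
    """Conjunction null (logical AND): keep a voxel only if its minimum t across contrasts exceeds threshold."""
    n_vox = len(t_maps[0])
    return [min(t[v] for t in t_maps) > threshold for v in range(n_vox)]

t_a = [5.1, 3.0, 6.2, 0.4]   # contrast A, one t per voxel (hypothetical)
t_b = [4.8, 5.5, 1.2, 0.9]   # contrast B
print(conjunction_mask([t_a, t_b], threshold=4.5))  # [True, False, False, False]
```

Only voxel 0 is above threshold in both contrasts; voxels 1 and 2 are each strongly active in one contrast only.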
Conjunction: Practical Issues
• Contrast A activates a wide
network of regions
• Contrast B activates a
smaller network, which
differs from A, but some
voxels overlap
• Is this overlap meaningful?
– How much overlap would be
expected by chance?
Conjunction: Practical Issues
• Behavioral studies report low
correlations between Task A and
Task B
• Subjects can concurrently
perform Task A and Task B with
little interference and additive
factors analysis reveals no
interactions
• fMRI of the tasks reveals that they both activate the dorsal ACC
• Researcher concludes that the tasks do share processes and touts the superiority of fMRI
• Conclusion justified?
Yarkoni et al., 2011, Nature Methods
20% of published studies find activation in the dorsal ACC
Should this base rate be taken into account?
Conjunction: Summary
• Conjunction analysis is one means of process
comparison and should likely be done against the
conjunction null (rather than the global null)
• Conjunctions may occur simply because one
contrast is very encompassing
• Conjunctions may occur in areas that subserve domain-general processing across many tasks
• Important to be mindful of these matters when interpreting conjunctions
Disjunctions
• How to characterize areas involved
in contrast A, but not contrast B?
• Requirements
– Active in contrast A
– A>B
– Not active in contrast B
• Bad practice
– Significant in A, not significant in B
– Could be active in B, just failed to
detect it!
– One sample demonstrated that
nearly half of neuroscience papers
made this error (Nieuwenhuis et al.,
2011, Nature Neuroscience)
When looking at results of
contrast A, use an exclusive
mask of contrast B to look for
voxels active in A, but not B.
Note: this is not sufficient in and of itself to determine an interaction! Be sure to directly contrast A > B, as well.
Disjunctions: Practical Concerns
• Want: area X involved in contrast A, but not contrast B
– Need:
1) Contrast A activates area X
2) A > B in area X
3) B does not activate area X
• Issue #1: how to define area X?
– Voxels active in contrast A?
• No! This is biased to show 1 and 2 (more on this in a moment)
– Contrast orthogonal to A and B
• Independent data (e.g. functional localizer)
• Anatomically defined
• Issue #2: how to demonstrate not active in contrast B?
– Difficult, but at the least should show that area X is not significant in contrast B at a very liberal criterion
• E.g. p < 0.05, uncorrected
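The three requirements, with the liberal test for B, combine into a voxel-wise mask. A pure-Python sketch with hypothetical t-values; the corrected threshold (4.5), the liberal threshold (1.7), and the choice to test A > B at the corrected threshold are all assumptions for illustration:

```python
def disjunction_mask(t_a, t_a_gt_b, t_b, u_corrected=4.5, u_liberal=1.7):
    """Voxels active in A, showing A > B, and NOT significant for B even at a liberal threshold."""
    return [a > u_corrected and d > u_corrected and b < u_liberal
            for a, d, b in zip(t_a, t_a_gt_b, t_b)]

t_a = [6.0, 6.0, 2.0]        # contrast A
t_a_gt_b = [5.0, 5.0, 0.5]   # direct contrast A > B
t_b = [0.3, 2.5, 0.2]        # contrast B
print(disjunction_mask(t_a, t_a_gt_b, t_b))  # [True, False, False]
```

Voxel 1 would pass the flawed "significant in A, not significant in B" check (t = 2.5 for B is below the corrected threshold) but fails the liberal test for B.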
Outline
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
100
ROI Analysis
• Often want to make inferences on a particular
region-of-interest (ROI)
• Must be careful to define ROI in a way
that does not make inferences dubious
• Inappropriate ROI definition has led to a great
deal of controversy in neuroscience
– Double-dipping: Kriegeskorte et al., 2009, Nat.
Neurosci.
101
ROIs Revisited
• ROIs based upon particular contrast are
biased to show a greater effect size than is
truly present.
– Cannot estimate the effect size of contrast A in a
region defined from contrast A
– Cannot examine A > B from a region defined
from contrast A
– Some researchers object to even visually
depicting effects from A
• E.g. fitted response will look too good
• NOTE: I DISAGREE WITH THIS STRONGLY
102
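A small simulation shows why an ROI defined from contrast A overstates A's effect. This minimal sketch assumes pure noise: every voxel's true effect is 0, and each observed score is standard normal. "Defining the ROI" means keeping the voxels whose score exceeds a threshold. The effect is then estimated from those same scores. The voxel count, threshold and seed are arbitrary choices, and the function name is hypothetical.

```python
import random
import statistics


def circular_roi_estimate(n_voxels=10_000, threshold=2.0, seed=0):
    """Select voxels from pure noise and estimate their effect on the SAME data.

    Every voxel's true effect is 0; observed scores are standard normal noise.
    Returns (number of selected voxels, mean observed score among them).
    """
    rng = random.Random(seed)
    observed = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
    # "ROI" defined by the contrast itself: keep voxels above threshold.
    selected = [z for z in observed if z > threshold]
    return len(selected), statistics.mean(selected)


n_selected, estimate = circular_roi_estimate()
print(n_selected, round(estimate, 2))
```

By construction, the estimate must exceed the threshold even though the true effect is zero everywhere. For a standard normal truncated at 2, the expected mean is about 2.37. A fitted response plotted from such an ROI would look correspondingly "too good".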
ROIs Revisited
• Inferences should be performed on
unbiased ROIs
– Orthogonal to contrast of interest
– Defined from independent data
• Separate functional localizer
• Cross-validation (split data into independent selection
and test sets)
• Separate study
– Expect regression to the mean!
– Defined anatomically
103
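Cross-validation can be sketched as a split-half analysis: define the ROI on one half of the data and estimate the effect on the other half. This minimal simulation assumes that each voxel has a true effect (`true_effect` in a random fraction of voxels, otherwise 0) and two independent noisy measurements, one per half. All parameter values and the function name are hypothetical.

```python
import random
import statistics


def split_half_roi(n_voxels=10_000, true_effect=0.5, frac_active=0.2,
                   threshold=2.0, seed=1):
    """Define the ROI on half 1 of the data, estimate the effect on half 2.

    Returns (circular estimate from half 1, independent estimate from half 2,
    true mean effect of the selected voxels).
    """
    rng = random.Random(seed)
    truth = [true_effect if rng.random() < frac_active else 0.0
             for _ in range(n_voxels)]
    half1 = [mu + rng.gauss(0.0, 1.0) for mu in truth]  # selection half
    half2 = [mu + rng.gauss(0.0, 1.0) for mu in truth]  # test half

    roi = [i for i, z in enumerate(half1) if z > threshold]  # select on half 1 only
    circular = statistics.mean(half1[i] for i in roi)
    independent = statistics.mean(half2[i] for i in roi)
    true_mean = statistics.mean(truth[i] for i in roi)
    return circular, independent, true_mean


circular, independent, true_mean = split_half_roi()
print(round(circular, 2), round(independent, 2), round(true_mean, 2))
```

The half-2 estimate falls well below the half-1 estimate and lands near the true mean of the selected voxels. This is the "expect regression to the mean" point: the drop is the selection bias being removed, not a failed replication.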
Methods for Selecting Unbiased ROIs
• Anatomical
• Functional
– Functional localizer
– Meta-analysis
• Curated
• Neurosynth
104
Anatomical
WFUPickAtlas toolbox in SPM can define
ROIs using popular atlases such as AAL and
Talairach Daemon
105
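Conceptually, an atlas-defined ROI is a lookup of the voxels that carry a given integer label. The sketch below is a minimal, toolbox-independent illustration and does not reflect how WFUPickAtlas works internally. It assumes the atlas label image and the contrast image are already in the same normalized space and flattened in the same voxel order. Loading the images is omitted. The label numbers are hypothetical and are not actual AAL or Talairach Daemon codes.

```python
def atlas_roi_mean(atlas_labels, contrast_values, label):
    """Mean contrast value over the voxels an atlas assigns to `label`.

    The ROI comes from anatomy alone, so it is independent of the contrast
    being tested.
    """
    roi_values = [v for lab, v in zip(atlas_labels, contrast_values) if lab == label]
    if not roi_values:
        raise ValueError(f"label {label} not present in atlas")
    return sum(roi_values) / len(roi_values)


# Hypothetical 6-voxel "images": label 0 = background, labels 1 and 2 = regions.
atlas = [0, 1, 1, 2, 2, 0]
contrast = [9.0, 1.0, 3.0, -2.0, 0.0, 7.0]
print(atlas_roi_mean(atlas, contrast, 1))  # 2.0
print(atlas_roi_mean(atlas, contrast, 2))  # -1.0
```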
Functional
• Localizer
106
Functional
• Meta-analysis
Image from Bartra et al., 2012
107
Example: Neurosynth (ToM)
108
Outline
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
109
Reverse inference:
When the heat is on, the house
gets hot
The house is hot… what can I
conclude?
110
Reverse Inference
1. In the present study, when task comparison A was presented, brain area Z was active
2. In other studies, when cognitive process X was putatively engaged, then brain area Z was active.
3. Thus, the activity of area Z in the present study demonstrates engagement of cognitive process X by task comparison A
• Example (Poldrack, 2006, TICS):
– Stroop task activates the dorsal ACC
– In several studies, pain activated the dorsal ACC
– Therefore, the Stroop task hurts
• Reverse Inference:
– Reason backwards from activation in a region to
engagement of a cognitive function
111
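One way to make the reverse-inference step quantitative is Bayes' rule, which is how Poldrack (2006) frames it. The formula is P(X | Z) = P(Z | X) P(X) / [P(Z | X) P(X) + P(Z | not X) P(not X)], where X is the cognitive process and Z is activation in the area. The sketch below assumes the three input probabilities are known. The numbers are hypothetical and are not estimates from the literature.

```python
def reverse_inference_posterior(p_act_given_x, p_act_given_not_x, prior_x):
    """P(process X | activation in area Z) via Bayes' rule."""
    numerator = p_act_given_x * prior_x
    evidence = numerator + p_act_given_not_x * (1.0 - prior_x)
    return numerator / evidence


# Hypothetical: the area is active in 80% of tasks that engage X.
selective = reverse_inference_posterior(0.8, 0.2, 0.5)     # rarely active otherwise
nonselective = reverse_inference_posterior(0.8, 0.6, 0.5)  # often active otherwise
print(round(selective, 3), round(nonselective, 3))  # 0.8 0.571
```

The less selective the region, the weaker the reverse inference. A region that is active across many tasks (recall the dorsal ACC base rate above) has a high P(Z | not X), and activation there says little about any single process.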
Yarkoni et al., 2011, Nature Methods
Reverse Inference: Problems
• Logical fallacy: affirming the consequent
– If one takes the fMRI course, they will know
fMRI.
– Neo knows fMRI.
– Neo took the fMRI course.
• Brain regions are engaged by diverse
demands
– Even the presumably selective, fusiform “face”
area is activated in response to diverse stimuli
112
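Checking a truth table mechanically shows that the fMRI-course argument is invalid. This is a small illustrative sketch in which P = "took the fMRI course" and Q = "knows fMRI". An argument form is valid only if no truth assignment makes all premises true and the conclusion false. The helper names are hypothetical.

```python
from itertools import product


def implies(p, q):
    """Material conditional: P -> Q."""
    return (not p) or q


def counterexamples(argument):
    """Truth assignments where all premises hold but the conclusion fails."""
    return [
        (p, q)
        for p, q in product([False, True], repeat=2)
        if all(premise(p, q) for premise in argument["premises"])
        and not argument["conclusion"](p, q)
    ]


affirming_consequent = {  # P -> Q, Q, therefore P
    "premises": [lambda p, q: implies(p, q), lambda p, q: q],
    "conclusion": lambda p, q: p,
}
modus_ponens = {  # P -> Q, P, therefore Q
    "premises": [lambda p, q: implies(p, q), lambda p, q: p],
    "conclusion": lambda p, q: q,
}

print(counterexamples(affirming_consequent))  # [(False, True)]
print(counterexamples(modus_ponens))          # []
```

The counterexample (False, True) is Neo learning fMRI some other way. In the same way, area Z can be active for reasons other than process X.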
Reverse Inference: What to Do?
• On some level, reverse inference is
necessary
– Trying to build a collective knowledge of the brain
– Need to link results with prior data
• Keep selectivity in mind when making
inferences
– Brain area X is engaged in your context of
interest. What about other contexts?
– Check out neurosynth.org
• For a given region, can output words associated with
that region in the published literature
113
Analysis Choices & Degrees of Freedom
114
The Degrees of Freedom Problem
• Many aspects of fMRI analysis have multiple solutions and options
– Order of pre-processing steps
– Size of smoothing kernel
– Spatial normalization template
– High-pass filter length
– Basis set
– Motion regression
• Choices can strongly affect results
Figure: Mean activation and variation as a result of analysis choices (Carp, 2012, Frontiers in Neuroscience)
115
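Choices multiply quickly. Every combination of options is a distinct pipeline that could yield a different map. This minimal sketch assumes a few options per choice from the list above. The option levels are hypothetical placeholders, not recommended settings, and the helper name is ours.

```python
from itertools import product

# Hypothetical option levels for each analysis choice listed above.
analysis_options = {
    "preprocessing_order": [("slice_timing", "realign"), ("realign", "slice_timing")],
    "smoothing_fwhm_mm": [4, 6, 8],
    "normalization_template": ["template_A", "template_B"],
    "highpass_cutoff_s": [100, 128],
    "basis_set": ["canonical", "canonical_plus_derivatives"],
    "motion_regression": [False, True],
}


def enumerate_pipelines(options):
    """Every combination of options, one dict per analysis pipeline."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


pipelines = enumerate_pipelines(analysis_options)
print(len(pipelines))  # 96 = 2 * 3 * 2 * 2 * 2 * 2
print(pipelines[0])
```

Even this modest menu gives 96 pipelines. If you try several and report the one that "looks right", you have an undisclosed multiple-comparisons problem.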
Degrees of Freedom and Bias
• Problematic scenario
– “This doesn’t look how I expected it to look. I wonder if I did something wrong?”
• Re-analyze until it looks right
• fMRI analysis is complex, and some degree of optimization after data collection is likely necessary
– E.g. the planned basis function does not fit the data appropriately
– Must be very careful not to bias results
• Ninja Derek’s recommendations
– Embed contrasts with known solutions that are orthogonal to contrasts of interest
• E.g. Right vs. Left motor response, Error vs. Correct response
– Use these contrasts as the criterion for optimization
– Design experiments to adjudicate between multiple equally interesting hypotheses
• Theory A predicts X, Theory B predicts Y, Theory C predicts Z
– Reduces bias towards any one result
116
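The first recommendation can be sketched as a selection rule that only sees the known-answer contrasts. This minimal sketch assumes you already have, for each candidate pipeline, the t-value of each orthogonal validation contrast in the region where that effect is expected. Averaging those t-values is our illustrative criterion, and the pipeline names and numbers are hypothetical.

```python
import statistics


def choose_pipeline(validation_t):
    """Pick the pipeline whose known-answer contrasts come out strongest.

    validation_t maps pipeline name -> {validation contrast -> t-value}.
    The contrast of interest is never passed in, so it cannot steer the choice.
    """
    return max(validation_t, key=lambda name: statistics.mean(validation_t[name].values()))


# Hypothetical t-values for two orthogonal known-answer contrasts.
validation_t = {
    "pipeline_1": {"right_vs_left_motor": 6.0, "error_vs_correct": 3.0},
    "pipeline_2": {"right_vs_left_motor": 7.0, "error_vs_correct": 4.0},
    "pipeline_3": {"right_vs_left_motor": 8.0, "error_vs_correct": 2.0},
}
print(choose_pipeline(validation_t))  # pipeline_2
```

The design choice that matters is the function signature. Because the contrast of interest is not an input, the optimization cannot drift toward the result you hoped to see.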
Conclusions
• Most studies have been based on null-hypothesis tests
• These are useful for exploratory purposes, and for
constraining theories about the physical basis of mind
• In the brain imaging setting, little attention has been
paid to estimating effect sizes, and the standard framework
produces biased post-hoc estimates
• Effect sizes may be increasingly important in the future, as
applications are developed
• There are tradeoffs in analysis choices, and the best option
depends on your goal and what kinds of effects (local vs.
distributed) you expect
117
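For a group-level one-sample design, a standardized effect size is easy to report alongside the test statistic. This minimal sketch assumes one contrast estimate per subject, taken from an ROI defined independently of that contrast, because estimates from a circularly defined ROI will be inflated. The data and function names are hypothetical.

```python
import math
import statistics


def cohens_d_one_sample(values):
    """Cohen's d for a one-sample design: mean divided by standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)


def t_from_d(d, n):
    """The matching one-sample t-statistic: t = d * sqrt(n)."""
    return d * math.sqrt(n)


# Hypothetical per-subject contrast estimates from an independently defined ROI.
roi_estimates = [1.0, 2.0, 3.0]
d = cohens_d_one_sample(roi_estimates)
print(d)                                # 2.0
print(t_from_d(d, len(roi_estimates)))  # about 3.46
```

The t-statistic grows with the number of subjects, whereas d does not. This is one reason effect sizes are the more useful quantity for applications.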
Questions?
…for you and for me!
118
