Overlooking Stimulus Variance
Jake Westfall
University of Colorado Boulder
Charles M. Judd
University of Colorado Boulder
David A. Kenny
University of Connecticut
Cornfield & Tukey (1956):
“The two spans of the bridge of inference”
• My actual samples: 50 University of Colorado undergraduates; 40 positively/negatively valenced English adjectives
– the “statistical span” connects these to –
• All potentially sampled participants/stimuli: all CU undergraduates taking Psych 101 in Spring 2014; all short, common, strongly valenced English adjectives
– the “subject-matter span” connects these to –
• Ultimate targets of generalization: all healthy, Western adults; all non-neutral visual stimuli
Difficulties crossing the statistical span
• Failure to account for the uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to underestimated standard errors and overconfident tests of effects
• The pervasive failure to model stimulus as a random
factor is probably responsible for many failures to
replicate when future studies use different stimulus
samples
Doing the correct analysis is easy!
• Modern statistical procedures solve the statistical
problem of stimulus sampling
• These linear mixed models with crossed random
effects are easy to apply and are already widely
available in major statistical packages
– R, SAS, SPSS, Stata, etc.
Illustrative Design
• Participants crossed with Stimuli
– Each Participant responds to each Stimulus
• Stimuli nested under Condition
– Each Stimulus always in either Condition A or Condition B
• Participants crossed with Condition
– Participants make responses under both Conditions
Sample of hypothetical dataset (3 participants × 12 stimuli; Stimuli 1–6 in Condition A, Stimuli 7–12 in Condition B):

                 Condition A               Condition B
                 S1  S2  S3  S4  S5  S6    S7  S8  S9  S10 S11 S12
Participant 1     5   4   6   7   3   8     8   7   9   5   6   5
Participant 2     4   4   7   8   4   6     9   6   7   4   5   6
Participant 3     5   3   6   7   4   5     7   5   8   3   4   5
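For concreteness, here is a sketch (my own variable names, not the authors' code) of the same hypothetical dataset in long format in R; the same columns are reused in the snippets below:

d <- expand.grid(stimulus = factor(1:12), participant = factor(1:3))
d$condition <- factor(ifelse(as.integer(d$stimulus) <= 6, "A", "B"))
d$y <- c(5, 4, 6, 7, 3, 8,  8, 7, 9, 5, 6, 5,   # Participant 1
         4, 4, 7, 8, 4, 6,  9, 6, 7, 4, 5, 6,   # Participant 2
         5, 3, 6, 7, 4, 5,  7, 5, 8, 3, 4, 5)   # Participant 3
head(d)   # every participant responds to every stimulus; each stimulus has one condition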
Typical repeated measures analyses (RM-ANOVA)
The by-participant analysis averages over stimuli within each condition. How variable are the stimulus ratings around each of the participant means? That variance is lost in the aggregation.
“By-participant analysis”

Participant   M_Black   M_White   Difference
1             5.5       6.67      1.17
2             5.5       6.17      0.67
3             5.0       5.33      0.33
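Using the hypothetical data frame d sketched above, the by-participant analysis can be reproduced roughly as follows (a sketch, not the authors' code):

part_means <- tapply(d$y, list(d$participant, d$condition), mean)   # 3 x 2 matrix of participant means
part_means                                                          # reproduces the table above
t.test(part_means[, "A"], part_means[, "B"], paired = TRUE)         # paired test on participant means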
“By-stimulus analysis”

Stimulus means (averaging over participants):
Condition A:  4.00  3.67  6.33  7.33  3.67  6.33
Condition B:  8.00  6.00  8.00  4.00  5.00  5.33

The condition comparison then treats these two sets of stimulus means as two independent samples of stimuli (Sample 1 vs. Sample 2); the variability of participants around each stimulus mean is lost in the aggregation.
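Analogously, a sketch of the by-stimulus analysis with the same hypothetical data frame:

stim_means <- tapply(d$y, d$stimulus, mean)           # 12 stimulus means, averaging over participants
in_A <- tapply(d$condition == "A", d$stimulus, all)   # which stimuli belong to Condition A
t.test(stim_means[in_A], stim_means[!in_A])           # independent-samples test on stimulus means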
Simulation of Type I error rates for typical RM-ANOVA analyses
• Design is the same as previously discussed
• Draw random samples of participants and stimuli
– Variance components = 4, Error variance = 16
• Number of participants = 10, 30, 50, 70, 90
• Number of stimuli = 10, 30, 50, 70, 90
• Conducted both by-participant and by-stimulus
analysis on each simulated dataset
• True Condition effect = 0
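A rough R sketch of one version of this simulation is below. Treating each of the participant-intercept, stimulus-intercept, and participant-slope variances as the "variance components = 4" is my reading of the slide, and the function and variable names are hypothetical:

simulate_once <- function(n_part = 30, n_stim = 30) {
  cond   <- rep(c(-0.5, 0.5), each = n_stim / 2)     # stimuli nested in Condition; true effect = 0
  p_int  <- rnorm(n_part, sd = 2)                    # participant intercepts (variance 4)
  s_int  <- rnorm(n_stim, sd = 2)                    # stimulus intercepts (variance 4)
  p_slp  <- rnorm(n_part, sd = 2)                    # participant slopes (variance 4)
  d      <- expand.grid(part = 1:n_part, stim = 1:n_stim)
  d$cond <- cond[d$stim]
  d$y    <- p_int[d$part] + s_int[d$stim] + p_slp[d$part] * d$cond +
            rnorm(nrow(d), sd = 4)                   # error variance 16
  by_part <- tapply(d$y, list(d$part, d$cond), mean) # by-participant aggregation
  by_stim <- tapply(d$y, d$stim, mean)               # by-stimulus aggregation
  c(p_by_part = t.test(by_part[, 1], by_part[, 2], paired = TRUE)$p.value,
    p_by_stim = t.test(by_stim[cond < 0], by_stim[cond > 0])$p.value)
}
set.seed(1)
pvals <- replicate(2000, simulate_once())
rowMeans(pvals < .05)   # empirical Type I error rates of the two analyses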
Type I error rate simulation results
• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary
• The main points to take away here are:
1. The standard analyses will virtually always show
some degree of positive bias
2. In some (entirely realistic) cases, this bias can be
extreme
3. The degree of bias depends in a predictable way on
the design of the experiment (e.g., the sample sizes)
The old solution: Quasi-F statistics
• Although quasi-Fs successfully address the
statistical problem, they suffer from a variety of
limitations
– Require complete orthogonal design (balanced factors)
– No missing data
– No continuous covariates
– A different quasi-F must be derived (often laboriously)
for each new experimental design
– Not widely implemented in major statistical packages
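As a general illustration (with placeholder mean squares, not the specific statistic for this design), a quasi-F combines mean squares so that numerator and denominator have equal expectations under the null hypothesis, with degrees of freedom approximated by Satterthwaite's formula:

\[
F' \;=\; \frac{MS_1 + MS_2}{MS_3 + MS_4},
\qquad
\widehat{df}_{\mathrm{num}} \;\approx\; \frac{(MS_1 + MS_2)^2}{MS_1^2/df_1 + MS_2^2/df_2},
\]

with the denominator degrees of freedom approximated analogously. Working out which mean squares to combine is what must be re-derived for each new design.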
The new solution: Mixed models
• Known variously as:
– Mixed-effects models, multilevel models, random
effects models, hierarchical linear models, etc.
• Most psychologists familiar with mixed models
for hierarchical random factors
– E.g., students nested in classrooms
• Less well known is that mixed models can also
easily accommodate designs with crossed
random factors
– E.g., participants crossed with stimuli
Decomposition of responses in an example dataset:
• Grand mean = 100; Mean_A = -5, Mean_B = +5
• Participant means: 5.86, 7.09, -1.09, -4.53
• Stimulus means: -2.84, -9.19, -1.16, 18.17
• Participant slopes: 3.02, -9.09, 3.15, -1.38
• Everything else = residual error
The linear mixed-effects model with crossed random effects
• Fixed effects: the grand mean and the Condition effect
• Random effects: participant intercepts, participant slopes for Condition, stimulus intercepts, and residual error
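A sketch of this model in equation form, using my own notation (i indexes participants, j indexes stimuli, and C_j is the condition code for stimulus j), consistent with the sample syntax on the next slide:

\[
Y_{ij} \;=\; \underbrace{\beta_0 + \beta_1 C_j}_{\text{fixed effects}}
\;+\; \underbrace{u_{0i} + u_{1i} C_j + w_{0j} + e_{ij}}_{\text{random effects}},
\]
\[
(u_{0i}, u_{1i})' \sim N(\mathbf{0}, \Sigma_u), \qquad
w_{0j} \sim N(0, \sigma_w^2), \qquad
e_{ij} \sim N(0, \sigma_e^2),
\]

where u_{0i} and u_{1i} are the participant intercepts and slopes, w_{0j} the stimulus intercepts, and e_{ij} the residual error.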
Fitting mixed models is easy: Sample syntax
R:
  library(lme4)
  model <- lmer(y ~ c + (1 | j) + (c | i))

SAS:
  proc mixed covtest;
    class i j;
    model y=c/solution;
    random intercept c/sub=i type=un;
    random intercept/sub=j;
  run;

SPSS:
  MIXED y WITH c
    /FIXED=c
    /PRINT=SOLUTION TESTCOV
    /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
    /RANDOM=INTERCEPT | SUBJECT(j).
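If approximate degrees of freedom and p-values for the Condition effect are wanted in R, one common option (my suggestion, not something the slides prescribe) is the lmerTest package, which adds Satterthwaite tests to lmer output:

library(lmerTest)                                     # loads lme4 and adds df and p-values to summaries
model <- lmer(y ~ c + (1 | j) + (c | i), data = d)    # d: hypothetical data frame with columns y, c, i, j
summary(model)                                        # fixed-effect test with Satterthwaite df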
Mixed models successfully maintain the nominal Type I error rate (α = .05)
Conclusion
• Stimulus variation is a generalizability issue
• The conclusions we draw in the Discussion sections
of our papers ought to be in line with the
assumptions of the statistical methods we use
• Mixed models with crossed random effects allow us
to generalize across both participants and stimuli
The end
Further reading:
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.
