Statistical power in experiments in which samples of participants respond to samples of stimuli
Jake Westfall
David A. Kenny
University of Connecticut
Charles M. Judd
• Studies involving participants responding
to stimuli (hypothetical data matrix):
[Table: hypothetical data matrix, rows indexed by Subject # (1, 2, 3, ...), with one response per stimulus in each row]
• Just in domain of implicit prejudice and
stereotyping:
– IAT (Greenwald et al.)
– Affective Priming (Fazio et al.)
– Affect Misattribution Procedure (Payne et al.)
– Primed Lexical Decision task (Wittenbrink et al.)
Hard questions
• “How many stimuli should I use?”
• “How similar or variable should the stimuli
be?”
• “When should I counterbalance the
assignment of stimuli to conditions?”
• “Is it better to have all participants respond
to the same set of stimuli, or should each
participant respond to a different set?”
• “Should participants make multiple responses
to each stimulus, or should every response by
a participant be to a unique stimulus?”
Power analysis in crossed designs
• Power determined by several parameters:
– 1 effect size (Cohen’s d)
– 2 sample sizes
• p = # of participants
• q = # of stimuli
– Set of Variance Partitioning Coefficients (VPCs)
• VPCs describe what proportion of the random
variation in the data comes from which sources
• Different designs depend on different VPCs
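As a rough illustration of what a VPC is, the sketch below divides each variance component by the total random variance, so the resulting proportions sum to 1. The component names and numeric values are hypothetical placeholders chosen for the example; they are not estimates from any study, nor the inputs of the power app.

```python
def variance_partitioning_coefficients(components):
    """Return each variance component as a proportion of the total random variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


# Hypothetical variance components (illustrative values only).
components = {
    "participant": 2.0,
    "stimulus": 1.0,
    "participant_x_stimulus": 1.0,
    "residual": 4.0,
}

vpcs = variance_partitioning_coefficients(components)
for name, share in vpcs.items():
    print(f"{name}: {share:.3f}")
```

Which of these proportions matter for power depends on the design, as the last bullet notes.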
Four common experimental designs
[Figure: annotations read “For power = 0.80, need q ≈ 50”, “For power = 0.80, need p ≈ 20”, and “?”]
Maximum attainable power
• In crossed designs, power asymptotes
at a maximum theoretically attainable
value that depends on:
– Effect size
– Number of stimuli
– Stimulus variability
• Under realistic assumptions, maximum
attainable power can be quite low!
To obtain max. power = 0.9…
– Pessimist: q = 86
– Realist: q = 20 to 50
– Optimist: q = 11
Implications of maximum
attainable power
stimuli before you begin collecting data!
– Once data collection begins, maximum
attainable power is pretty much determined.
• Even the most optimistic assumptions
imply that we should use at least 11
stimuli per between-stimulus condition
– Based on achieving max. power = 0.9 to
detect a medium effect size (d = 0.5)
stimulus presentation?
• Assume that responses to each stimulus
take about 10 minutes (e.g., film clips).
• Power analysis says we need q=60 to
reach power=0.8 (based on having p=60)
• But then it would take over 10 hours for a
participant to respond to every stimulus!
• The highest feasible number of responses
per participant is, say, 6 (about one hour)
• Are we doomed to have low power? No!
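The time budget behind this example can be checked with a few lines of arithmetic. This is a minimal sketch using the numbers from the bullets (q = 60, about 10 minutes per response, at most 6 responses per participant); the function name `session_hours` is hypothetical, and the calculation counts stimulus time only, ignoring instructions and breaks.

```python
def session_hours(n_responses, minutes_per_response):
    """Total response time in hours, counting stimulus time only."""
    return n_responses * minutes_per_response / 60


q = 60                      # stimuli required by the power analysis
minutes_per_response = 10   # approximate duration of one response (e.g., a film clip)
max_responses = 6           # highest feasible number of responses per participant

full_design_hours = session_hours(q, minutes_per_response)
feasible_hours = session_hours(max_responses, minutes_per_response)

print(f"Every participant sees all {q} stimuli: {full_design_hours:.1f} hours")
print(f"Each participant sees {max_responses} stimuli: {feasible_hours:.1f} hours")
```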
Stimuli-within-Block designs
Standard error reduced
by factor of 2.3!
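One way to picture a stimuli-within-block layout for the example above is sketched below, assuming that participants and stimuli are each split into the same number of blocks and that every participant responds only to the stimuli in their own block. The helper `make_blocks` and the choice of 10 blocks (60 stimuli at 6 per participant) are assumptions made for illustration; a real study would typically assign participants and stimuli to blocks at random rather than in order.

```python
def make_blocks(items, n_blocks):
    """Split items into n_blocks consecutive groups of equal size."""
    if len(items) % n_blocks != 0:
        raise ValueError("items must divide evenly into blocks")
    size = len(items) // n_blocks
    return [items[i * size:(i + 1) * size] for i in range(n_blocks)]


participants = list(range(1, 61))  # p = 60
stimuli = list(range(1, 61))       # q = 60
n_blocks = 10                      # assumed: 60 stimuli / 6 responses per participant

participant_blocks = make_blocks(participants, n_blocks)
stimulus_blocks = make_blocks(stimuli, n_blocks)

# Map each participant to the stimuli in the matching block.
design = {
    person: stim_block
    for person_block, stim_block in zip(participant_blocks, stimulus_blocks)
    for person in person_block
}

print(f"Participant 1 responds to stimuli {design[1]}")
print(f"Participant 60 responds to stimuli {design[60]}")
```

Under this layout all 60 stimuli are still used in the experiment, but each participant's session stays at 6 responses.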
URL for power app:
JakeWestfall.org/power/
Article reference:
Westfall, J., Kenny, D. A., & Judd, C. M. (in press).