
Overlooking Stimulus Variance

Jake Westfall, University of Colorado Boulder
Charles M. Judd, University of Colorado Boulder
David A. Kenny, University of Connecticut

Cornfield & Tukey (1956): "The two spans of the bridge of inference"
• My actual samples: 50 University of Colorado undergraduates; 40 positively/negatively valenced English adjectives
• All potentially sampled participants/stimuli: all CU undergraduates taking Psych 101 in Spring 2014; all short, common, strongly valenced English adjectives
• Ultimate targets of generalization: all healthy, Western adults; all non-neutral visual stimuli
• The "statistical span" of the bridge runs from the actual samples to all potentially sampled participants/stimuli; the "subject-matter span" runs from the potentially sampled units to the ultimate targets of generalization

Difficulties crossing the statistical span
• Failure to account for the uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to biased, overconfident estimates of effects
• The pervasive failure to model stimulus as a random factor is probably responsible for many failures to replicate when future studies use different stimulus samples

Doing the correct analysis is easy!
• Modern statistical procedures solve the statistical problem of stimulus sampling
• These linear mixed models with crossed random effects are easy to apply and are already widely available in major statistical packages: R, SAS, SPSS, Stata, etc.
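The overconfidence that comes from treating stimuli as fixed can be illustrated with a short Monte Carlo sketch (Python used here purely for illustration; the variance values of 4 and 16 mirror the simulation described later in the talk, but the sample sizes, seed, and all names in the code are ours, not the authors'):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_null_sim(n_p=30, n_s=30, var_p=4.0, var_s=4.0, var_e=16.0):
    """Simulate one dataset with a true condition effect of zero, then return
    the p-value of the standard by-participant paired t-test, which ignores
    the fact that the stimuli were sampled."""
    p_eff = rng.normal(0, np.sqrt(var_p), size=(n_p, 1))  # participant intercepts
    s_eff = rng.normal(0, np.sqrt(var_s), size=(1, n_s))  # stimulus intercepts
    y = p_eff + s_eff + rng.normal(0, np.sqrt(var_e), size=(n_p, n_s))
    half = n_s // 2  # stimuli 0..half-1 in Condition A, the rest in Condition B
    d = y[:, half:].mean(axis=1) - y[:, :half].mean(axis=1)
    return stats.ttest_1samp(d, 0.0).pvalue

pvals = np.array([one_null_sim() for _ in range(1000)])
error_rate = (pvals < .05).mean()
print(f"Empirical Type 1 error rate: {error_rate:.2f} (nominal .05)")
```

Because the sampled stimulus effects shift every participant's condition means in the same direction, the by-participant t-test mistakes stimulus sampling noise for a condition effect, and the empirical rejection rate lands far above the nominal .05.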
Illustrative design
• Participants crossed with Stimuli
  – Each participant responds to each stimulus
• Stimuli nested under Condition
  – Each stimulus appears in either Condition A or Condition B, never both
• Participants crossed with Condition
  – Participants make responses under both conditions

Sample of hypothetical dataset (3 participants × 12 stimuli; stimuli 1-6 in one condition, stimuli 7-12 in the other; the analysis below labels the two conditions Black and White):

  P1: 5 4 6 7 3 8 | 8 7 9 5 6 5
  P2: 4 4 7 8 4 6 | 9 6 7 4 5 6
  P3: 5 3 6 7 4 5 | 7 5 8 3 4 5

Typical repeated measures analyses (RM-ANOVA)

"By-participant analysis": average each participant's responses within each condition, then test the participant-level difference scores.

  Participant   M_Black   M_White   Difference
  P1            5.5       6.67      1.17
  P2            5.5       6.17      0.67
  P3            5.0       5.33      0.33

How variable are the stimulus ratings around each of the participant means? That variance is lost due to the aggregation.

"By-stimulus analysis": average the responses to each stimulus across participants, then compare the two samples of stimulus means (Sample 1 vs. Sample 2):

  Sample 1: 4.67 3.67 6.33 7.33 3.67 6.33
  Sample 2: 8.00 6.00 8.00 4.00 5.00 5.33

Simulation of Type 1 error rates for typical RM-ANOVA analyses
• Design is the same as previously discussed
• Draw random samples of participants and stimuli
  – Variance components = 4, error variance = 16
• Number of participants = 10, 30, 50, 70, 90
• Number of stimuli = 10, 30, 50, 70, 90
• Conducted both the by-participant and the by-stimulus analysis on each simulated dataset
• True Condition effect = 0

Type 1 error rate simulation results
• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary
• The main points to take away here are:
  1. The standard analyses will virtually always show some degree of positive bias
  2. In some (entirely realistic) cases, this bias can be extreme
  3.
The degree of bias depends in a predictable way on the design of the experiment (e.g., the sample sizes)

The old solution: Quasi-F statistics
• Although quasi-Fs successfully address the statistical problem, they suffer from a variety of limitations:
  – Require a complete, orthogonal design (balanced factors)
  – No missing data
  – No continuous covariates
  – A different quasi-F must be derived (often laboriously) for each new experimental design
  – Not widely implemented in major statistical packages

The new solution: Mixed models
• Known variously as mixed-effects models, multilevel models, random-effects models, hierarchical linear models, etc.
• Most psychologists are familiar with mixed models for hierarchical random factors
  – E.g., students nested in classrooms
• Less well known is that mixed models can also easily accommodate designs with crossed random factors
  – E.g., participants crossed with stimuli

Decomposing the data (illustrative values):
• Grand mean = 100
• Condition means (as deviations): MeanA = -5, MeanB = +5
• Participant means (as deviations): 5.86, 7.09, -1.09, -4.53
• Stimulus means (as deviations): -2.84, -9.19, -1.16, 18.17
• Participant slopes: 3.02, -9.09, 3.15, -1.38
• Everything else = residual error

The linear mixed-effects model with crossed random effects:

  Y_ij = (β0 + u0_i + w_j) + (β1 + u1_i) · c_j + e_ij

  Fixed effects: β0 (intercept), β1 (Condition effect)
  Random effects: u0_i (participant intercepts), u1_i (participant slopes), w_j (stimulus intercepts), e_ij (residual error)

Fitting mixed models is easy: sample syntax (i indexes participants, j indexes stimuli, c is the Condition code)

R:
  library(lme4)
  model <- lmer(y ~ c + (1 | j) + (c | i))

SAS:
  proc mixed covtest;
    class i j;
    model y = c / solution;
    random intercept c / sub=i type=un;
    random intercept / sub=j;
  run;

SPSS:
  MIXED y WITH c
    /FIXED=c
    /PRINT=SOLUTION TESTCOV
    /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
    /RANDOM=INTERCEPT | SUBJECT(j).

Mixed models successfully maintain the nominal Type 1 error rate (α = .05).

Conclusion
• Stimulus variation is a generalizability issue
• The conclusions we draw in the Discussion sections of our papers ought to be in line with the assumptions of the statistical methods we use
• Mixed models with crossed random effects allow us to generalize across both participants and stimuli

The end

Further reading: Judd, C. M., Westfall, J., & Kenny, D. A. (2012).
Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.
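As a supplement to the R/SAS/SPSS sample syntax above, a crossed-random-effects model of this kind can also be approximated in Python with statsmodels. This is only a sketch: it assumes statsmodels is installed, uses its variance-components formulation of crossed effects (which, unlike the lme4 and SAS syntax, treats participant intercepts and slopes as uncorrelated), and all data and variable names are ours:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_p, n_s = 20, 20  # participants, stimuli (half of the stimuli per condition)

# Simulate from the crossed-random-effects model on the slides:
# fixed intercept and condition effect, random participant intercepts and
# slopes, random stimulus intercepts, and residual error.
pid = np.repeat(np.arange(n_p), n_s)
sid = np.tile(np.arange(n_s), n_p)
c = np.where(sid < n_s // 2, -0.5, 0.5)  # condition code; stimuli nested in condition
u0 = rng.normal(0, 2, n_p)               # participant intercepts
u1 = rng.normal(0, 2, n_p)               # participant slopes
w = rng.normal(0, 2, n_s)                # stimulus intercepts
y = 100 + 3.0 * c + u0[pid] + u1[pid] * c + w[sid] + rng.normal(0, 4, n_p * n_s)
df = pd.DataFrame({"y": y, "c": c, "pid": pid, "sid": sid})

# Crossed random effects via a single dummy group plus variance components
# for participants, stimuli, and participant-by-condition slopes.
df["one_group"] = 1
vcf = {"pid": "0 + C(pid)", "sid": "0 + C(sid)", "pid_slope": "0 + C(pid):c"}
model = smf.mixedlm("y ~ c", data=df, groups="one_group",
                    re_formula="0", vc_formula=vcf)
result = model.fit()
print(result.fe_params)  # fixed-effect estimates for Intercept and c
```

The fixed-effect estimate for c is then tested against a standard error that reflects sampling of both participants and stimuli, which is what maintains the nominal Type 1 error rate.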