### 007_model_selection

```Mosteller & Tukey (1977).
Data Analysis and Regression.
“Encouraging linguists to use linear mixed-effects models is like
giving shotguns to toddlers.”
(see Barr et al., 2013)
Gerry Altmann
“A world of subjectivity”
Sarah Depaoli
“IF YOU BEAT THE DATA, AT SOME TIME IT WILL SPEAK”
“… and then you publish and get tenure.”
LMM
response ~ intercept + slope * fixed effect + error
Distinguish between test and control variables
Test vs. Control Variable Example
test variable
Null Model
control variable
Test vs. Control Variable Example (model diagrams)
Response ~ Critical Effect + [BLACK BOX: Control 1, Control 2, Random Effects]
Response ~ Critical Effect + [BLACK BOX: Control 3, Control 2, Random Effects]
Response ~ Critical Effect + Control 3 + Control 2 + Random Effects
Model Simplicity
Model Fit
From the “Exploratory End” to the “Confirmatory End” (theory-driven)
Harald Baayen
Roger Mundry (and many others)
Big Question:
How much do you allow the data to suggest new hypotheses?
How much do you depend on a priori theory?
Approach 1: more data-driven
• e.g., test whether random slopes are needed (maybe …
• e.g., test whether an interaction for something is necessary or not
  (“o.k.” if the interaction is a control variable)
• e.g., test whether something requires a non-linear or a linear effect
  (maybe o.k.)
Approach 2: more theory-driven
Approach 1: more data-driven
• Taken to the extreme, this approach has a very high likelihood of
  finding some significant result
• The model selection process is less transparent to outsiders (or you
  have to write a LONG LONG stats section)
Approach 2: more theory-driven
Approach 1: more data-driven
• You don’t miss important …
• Your model might thus be more accurate and “more true to the data”
Approach 2: more theory-driven
Approach 1: more data-driven
Approach 2: more theory-driven
• You formulate your model before you look at the data
• The components of your model are guided by:
  o Theory + Published Results
  o General world-knowledge
  o Research experience
• Taken to the extreme, you can’t even make a plot before you formulate
  your model
Approach 1: more data-driven
Approach 2: more theory-driven
• It forces you to think a lot
• It’s fun!
• It gives you a lot of responsibility, as a scientist
• Your estimates are going to be more conservative
Approach 1: more data-driven
Approach 2: more theory-driven
(before you conduct …
Test whether control variables interact with the test variable, or
whether they are needed
Build model, evaluate the model’s assumptions
Build model that better fits the assumptions
People might speed up or slow down throughout an experiment.
You need to know that each item was repeated two times!
You need to know that there are multiple responses per subject and item!
Token Researcher ;-)
Keep in mind:
• You have to resolve non-independencies
• Your random effects structure should be maximal with respect to your
  experimental design
from yourself:
Whatever you do,
not be based on the
(JEPS Bulletin)
Important principle
CONFIRM FIRST
EXPLORE SECOND
John McArdle
McArdle (2011: 335)
McArdle, J. J. (2011). Some ethical issues in factor analysis. In A. T. Panter & S. K. Sterba (Eds.),
Handbook of Ethics in Quantitative Methodology (pp. 313–339). New York, NY: Routledge.
The write-up
Important principle
BE HONEST
NOT PURE
John McArdle
Cool guidelines
United Nations Economic Commission for Europe (2009a). Making Data
Meaningful Part 1: A guide to writing stories about numbers. New York
and Geneva: United Nations.
United Nations Economic Commission for Europe (2009b). Making Data
Meaningful Part 2: A guide to presenting statistics. New York and
Geneva: United Nations.
“We tested a linear mixed effects model with subjects and items as
random effects.”
The write-up should reflect (as
selection procedure
= Reproducible Research
Rule of thumb:
“One needs to provide sufficient information for the reader to be
able to recreate the analyses.”
Barr et al. (2013)
… information that I provided, could I, myself, replicate the
analysis?
How to write up
• (1) "Phenomenon-oriented write-up"
• (2) Appendix / Supplementary Materials
Example #1
“We used generalized linear mixed models to test the effect of
Gender and Politeness on pitch. Subjects and items were
random effects (random intercepts) (Baayen, Davidson & Bates,
2008), with random slopes for subjects and items for the effect
of Politeness (Barr, Levy, Scheepers & Tily, 2013). We also included
a Gender * Politeness interaction in the model and, if this
interaction was not significant, included only the main effects.
Q-Q plots and plots of residuals against fitted values
revealed no obvious deviations from normality and
homoskedasticity. We report p-values based on Likelihood Ratio
Tests of the model with the main fixed effect in question
(Politeness) against the model without the main fixed effect
(null model, including Gender).”
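# A minimal R sketch of the comparison described in Example #1. It assumes
# the lme4 package is installed. The data are simulated and every name here
# (d, pitch, politeness, gender, subject, item) is hypothetical, not taken
# from the original study. The response is continuous, so lmer() is used.
library(lme4)
set.seed(1)
# Fully crossed toy design: 12 subjects x 10 items x 2 politeness levels
d <- expand.grid(subject = factor(1:12), item = factor(1:10),
                 politeness = c("informal", "polite"))
d$gender <- ifelse(as.integer(d$subject) %% 2 == 0, "F", "M")
subj_shift <- rnorm(12, sd = 10)   # by-subject intercept variation
item_shift <- rnorm(10, sd = 5)    # by-item intercept variation
d$pitch <- 200 + 40 * (d$gender == "F") - 15 * (d$politeness == "polite") +
  subj_shift[as.integer(d$subject)] + item_shift[as.integer(d$item)] +
  rnorm(nrow(d), sd = 20)
# Null model: control variable (gender) only, same random effects structure
null_model <- lmer(pitch ~ gender +
                     (1 + politeness | subject) + (1 + politeness | item),
                   data = d, REML = FALSE)
# Full model: adds the test variable (politeness)
full_model <- lmer(pitch ~ politeness + gender +
                     (1 + politeness | subject) + (1 + politeness | item),
                   data = d, REML = FALSE)
# Likelihood ratio test of full vs. null model. Both are fit with ML
# (REML = FALSE) so the log-likelihoods are comparable. With simulated data
# that contain no true slope variation, lme4 may report a singular fit.
anova(null_model, full_model)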
Example #2: "Phenomenon-oriented"
“We used generalized linear mixed models to test the
association between voice onset time and pitch. The fixed
effects quantify the effect of VOT on pitch, as well as the
effect of place of articulation, vowel type and gender on
pitch. The random effects quantify the by-subject and by-item
variability in pitch (random intercepts), as well as the
variation of the effect of VOT on pitch for subjects and items
(random slopes).”
Mentioning assumptions
“Visual inspection of residual plots revealed no obvious
deviation from normality and homoskedasticity of errors.”
“We checked plots of residuals against fitted values and found
no indication that the normality and homoskedasticity
assumptions were violated.”
“… indicated a problem with … We therefore log-transformed
the data.”
Results
o Provide results of the likelihood ratio test (i.e., significance etc.)
o Provide estimates and standard errors in the metric of the model
o For Poisson and logistic regression, additionally provide some
  exemplary back-transformed values (don’t back-transform the
  standard errors)
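# A small base-R sketch of back-transforming exemplary values. All numbers
# are hypothetical and chosen only for illustration. The bounds are one
# common way to convey uncertainty on the response scale (an assumption
# here, not a recommendation from the slides): they are computed on the
# model scale and only then transformed, so the standard error itself is
# never back-transformed.
est <- 0.40                      # hypothetical slope on the log scale (Poisson)
se  <- 0.10                      # hypothetical standard error, model scale
exp(est)                         # multiplicative change in the expected count
exp(est + c(-1.96, 1.96) * se)   # back-transformed bounds (Wald-type)
# Logistic regression: back-transform the linear predictor, not a lone slope
b0 <- -1.20                      # hypothetical intercept (log odds)
b1 <- 0.80                       # hypothetical slope (log odds)
plogis(b0)                       # predicted probability at x = 0
plogis(b0 + b1)                  # predicted probability at x = 1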
Likelihood Model Output
Data: mag
Models:
magmodel.maineffect: linelength ~ condition + city_status + german_side + gender +
magmodel.maineffect:     trial_order + (1 + condition * city_status | subjects) +
magmodel.maineffect:     (1 + condition * city_status | items)
magmodel: linelength ~ condition * city_status + german_side + gender +
magmodel:     trial_order + (1 + condition * city_status | subjects) +
magmodel:     (1 + condition * city_status | items)
                    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
magmodel.maineffect 27 7984.5 8121.9 -3965.3
magmodel            28 7893.7 8036.2 -3918.8 92.821      1  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
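# A base-R sketch of how the numbers in the output above relate to each
# other. The variable names are hypothetical; the values are copied from the
# table, so small discrepancies come from rounding in the printed logLik.
loglik_null <- -3965.3           # magmodel.maineffect
loglik_full <- -3918.8           # magmodel
df_null <- 27
df_full <- 28
# Likelihood ratio statistic: twice the difference in log-likelihood
# (about 93.0 from the rounded values; the table reports 92.821)
2 * (loglik_full - loglik_null)
# AIC = 2 * Df - 2 * logLik (about 7984.6; the table reports 7984.5)
2 * df_null - 2 * loglik_null
# p-value: upper tail of a chi-square distribution, df = difference in Df
p_value <- pchisq(92.821, df = df_full - df_null, lower.tail = FALSE)
p_value < 2.2e-16                # TRUE, matching "< 2.2e-16" in the table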