GZLM - University of Essex

... including GEE
• Generalized Linear Modelling
• A family of significance tests
• ... Something we don’t yet see mentioned much
in articles, but will hear more of
• Maybe we should be using it!
• Often we have RQs and RHs that require us to
compare groups and/or conditions. E.g.
– Do student attitudes vary depending on year of study
and on which of two types of speaking instruction
they received?
– Do word length and word frequency affect how well
learners remember the word?
– Do speakers differ in their pronunciation of a sound
depending on gender and formality of situation?
– Do students trained to use online dictionaries improve
in writing more than those who are not?
• Well-known statistical significance tests for
these comparisons belong to the GLM family
• General Linear Model
• GLM includes:
– t tests
– Pearson correlation
– Linear regression
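Because t tests, correlation and regression all belong to one family, a two-group t test is really a linear regression on a 0/1 group dummy. The pure-Python sketch below, using hypothetical toy scores (not data from any study mentioned here), computes the t statistic both ways so the equivalence can be checked numerically.

```python
from statistics import mean

def pooled_t(a, b):
    """Classic two-sample t statistic with pooled variance (group b minus group a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)                     # pooled variance
    return (mb - ma) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def regression_t(a, b):
    """t statistic for the slope when the DV is regressed on a 0/1 group dummy."""
    x = [0] * len(a) + [1] * len(b)              # dummy: 0 = group a, 1 = group b
    y = list(a) + list(b)
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                            # equals the difference in group means
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5            # standard error of the slope
    return slope / se

# Hypothetical scores for two groups (toy numbers, not real data)
group_a = [12, 15, 11, 14, 13]
group_b = [16, 18, 15, 17, 19]
print(pooled_t(group_a, group_b), regression_t(group_a, group_b))
```

For these toy scores both functions give t = 4: the regression slope is the mean difference, and its standard error is the pooled standard error of that difference.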
• But GLM is picky... it comes with prerequisites
• 1. The DV scale
– Can’t deal with data that are not scores...
...that are ‘equal interval’ and
...on a supposedly open-ended scale
– Counts have to be treated as scores
– Rating scales possibly, but... are they ‘equal interval’?
– Not binary data such as pass/fail or yes/no responses
• 2. Further features of the score data (in the
population) e.g.
– Normality of distribution shape of scores
– Similarity of spread of scores in different groups
(aka homoscedasticity or homogeneity of variance)
– Similarity of variance of differences between pairs
of repeated measures (aka sphericity)
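The first two score-data prerequisites can be screened roughly with simple descriptive statistics. The sketch below uses moment-based sample skewness for distribution shape and the ratio of largest to smallest group variance (Hartley's F-max) for similarity of spread; the groups are hypothetical toy numbers, and sphericity is not covered here. These are rough indicators, not formal tests.

```python
from statistics import mean, variance

def skewness(xs):
    """Moment-based sample skewness g1: 0 for a symmetric sample, > 0 for a long right tail."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def variance_ratio(*groups):
    """Largest / smallest group variance (Hartley's F-max); near 1 suggests similar spread."""
    vs = [variance(g) for g in groups]
    return max(vs) / min(vs)

# Hypothetical score lists for two groups (toy numbers)
group_1 = [10, 12, 11, 13, 12, 14]
group_2 = [5, 20, 8, 25, 15, 2]
print(round(skewness(group_1), 3), round(skewness(group_2), 3))
print(round(variance_ratio(group_1, group_2), 2))
```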
• Previous ways of dealing with data that fail
the prerequisites
– Use GLM anyway, claiming it is ‘robust’ even when
prerequisites are not met
• Or just use GLM and don’t check/mention the problems
– For normality, transform the data to be more
‘normal’ in shape (Example)
• But results are then hard to talk about
– Use an alternative test (nonparametric, weaker)
• But such tests are only available for simple comparisons
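The transformation option, and why its results are hard to talk about, can be seen in a small sketch: a log transform pulls in a long right tail, but the mean of the logged scores, back-transformed, is the geometric mean rather than the ordinary mean. Skew is measured here with Pearson's median skewness, 3 × (mean − median) / SD; the counts are hypothetical toy numbers, all above zero so the log is defined.

```python
import math
from statistics import mean, median, stdev

def median_skew(xs):
    """Pearson's median skewness: 3 * (mean - median) / SD; > 0 means a long right tail."""
    return 3 * (mean(xs) - median(xs)) / stdev(xs)

# Hypothetical right-skewed scores (toy numbers, all > 0 so log is defined)
raw = [1, 2, 2, 3, 4, 6, 9, 15, 30]
logged = [math.log(x) for x in raw]

print(round(median_skew(raw), 2), round(median_skew(logged), 2))

# Back-transforming the mean of the logs gives the geometric mean,
# not the arithmetic mean of the original scores.
back = math.exp(mean(logged))
print(round(mean(raw), 2), round(back, 2))
```

The logged scores are less skewed, but any "average" reported after back-transformation describes a different quantity from the original mean.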
• Available since the 80s, but only recently in
popular packages such as SPSS...
• GZLM, including GEE, covers most of the
ground of the GLM family, and more, and
deals with most of the problems
• Generalized Linear Model itself for comparing
groups only (GZLM)
• An extension of GZLM called Generalized
Estimating Equations (GEE) for comparing
repeated measures (and groups if necessary)
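What lets GZLM handle DVs that are not equal-interval scores is the link function: the model is still linear in a predictor, η = b0 + b1·x, but a link g connects the mean μ of the DV to that scale via g(μ) = η, paired with a distribution suited to the DV. GEE keeps the same kind of mean model and additionally allows for correlation among repeated measures from the same person. The sketch below shows three common links; the coefficients in the usage line are hypothetical.

```python
import math

# Each link g maps the mean mu of the DV onto the linear-predictor scale eta = g(mu);
# the inverse link maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),                 # normal scores (ordinary GLM)
    "log": (math.log, math.exp),                                  # counts (Poisson)
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),              # binary outcomes (binomial)
}

def predicted_mean(link, b0, b1, x):
    """Mean of the DV for predictor value x under eta = b0 + b1 * x."""
    _, inverse = LINKS[link]
    return inverse(b0 + b1 * x)

# Hypothetical coefficients, purely for illustration
for name in LINKS:
    print(name, round(predicted_mean(name, 0.5, 0.2, 1.0), 3))
```

With the identity link this reduces to ordinary linear regression, which is why the GLM family sits inside GZLM.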
• An example comparing groups
• We see the issue of choosing the right analysis
for the distribution shape
• Marin’s data, DV6
– DV: Six-point rating scale response for how often
learners use vocab strategies
– EV: Two genders
– EV: Five years of study in university
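A six-point rating is a set of ordered categories rather than equal-interval scores. One option for such a DV is an ordinal (cumulative logit) model, which works with the log-odds of answering at or below each cut point and, in its proportional-odds form, assumes a group effect that shifts all cut points by about the same amount. The sketch below computes cumulative logits from hypothetical response counts (toy numbers, not Marin's data) and compares two groups.

```python
import math

def cumulative_logits(counts):
    """log[P(Y <= j) / P(Y > j)] for j = 1 .. k-1, from counts in k ordered categories."""
    total = sum(counts)
    logits = []
    running = 0
    for c in counts[:-1]:
        running += c
        # Assumes no empty split: running > 0 and total - running > 0
        logits.append(math.log(running / (total - running)))
    return logits

# Hypothetical counts for responses 1..6 on the six-point scale (toy numbers)
women = [2, 5, 10, 15, 12, 6]
men = [5, 9, 14, 12, 7, 3]

# Difference in cumulative logits at each of the five cut points;
# a proportional-odds model assumes these are roughly equal.
diffs = [m - w for m, w in zip(cumulative_logits(men), cumulative_logits(women))]
print([round(d, 2) for d in diffs])
```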
• An example with repeated measures
• We see how to turn the data into ‘long’ form
which GEE requires
• Issariya’s data
– DV: Percent correct scores for learning vocab
– EV: Pretest versus posttest
– EV: Experimental group (with vocab learning
strategy instruction) and control group (with extra
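In "wide" form each learner is one row with separate pretest and posttest columns; the "long" form that GEE requires has one row per learner per measurement, with the learner ID repeated so the model knows which rows belong together. A minimal sketch, assuming hypothetical column names `id`, `group`, `pretest` and `posttest` and toy scores:

```python
def wide_to_long(rows, id_col, between_cols, repeated_cols,
                 time_name="time", value_name="score"):
    """Turn one row per person into one row per person per repeated measure."""
    long_rows = []
    for row in rows:
        for rep in repeated_cols:
            new = {id_col: row[id_col]}          # repeated ID links a person's rows
            for col in between_cols:
                new[col] = row[col]              # group membership copied to every row
            new[time_name] = rep                 # which measurement this row is
            new[value_name] = row[rep]           # the score itself
            long_rows.append(new)
    return long_rows

# Hypothetical wide data (toy percent-correct scores)
wide = [
    {"id": 1, "group": "experimental", "pretest": 40, "posttest": 65},
    {"id": 2, "group": "control", "pretest": 45, "posttest": 50},
]
for r in wide_to_long(wide, "id", ["group"], ["pretest", "posttest"]):
    print(r)
```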
• An example with Poisson distribution
• We see computational limitations
• Nushoor’s data
– DV: Counts of how often people used types of
modifier expression with requests
– EV: Types of modifier
– EV: Four groups (2 NS, 2 NNS)
– EVs: Types of request situation in terms of social
variables such as power and seriousness
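For counts, GZLM typically uses a Poisson distribution with a log link. With a single grouping EV the fit has a closed form: each group's fitted mean is its observed mean count, the intercept is the log mean of the reference group, and each exponentiated coefficient is a rate ratio. Since the Poisson model assumes variance equal to the mean, a Pearson dispersion statistic well above 1 warns of overdispersion. The counts below are hypothetical toy numbers for four groups, and every group mean must be above zero for the logs to exist.

```python
import math
from statistics import mean

def poisson_one_factor(groups):
    """MLE for log(mu) = b0 + b_k * [group k], with the first group as reference."""
    means = [mean(g) for g in groups]            # fitted means = observed group means
    b0 = math.log(means[0])
    effects = [math.log(m) - b0 for m in means[1:]]
    return b0, effects

def pearson_dispersion(groups):
    """Pearson chi-square / residual df; roughly 1 if Poisson variance = mean holds."""
    n = sum(len(g) for g in groups)
    p = len(groups)                              # one parameter per group
    chi2 = 0.0
    for g in groups:
        mu = mean(g)
        chi2 += sum((y - mu) ** 2 / mu for y in g)
    return chi2 / (n - p)

# Hypothetical modifier counts per speaker in four groups (toy numbers)
ns_1, ns_2 = [4, 6, 5, 7], [3, 5, 4, 4]
nns_1, nns_2 = [2, 1, 3, 2], [1, 2, 0, 1]
groups = [ns_1, ns_2, nns_1, nns_2]

b0, effects = poisson_one_factor(groups)
print(round(b0, 3), [round(math.exp(e), 3) for e in effects])   # rate ratios vs ns_1
print(round(pearson_dispersion(groups), 3))
```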
• An example with binary data
• Vineeta’s r data
– DV: Numbers of r produced versus other variants
– EVs: Various features of the word
– EVs: Various features of the people
– EV: Formality of situation
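A binary DV (r versus another variant on each token) calls for a binomial distribution with a logit link, i.e. logistic regression. The sketch below fits one EV by Newton-Raphson in plain Python; the coding (1 = r, formal = 1) and the counts are hypothetical. With a single 0/1 EV the slope should equal the log odds ratio, which the usage lines check. It does not converge if the EV perfectly separates the outcomes.

```python
import math

def logistic_fit(x, y, iters=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; fixed iteration count."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical tokens: x = 1 formal, 0 informal; y = 1 if r was produced (toy numbers)
x = [0] * 20 + [1] * 20
y = [1] * 6 + [0] * 14 + [1] * 14 + [0] * 6
b0, b1 = logistic_fit(x, y)
print(round(b0, 4), round(b1, 4))
print(round(math.log((14 / 6) / (6 / 14)), 4))   # log odds ratio, should match b1
```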
• This analysis is, I think, more or less equivalent to
what traditional Varbrul analysis does... BUT
– The output is in a different form
– In fact this sort of analysis is not really statistically
acceptable anyway (see
• To analyse data like Vineeta’s properly, we
need either the latest version of Varbrul, called
Rbrul, or GZLM Mixed... the latest addition to GZLM
in SPSS
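Because Varbrul output is in a different form from GZLM output (factor weights on a 0 to 1 scale rather than log-odds coefficients), a conversion helps when comparing the two. As far as we understand, a Varbrul-style factor weight corresponds to the inverse logit of a level's log-odds effect once the effects within one factor group are centred to sum to zero; treat this correspondence as an assumption to check against the Rbrul documentation. The `factor_weights` helper and the example values are hypothetical.

```python
import math

def factor_weights(level_logodds):
    """Hypothetical helper: Varbrul-style weights from per-level log-odds effects.

    Effects may use any reference coding; they are centred to sum to zero
    within the factor group before the inverse logit is applied (assumption).
    """
    centre = sum(level_logodds.values()) / len(level_logodds)
    return {lvl: 1 / (1 + math.exp(-(b - centre))) for lvl, b in level_logodds.items()}

# Hypothetical log-odds effects, with "informal" as the reference level (0.0)
weights = factor_weights({"informal": 0.0, "formal": 1.69})
print({k: round(v, 3) for k, v in weights.items()})
```

Weights above 0.5 would then be read, Varbrul-style, as favouring the variant and those below 0.5 as disfavouring it.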
• Watch this space....
