GZLM ... including GEE
• Generalized Linear Modelling
• A family of significance tests
• ... Something we don’t see mentioned much in articles yet... but will hear more of
• Maybe we should be using it!
• Often we have RQs and RHs that require us to compare groups and/or conditions. E.g.
  – Do student attitudes vary depending on year of study and on which of two types of speaking instruction they received?
  – Do word length and word frequency affect how well learners remember the word?
  – Do speakers differ in their pronunciation of a sound depending on gender and formality of situation?
  – Do students trained to use online dictionaries improve in writing more than those who are not?
• Well known statistical significance tests for these comparisons are the GLM family
• General Linear Model
• GLM includes:
  – t tests
  – ANOVA
  – Pearson correlation
  – Linear regression
• But GLM is picky... comes with prerequisite requirements
• 1. the DV scale
  – Can’t deal with data that is not scores...
  – ...that are ‘equal interval’ and
  – ...on a supposedly open ended scale
  – Counts have to be treated as scores
  – Rating scales possibly, but... are they ‘equal interval’?
  – Not binary data such as pass/fail or yes/no responses
• 2. Further features of the score data (in the population) e.g.
  – Normality of distribution shape of scores
  – Similarity of spread of scores in different groups (aka homoscedasticity or homogeneity of variance)
  – Similarity of variance of differences between pairs of repeated measures (aka sphericity)
• Previous ways of dealing with data that fails the prerequisites
  – Use GLM anyway, claiming it is ‘robust’ even when prerequisites are missing
    • Or just use GLM and don’t check/mention the problems
  – For normality, transform the data to be more ‘normal’ in shape (Example)
    • But results are then hard to talk about
  – Use an alternative test (nonparametric, weaker)
    • But such tests are only available for simple comparisons
• Since the 80s, but only recently available in popular packages like SPSS...
• GZLM, including GEE, covers most of the ground of the GLM family, and more, and deals with most of the problems
• Generalized Linear Model itself for comparing groups only (GZLM)
• An extension of GZLM called Generalized Estimating Equations (GEE) for comparing repeated measures (and groups if necessary)
• An example comparing groups
• We see the issue of choosing the right analysis for the distribution shape
• Marin’s data, DV6
  – DV: Six point rating scale response for how often learners use vocab strategies
  – EV: Two genders
  – EV: Five years of study in university
• An example with repeated measures
• We see how to turn the data into ‘long’ form which GEE requires
• Issariya’s data
  – DV: Percent correct scores for learning vocab wordlists
  – EV: Pretest versus posttest
  – EV: Experimental group (with vocab learning strategy instruction) and control group (with extra practice)
• An example with Poisson distribution
• We see computational limitations
• Nushoor’s data
  – DV: Counts of how often people used types of modifier expression with requests
  – EV: Types of modifier
  – EV: Four groups (2 NS, 2 NNS)
  – EVs: Types of request situation in terms of social variables such as power, and seriousness
• An example with binary data
• Vineeta’s r data
  – DV: Numbers of r produced versus other variants
  – EVs: Various features of the word
  – EVs: Various features of the people
  – EV: Formality of situation
• This analysis is I think more or less equivalent to what traditional Varbrul analysis does....BUT
  – The output is in a different form
  – In fact this sort of analysis is not really statistically acceptable anyway (see http://www.scottishhistorysociety.org/media/media_200043_en.pdf)
• To analyse data like Vineeta’s properly we need either the latest version of Varbrul called Rbrul, or GZLM Mixed... the latest bit of GZLM added to SPSS
• Watch this space....
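
A rough sketch of the ‘long’ form mentioned for the repeated measures example, written in plain Python rather than SPSS. The column names (learner, group, pretest, posttest, time, score), the helper wide_to_long and all the numbers are invented for illustration; they are not taken from Issariya’s data.

```python
# Hypothetical 'wide' records: one row per learner, one column per test occasion.
# All names and scores are made up for this sketch.
wide_rows = [
    {"learner": 1, "group": "experimental", "pretest": 40.0, "posttest": 65.0},
    {"learner": 2, "group": "control", "pretest": 45.0, "posttest": 50.0},
]


def wide_to_long(rows, id_keys, measure_keys, time_name="time", value_name="score"):
    """Return one row per learner per occasion (the 'long' layout)."""
    long_rows = []
    for row in rows:
        for measure in measure_keys:
            # Copy the identifying columns, then record which occasion
            # this row describes and the score obtained on it.
            long_row = {key: row[key] for key in id_keys}
            long_row[time_name] = measure
            long_row[value_name] = row[measure]
            long_rows.append(long_row)
    return long_rows


long_rows = wide_to_long(wide_rows, ["learner", "group"], ["pretest", "posttest"])
for long_row in long_rows:
    print(long_row)
```

In the long layout each learner appears on two rows that share the same learner number, one for the pretest and one for the posttest. That shared identifier is what tells a repeated measures analysis such as GEE which rows belong to the same person.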