### “A chi-square test showed that…” – or did it really?

```«A chi-square test showed that...»
– or did it really?
Bård Uri Jensen
http://privat.hihm.no/buj/
Allowing [statistical software] to do our thinking
is a sure recipe for disaster.
(Good & Hardin, 2012, p. xi)
«Simple» statistical tests
• chi-square (χ²) test
• t-test
Statistical hypothesis testing
1. Formulate a hypothesis
 - E.g. in Norwegian as L2, Vietnamese speakers make more TENSE errors than Somali speakers.
2. Formulate a null hypothesis
 - Vietnamese and Somali speakers have the same rate of TENSE errors.
3. «Disprove» the null hypothesis = demonstrate that the observed data would be unlikely if it were true
 - E.g. less than 5% chance of observing a difference this large if the null hypothesis is true
 = «Significance»
• We choose α according to what we consider an acceptable risk of
falsely rejecting the null hypothesis
 - Often 5% in linguistic research
Conditions of use
• Independent observations
 - chi-square test
 - t-test
• Parametric assumptions
 - t-test
• The dangers of repeated testing
 - any test
A simple example from ornithology
A simple example from corpus linguistics
• An important condition of use for
 - chi-squared test
 - t-test
• The observations should be independent.
 - The observations should be of different individuals.
«Chi-square is a much-abused test in second language
research studies, and often one of its assumptions (that of
independence of data) is violated as a matter of course.»
Larson-Hall (2010, p.206)
Example 1:
Chi-squared test, non-independent observations
• Blom & Paradis 2013
 - Journal of Speech, Language, and Hearing Research
 - On past tense production in L2 children with language impairment
• 48 children with English as L2
• Overregularization of past tense
 - Hypothesis: Less common in verb stems ending in /d/ or /t/
             overregularization   zero marking
d# or t#                     16             69
others                       42             98
• χ²(1) = 3.45, p (one-sided) = 0.032
• Problem: n = 85 + 140 = 225 observations, but only N = 48 children
• Each child contributes several observations, so the observations are not independent and the result is invalid.
Example 1:
Chi-squared test, non-independent observations
• Solution A:
 - Pick just one observation from each author/speaker
• “To exclude the author as one more relevant factor, the database
was cleaned so that there is only one example for each verb
from any single author.”
Sokolova 2012, p. 94
• Solution B:
 - Calculate average values for each informant
 - Use the average values as independent observations
 - Test significance with an appropriate test, e.g. t-test or U-test
 - Gujord 2013
• Both these solutions might require a larger corpus!
• «Solution» C:
 - Alter the research question
 - Danckaert 2011
Example 2:
T-test, non-independent observations
• Klavan 2012
 - PhD thesis from Tartu University
 - Investigation of adposition ‘peal’ and adessive case
• 450 observations of each, from 2 corpora
• t = 8.02, p < 0.001
• Conclusion: adessive phrases are longer than ‘peal’-phrases
• Problem: Observations are not independent.
• The conclusion is invalid.
Example 3:
T-test, non-normal populations
• Hunter (2011, p. 48)
 - PhD thesis from Birmingham University
 - On grammaticality judgements by L2 students
• Conclusion:
• the accuracy (max. = 1) for the teacher group (M = .98, SD = .14)
was significantly higher than the student group (M = .64, SD = .49),
t(1) = 4.9, p < .001.
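• Problem:
 - Mean = 0.98, Maximum value = 1
 - Standard deviation = 0.14
• The distribution cannot possibly be normal: a normal distribution with this mean and standard deviation would place over 40% of its values above the maximum of 1.
• The result is invalid.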
Example 4
Repeated testing
• Leedham 2011
 - PhD thesis, The Open University
 - Features in the writing of Chinese students in UK universities
• Conclusion:
• There are differences in frequencies of certain phrases
between 3rd year students and younger students
• Problem:
• Repeated testing without adjusting the probability values
• Some of the results are not valid.
Moral
There are no simple tests.
1. You should understand the conditions of the test.
2. You should take the conditions into account.
3. You should document properly
 - how you perform the test,
 - what numbers you put into it,
 - how the conditions are met.
«A chi-square test showed that the difference is significant.»
Is it really that important?
• «[C]ompared to other social sciences (e.g., psychology,
communication, sociology, anthropology, …) or branches of
linguistics (e.g., psycholinguistics, phonetics, sociolinguistics…),
most of corpus linguistics has paradoxically only begun to
develop this methodological awareness.»
Gries (forthcoming, p.1)
Is it really that important?
• «It has become increasingly apparent over a period of several
years that psychologists, taken in the aggregate, employ the
chi-square test incorrectly.»
Lewis and Burke (1949)
Whose responsibility is it?
«Corpus linguistics needs to ‘catch up’ [...]»
Gries (forthcoming, p.1)
```
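
The arithmetic in Example 1 can be checked from the four cell counts given on the slide (16, 69, 42, 98). The sketch below computes a Pearson chi-square statistic without continuity correction, using only the Python standard library. Two assumptions are mine, not the slide's: that no continuity correction was applied, and that the one-sided p-value was obtained by halving the two-sided value. The function names are illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Two-sided p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Rows: stems in d#/t#, other stems. Columns: overregularization, zero marking.
table = [[16, 69], [42, 98]]
stat = chi_square_2x2(table)
p_two_sided = p_value_df1(stat)
p_one_sided = p_two_sided / 2  # assumption: one-sided value = half the two-sided value

print(f"chi2(1) = {stat:.2f}")
print(f"p (two-sided) = {p_two_sided:.3f}, p (one-sided) = {p_one_sided:.3f}")
```

Under these assumptions the statistic and one-sided p-value agree with the reported figures to rounding. That is the point of the example: the calculation runs without complaint on 225 observations from 48 children, and nothing in it checks whether the observations are independent.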
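
The plausibility argument in Example 3 can be made concrete with the reported means and standard deviations. The sketch below asks how much of a normal distribution with those parameters would fall outside the possible accuracy range of 0 to 1. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_outside_range` is illustrative.

```python
from statistics import NormalDist

def share_outside_range(mean, sd, low=0.0, high=1.0):
    """Share of a normal distribution N(mean, sd) lying below `low` or above `high`."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(low) + (1.0 - dist.cdf(high))

# Reported values from Example 3; accuracy is bounded between 0 and 1.
teachers = share_outside_range(mean=0.98, sd=0.14)
students = share_outside_range(mean=0.64, sd=0.49)

print(f"teachers: {teachers:.1%} of a normal distribution would be impossible values")
print(f"students: {students:.1%} of a normal distribution would be impossible values")
```

A large share of impossible values is a quick warning sign that the normality assumption of the t-test does not hold for the data as described. It is a rough screen based on summary statistics only, not a substitute for inspecting the raw scores.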
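
The repeated-testing problem in Example 4 can be illustrated with a small calculation. The sketch below shows how the chance of at least one false positive grows with the number of tests (assuming independent tests), and applies the Holm step-down adjustment as one possible correction. The p-values are hypothetical, made up for illustration, and are not taken from Leedham 2011; Holm is my choice of method, not one named on the slides.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

alpha = 0.05
fwer_20 = familywise_error_rate(alpha, 20)

# Hypothetical p-values from five separate tests (illustration only)
raw = [0.001, 0.012, 0.030, 0.045, 0.200]
adjusted = holm_adjust(raw)

n_significant_raw = sum(p < alpha for p in raw)
n_significant_adjusted = sum(p < alpha for p in adjusted)

print(f"Risk of at least one false positive in 20 tests: {fwer_20:.2f}")
print(f"Significant before adjustment: {n_significant_raw}, after: {n_significant_adjusted}")
```

With these made-up values, some results that pass the 5% threshold individually no longer do so after adjustment, which is the sense in which «some of the results are not valid».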