### C26 Chi Square AP Statistics Review

```*
Inference for Distributions of Categorical
Variables (C26 BVD)
* Chi-Squared distributions are appropriate to
model sampling distributions for counted data
(categorical variable(s)).
* If you have only 1 count being compared to 1
expected, you can use a 1-proportion z-interval
or z-test, if conditions are met.
* Conditions to check are: random
sampling/assignment, counted data (not
percents, etc.), expected cell counts all at least 5
(not observed, expected!)
* If the expected cell count condition is violated, you may be able
to “fix” it by collapsing the table (combining categories).
*
* If you have one set of counts for one
variable being compared to an expected
distribution, that is Goodness of Fit (One-Way Table).
* If you have multiple distributions for a
single variable (think: 1 survey question
asked of several groups), that is a Test of Homogeneity.
* If you have two variables (think: 2 survey
questions asked of one group), that is a Test of Independence.
*
* Compute (observed – expected)^2 / expected for each cell.
* Sum all those and you have your chi-squared statistic.
* Degrees of freedom for GOF = categories – 1
* Degrees of freedom for Homogeneity and
Independence = (rows – 1) * (columns – 1). Do not count the table margins
(totals) as rows or columns.
*
* Old calculators:
Put observed in L1,
expected in L2, (o-e)^2/e in L3, then sum
L3. That is chi-squared. Then use
χ²cdf(statistic, big number, df) to find the p-value.
* Newer calculators: Put observed in L1,
expected in L2, then run the GOF test under Stat-Test.
* Always report your test statistic, df, and p-value, then make a conclusion.
*
* Put the table in matrix A. Do not include
margins.
* Run the chi-squared test under Stat-Test.
* Don’t forget to check Matrix B for
expected count violations.
* Always report the chi-squared statistic, df,
and p-value, then make a conclusion.
*
* GOF:
Ho: The distribution for _ is as expected
(may need to be more specific). Ha: The
distribution for _ is NOT as expected.
* Homogeneity: Ho: There is no difference in
distribution of __ for the
populations/treatments __. Ha: There is a
difference in distribution….
* Independence: Ho: There is no association
between _ and _. Ha: There IS an association
between _ and _.
*
* If you reject the null, you should look at each of
the components in the sum for the chi-squared
statistic (i.e. each (o-e)^2/e) and see which are
the largest.
* You should comment about which one or two
are largest and thus were the largest
contributors to rejecting Ho, i.e. were the
most different from expected.
*
* If you need to find an expected cell count “by
hand”:
* GOF:
Find out what proportion of the total count
that category is supposed to be (like 30% of M&M’s
are supposed to be yellow) and then take that
percent of the total to find expected count. Do not
round to whole numbers.
* Homogeneity/Independence:
Find totals/margins
for table. Then, find what percent of the whole table
(grand total) is in the row of interest. Then,
take that percent of the total for the column of interest and that
is the expected cell count, i.e. (row total * column total) / grand total.
Do not round to whole numbers.
*
```
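
The by-hand steps in these notes (expected counts, the (o-e)^2/e components, degrees of freedom, and the p-value) can be sketched in Python as a way to check calculator work. This is only an illustrative sketch: the function names and both data sets are made up for the example, and `chi_square_p_value` is an assumed stand-in for the calculator's χ²cdf(statistic, big number, df), built from the closed-form upper tail of the chi-squared distribution using only the standard `math` module.

```python
import math


def gof_expected(total, proportions):
    """Goodness of fit: claimed proportion times the total count (not rounded)."""
    return [total * p for p in proportions]


def two_way_expected(table):
    """Expected counts for a two-way table entered without margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # row total * column total / grand total, which is the row's share
    # of the whole table applied to the column total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def components(observed, expected):
    """Per-cell contributions (o - e)^2 / e."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_p_value(statistic, df):
    """Upper-tail area P(X >= statistic) for a chi-squared distribution."""
    z = statistic / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # tail area for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # tail area for df = 1
    # Step df up by 2 at a time: Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1)
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


# Hypothetical goodness-of-fit data: 100 items, claimed split 25% / 25% / 50%.
observed = [20, 30, 50]
expected = gof_expected(sum(observed), [0.25, 0.25, 0.50])
parts = components(observed, expected)
statistic = sum(parts)
df = len(observed) - 1                        # categories - 1
p_value = chi_square_p_value(statistic, df)
print(expected, parts, statistic, df, p_value)

# Hypothetical two-way table (no margins), as it would go in matrix A.
table = [[10, 20], [30, 40]]
expected_table = two_way_expected(table)      # plays the role of matrix B
flat_obs = [o for row in table for o in row]
flat_exp = [e for row in expected_table for e in row]
stat2 = sum(components(flat_obs, flat_exp))
df2 = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)
print(expected_table, stat2, df2, chi_square_p_value(stat2, df2))
```

In the made-up goodness-of-fit data the expected counts are 25, 25, and 50, the components are 1, 1, and 0, and the statistic is 2 with df = 2. Printing `parts` is also how you would spot the largest contributors after rejecting Ho.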