### AP STATISTICS LESSON 13 – 2 (DAY 2)

```
AP STATISTICS
LESSON 13 – 2 (DAY 2)
CHI-SQUARE USED TO TEST HOMOGENEITY OF POPULATIONS AND ASSOCIATION/INDEPENDENCE
ESSENTIAL QUESTION:
How is chi-square used to test homogeneity and association/independence?
Objectives:
• To use the chi-square test to test for homogeneity.
• To use the chi-square test to test for association/independence.
The Chi-square Test for Homogeneity of Populations
Comparing the sample proportions of
success describes the differences among the
three treatments for cocaine addiction. But
the statistical test that tells us whether those
differences are statistically significant
doesn’t use the sample proportions. It
compares the observed and expected counts.
Chi-square statistic
The chi-square statistic is a measure of how far the
observed counts in a two-way table are from the
expected counts. The formula for the statistic is
X² = ∑ (observed count - expected count)² / expected count
The sum is over all r × c cells in the table.
Characteristics of Chi-Square
The chi-square statistic is a sum of terms, one for each cell in
the table.
As in the test for goodness of fit, you should think of
the chi-square statistic X² as a measure of the distance
of the observed counts from the expected counts.
Like any distance, it is always zero or positive.
Although the alternative hypothesis Ha is many-sided,
the chi-square test is one-sided because any violation
of Ho tends to produce a large value of X².
Small values of X² are not evidence against Ho.
Chi-square Test for Homogeneity of Populations
• Select independent SRSs from each of c
populations. Classify each individual in a sample
according to a categorical response variable with r
possible values. There are c different sets of
proportions to be compared, one for each
population.
• The null hypothesis is that the distribution of the
response variable is the same in all c populations.
The alternative hypothesis says that these c
distributions are not all the same.
Chi-square Test for Homogeneity of Populations (continued…)
• If Ho is true, the chi-square statistic X² has
approximately a chi-square distribution with
(r – 1)(c – 1) degrees of freedom (df).
• The P-value for the chi-square test is the area to the
right of X² under the chi-square density curve with
df degrees of freedom.
Cell Counts Required for the Chi-Square Test
You can safely use the chi-square test with
critical values from the chi-square distribution
when no more than 20% of the expected
counts are less than 5 and all individual
expected counts are 1 or greater. In
particular, all four expected counts of a 2 x 2
table should be 5 or greater.
Example 13.7 Page 752
Is Desipramine Effective in Treating Cocaine Addiction?
df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
• Look in the df = 2 row of Table E. The value X² =
10.5 falls between the 0.01 and 0.005 critical
values of the chi-square distribution with 2
degrees of freedom.
• Remember that the chi-square test is always one-sided.
So the P-value of X² = 10.5 is between 0.005 and 0.01.
Calculating Chi-square with Technology
Calculating the expected counts and then the
chi-square statistic by hand is a bit time-consuming.
Computers and calculators save time and get
the math right.
Chi-square Tests with Minitab
• We enter the two-way table.
• Minitab repeats the two-way table of
observations and puts the expected count for each cell
below the observed count.
• Minitab requires us to ask for the probability of a
value of 10.5 or smaller. This probability is
0.9948 (at the bottom of the output), so the P-value
is 1 - 0.9948 = 0.0052.
Using the Calculator for X²
Page 754
Chi-square tests on the TI-83/89
Follow-up analysis
The chi-square test is the overall test for
comparing any number of population
proportions. If the test allows us to reject
the null hypothesis that all the proportions
are equal, we then want to do a follow-up
analysis that examines the differences in
detail.
Example 13.8 Page 755
A Final Look at the Cocaine Study
The cocaine study found significant
differences among the proportions of
successes for the three treatments for cocaine
addiction. We can see the differences in three
ways:
Look at the sample proportions:
p̂1 = 0.583
p̂2 = 0.250
p̂3 = 0.167
The Chi-square Test of Association/Independence
The cocaine study is an experiment that assigned
24 addicts to each of three groups.
Each group is a sample from a separate population
corresponding to a separate treatment.
The null hypothesis of “no difference” among
treatments takes the form of “equal proportions of
successes” in three populations.
Example 13.9 Page 758
Smoking and SES
The two-way table 13.9 does not compare several
populations. Instead, it arises by classifying
observations from a single population in two ways: by
smoking habits and SES.
Observed counts for smoking and SES

                  SES
Smoking    High  Middle   Low  Total
Current      51      22    43    116
Former       92      21    28    141
Never        68       9    22     99
Total       211      52    93    356
Example 13.9 (continued…)
Both of these variables have three levels, so the null
hypothesis needs careful statement:
H0: there is no association between SES and smoking
habits
One of the most useful properties of chi-square is
that it tests the hypothesis “the row and column
variables are not related to each other” whenever
this hypothesis makes sense for a two-way table.
The Chi-Square Test of Association/Independence
Use the chi-square test of
association/independence to test the null
hypothesis
Ho: there is no relationship between
two categorical variables
when you have a two-way table from a single SRS,
with each individual classified according to both
of two categorical variables.
Example 13.10 Page 759
Smoking Habits in Each SES Category
We must calculate the column percents. For the
high-SES group, there are 51 current smokers out
of a total of 211 people (24.2%).
Similarly, 92 of the 211 people in this group are
former smokers (43.6%).
Overall, the column percents suggest that there is a
negative association between smoking and SES:
higher-SES people tend to smoke less.
Example 13.11 Page 760
Expected Cell Counts
Figure 13.6 (Comparisons from Example 13.10)
Example 13.12 Page 762
Chi-square Test for Association/Independence
X² = 18.51, df = 4, P < 0.001
There is strong evidence of an association between
smoking and SES in the population of male federal
employees.
Concluding Remarks
In the test of association/independence,
there is a single sample from a single
population. The individuals in the sample
are classified according to two categorical
variables.
```
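The chi-square statistic, the expected counts behind Example 13.11, the degrees of freedom, and the cell-count condition can all be checked with a short Python sketch. It uses only the standard library and the observed smoking/SES counts from Example 13.9, and it assumes the usual rule that a cell's expected count is (row total × column total) / table total. The function names are hypothetical, chosen for illustration.

```python
# Observed counts from Example 13.9: rows are smoking habits
# (Current, Former, Never); columns are SES levels (High, Middle, Low).
OBSERVED = [
    [51, 22, 43],  # Current
    [92, 21, 28],  # Former
    [68, 9, 22],   # Never
]


def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all r x c cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


def cell_counts_ok(table):
    """At most 20% of expected counts below 5, and every expected count at least 1."""
    flat = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in flat if e < 5)
    return small <= 0.2 * len(flat) and min(flat) >= 1


if __name__ == "__main__":
    print(f"X^2 = {chi_square_statistic(OBSERVED):.2f}")
    print(f"df = {degrees_of_freedom(OBSERVED)}")
    print(f"cell-count condition met: {cell_counts_ok(OBSERVED)}")
```

Running it prints X^2 = 18.51 and df = 4, matching Example 13.12, and the smallest expected count is about 14.46, so the cell-count condition holds. For a 2 × 2 table, 20% of four cells is less than one cell, so the same check reduces to requiring all four expected counts to be 5 or greater.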
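Table E only brackets a P-value. When the degrees of freedom are even, the upper-tail area of the chi-square distribution has the closed form e^(-x/2) · Σ (x/2)^k / k!, summed over k = 0, 1, ..., df/2 - 1; for df = 2 this is just e^(-x/2). The sketch below is a hypothetical helper that handles only even df (both examples in this lesson qualify); it uses that form to check the P-values quoted for Example 13.7, the Minitab output, and Example 13.12. For odd df you would need a statistics library instead.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X^2 >= x) for a chi-square distribution with a positive even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Cocaine study (Example 13.7): X^2 = 10.5 with df = 2.
p_cocaine = chi2_upper_tail_even_df(10.5, 2)

# With df = 2 the critical value for upper-tail area p is -2 ln(p),
# which reproduces the Table E entries that bracket 10.5.
crit_01 = -2 * math.log(0.01)    # about 9.21
crit_005 = -2 * math.log(0.005)  # about 10.60

# Smoking and SES (Example 13.12): X^2 = 18.51 with df = 4.
p_smoking = chi2_upper_tail_even_df(18.51, 4)

if __name__ == "__main__":
    print(f"cocaine: P = {p_cocaine:.4f} (critical values {crit_01:.2f} and {crit_005:.2f})")
    print(f"Minitab check: 1 - 0.9948 = {1 - 0.9948:.4f}")
    print(f"smoking/SES: P = {p_smoking:.5f}")
```

Because 9.21 < 10.5 < 10.60, the P-value for the cocaine study falls between 0.005 and 0.01, and the closed form gives about 0.0052, agreeing with Minitab. For Example 13.12 it gives about 0.00098, consistent with P < 0.001.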
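For the follow-up analysis in Example 13.8, one option is to compare the sample proportions and see which cells contribute most to X². This sketch assumes the success counts can be recovered from the slides' figures: 24 addicts per group with proportions 0.583, 0.250 and 0.167 gives 14, 6 and 4 successes. The treatments are labeled only by number, and the helper names are hypothetical.

```python
# Assumed counts: each group has 24 addicts, and the sample proportions
# (0.583, 0.250, 0.167) correspond to 14, 6 and 4 successes.
GROUP_SIZE = 24
SUCCESSES = [14, 6, 4]  # treatments 1, 2, 3


def sample_proportions(successes, n):
    """p-hat for each group: successes / group size."""
    return [s / n for s in successes]


def cell_contributions(successes, n):
    """(observed - expected)^2 / expected for each group's success and failure cell.

    Assumes every group has the same size n, so each group's expected
    successes equal the pooled success count divided by the number of groups.
    """
    k = len(successes)
    failures = [n - s for s in successes]
    exp_success = sum(successes) / k
    exp_failure = sum(failures) / k
    return [
        ((s - exp_success) ** 2 / exp_success, (f - exp_failure) ** 2 / exp_failure)
        for s, f in zip(successes, failures)
    ]


if __name__ == "__main__":
    phats = sample_proportions(SUCCESSES, GROUP_SIZE)
    contribs = cell_contributions(SUCCESSES, GROUP_SIZE)
    for i, (p, (cs, cf)) in enumerate(zip(phats, contribs), start=1):
        print(f"Treatment {i}: p-hat = {p:.3f}, success cell {cs:.2f}, failure cell {cf:.2f}")
    print(f"X^2 = {sum(cs + cf for cs, cf in contribs):.1f}")
```

The contributions add to X^2 = 10.5, the value used in Example 13.7 and in the Minitab output, which is a useful check on the assumed counts. Treatment 1 accounts for 6.75 of the 10.5, so most of the evidence against Ho comes from its higher success rate.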
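The column percents in Example 13.10 come from dividing each count by its column total. Here is a minimal sketch, using the observed counts from Example 13.9 and a hypothetical helper name:

```python
# Observed counts from Example 13.9 (rows: smoking habit, columns: SES).
SMOKING = ["Current", "Former", "Never"]
SES = ["High", "Middle", "Low"]
OBSERVED = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]


def column_percents(table):
    """Each count as a percent of its column total, so every column sums to 100."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * count / total for count, total in zip(row, col_totals)] for row in table]


if __name__ == "__main__":
    pct = column_percents(OBSERVED)
    print("Smoking  " + "".join(f"{name:>8}" for name in SES))
    for habit, row in zip(SMOKING, pct):
        print(f"{habit:<9}" + "".join(f"{p:7.1f}%" for p in row))
```

The High column reproduces the 24.2% and 43.6% quoted in Example 13.10, and reading across the Current row (about 24.2%, 42.3% and 46.2% for high, middle and low SES) shows the pattern described there.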