AP STATISTICS LESSON 13 – 2 (DAY 2)
CHI-SQUARE USED TO TEST HOMOGENEITY OF POPULATIONS AND ASSOCIATION/INDEPENDENCE

ESSENTIAL QUESTION: How is chi-square used to test homogeneity and association/independence?

Objectives:
• To use the chi-square test to test for homogeneity.
• To use the chi-square test to test for association/independence.

The Chi-square Test for Homogeneity of Populations
Comparing the sample proportions of successes describes the differences among the three treatments for cocaine addiction. But the statistical test that tells us whether those differences are statistically significant doesn't use the sample proportions. It compares the observed and expected counts.

Chi-square Statistic
The chi-square statistic is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum is over all r × c cells in the table.

Characteristics of Chi-Square
The chi-square statistic is a sum of terms, one for each cell in the table. As in the test for goodness of fit, you should think of X² as a measure of the distance of the observed counts from the expected counts. Like any distance, it is always zero or positive. Although the alternative hypothesis Ha is many-sided, the chi-square test is one-sided because any violation of H0 tends to produce a large value of X². Small values of X² are not evidence against H0.

Chi-square Test for Homogeneity of Populations
• Select independent SRSs from each of c populations. Classify each individual in a sample according to a categorical response variable with r possible values. There are c different sets of proportions to be compared, one for each population.
• The null hypothesis is that the distribution of the response variable is the same in all c populations. The alternative hypothesis says that these c distributions are not all the same.
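To make the formula concrete, here is a minimal Python sketch that computes the expected counts and X² for a two-way table. It assumes the standard rule that each expected count equals (row total × column total) / table total. The counts are reconstructed from the cocaine study: 24 addicts per group with sample success proportions 0.583, 0.250, and 0.167 (Example 13.8), which work out to 14, 6, and 4 successes. The function names are illustrative only, not taken from any library.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all r x c cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Reconstructed cocaine-study table: rows = treatments 1-3, columns = (success, failure).
cocaine = [
    [14, 10],
    [6, 18],
    [4, 20],
]

print(expected_counts(cocaine))                 # [[8.0, 16.0], [8.0, 16.0], [8.0, 16.0]]
print(round(chi_square_statistic(cocaine), 2))  # 10.5
```

Every cell contributes a nonnegative term, which is why X² can never be negative and why only large values count as evidence against H0.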
Chi-square Test for Homogeneity of Populations (continued)
• If H0 is true, the chi-square statistic X² has approximately a chi-square distribution with (r – 1)(c – 1) degrees of freedom (df).
• The P-value for the chi-square test is the area to the right of X² under the chi-square density curve with (r – 1)(c – 1) degrees of freedom.

Cell Counts Required for the Chi-Square Test
You can safely use the chi-square test with critical values from the chi-square distribution when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater. In particular, all four expected counts in a 2 × 2 table should be 5 or greater.

Example 13.7 (page 752): Is Desipramine Effective in Treating Cocaine Addiction?
df = (r – 1)(c – 1) = (3 – 1)(2 – 1) = 2
• Look in the df = 2 row of Table E. The value X² = 10.5 falls between the 0.01 and 0.005 critical values of the chi-square distribution with 2 degrees of freedom.
• Remember that the chi-square test is always one-sided. So the P-value of X² = 10.5 is between 0.005 and 0.01.

Calculating Chi-square with Technology
Calculating the expected counts and then the chi-square statistic by hand is a bit time-consuming. Computers and calculators save time and get the arithmetic right.

Chi-square Tests with Minitab
• We enter the two-way table.
• Minitab repeats the two-way table of observed counts and puts the expected count for each cell below the observed count.
• Minitab requires us to ask for the probability of a value of 10.5 or smaller. This probability is 0.9948 (at the bottom of the output), so the P-value is 1 – 0.9948 = 0.0052.

Using the Calculator for X² (page 754)
Chi-square tests on the TI-83/89

Follow-up Analysis
The chi-square test is the overall test for comparing any number of population proportions. If the test allows us to reject the null hypothesis that all the proportions are equal, we then want to do a follow-up analysis that examines the differences in detail.
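Table E, Minitab, and the TI calculators all find the same upper-tail area. As a rough stand-in, the Python sketch below computes df, the P-value for X² = 10.5, and the lesson's cell-count rule using only the standard library. The helper names (chi2_sf, degrees_of_freedom, cell_counts_ok) are hypothetical, and the P-value uses a series for the regularized incomplete gamma function, a technique chosen for this sketch rather than anything from the textbook. In practice a library routine such as SciPy's scipy.stats.chi2.sf does the same job.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X^2 >= x) under a chi-square curve with df degrees of freedom.

    Sums the series for the regularized lower incomplete gamma function
    P(a, z), with a = df/2 and z = x/2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    series = term
    n = 0
    while term > series * 1e-16:
        n += 1
        term *= z / (a + n)
        series += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * series
    return max(0.0, 1.0 - lower)


def degrees_of_freedom(r, c):
    return (r - 1) * (c - 1)


def cell_counts_ok(expected):
    """Rule of thumb: at most 20% of expected counts below 5, and none below 1."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < 5)
    return small <= 0.2 * len(cells) and min(cells) >= 1


df = degrees_of_freedom(3, 2)    # (3 - 1)(2 - 1) = 2
p_value = chi2_sf(10.5, df)
print(df)                        # 2
print(round(1 - p_value, 4))     # 0.9948, Minitab's "10.5 or smaller" probability
print(round(p_value, 4))         # 0.0052
print(cell_counts_ok([[8, 16], [8, 16], [8, 16]]))  # True
```

The second printed value mirrors the Minitab step: the cumulative probability is reported first, and the P-value is its complement.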
Example 13.8 (page 755): A Final Look at the Cocaine Study
The cocaine study found significant differences among the proportions of successes for the three treatments for cocaine addiction. We can see the differences in three ways. Look at the sample proportions:
p̂1 = 0.583   p̂2 = 0.250   p̂3 = 0.167

The Chi-square Test of Association/Independence
The cocaine study is an experiment that assigned 24 addicts to each of three groups. Each group is a sample from a separate population corresponding to a separate treatment. The null hypothesis of "no difference" among treatments takes the form of "equal proportions of successes" in the three populations.

Example 13.9 (page 758): Smoking and SES
The two-way table (Table 13.9) does not compare several populations. Instead, it arises by classifying observations from a single population in two ways: by smoking habits and by SES.

Observed counts for smoking and SES

                       SES
Smoking      High   Middle   Low   Total
Current        51       22    43     116
Former         92       21    28     141
Never          68        9    22      99
Total         211       52    93     356

Example 13.9 (continued)
Both of these variables have three levels. A careful statement of the null hypothesis is
H0: there is no association between SES and smoking habits.
One of the most useful properties of chi-square is that it tests the hypothesis "the row and column variables are not related to each other" whenever this hypothesis makes sense for a two-way table.

The Chi-Square Test of Association/Independence
Use the chi-square test of association/independence to test the null hypothesis
H0: there is no relationship between two categorical variables
when you have a two-way table from a single SRS, with each individual classified according to both of two categorical variables.

Example 13.10 (page 759): Smoking Habits in Each SES Category
We must calculate the column percents. For the high-SES group, there are 51 current smokers out of a total of 211 people (24.2%). Similarly, 92 of the 211 people in this group are former smokers (43.6%).
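Each column percent is a cell count divided by its column (SES) total. The short Python sketch below, with an illustrative function name column_percents, reproduces the 24.2% and 43.6% figures for the high-SES group and fills in the remaining cells so the pattern across SES groups can be checked.

```python
# Observed counts from the smoking/SES table.
# Rows = smoking status (Current, Former, Never); columns = SES (High, Middle, Low).
smoking_levels = ["Current", "Former", "Never"]
observed = [
    [51, 22, 43],   # Current
    [92, 21, 28],   # Former
    [68, 9, 22],    # Never
]


def column_percents(table):
    """Percent of each column total that falls in each cell."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * count / total for count, total in zip(row, col_totals)]
            for row in table]


percents = column_percents(observed)
for name, row in zip(smoking_levels, percents):
    print(name, [round(p, 1) for p in row])
# Current [24.2, 42.3, 46.2]
# Former [43.6, 40.4, 30.1]
# Never [32.2, 17.3, 23.7]
```

Each column of percents sums to 100, so the three SES groups can be compared directly even though their sizes (211, 52, 93) differ.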
Overall, the column percents suggest that there is a negative association between smoking and SES: higher-SES people tend to smoke less.

Example 13.11 (page 760): Expected Cell Counts
[Figure 13.6: comparisons from Example 13.10]

Example 13.12 (page 762): Chi-square Test for Association/Independence
There is strong evidence (X² = 18.51, df = 4, P < 0.001) of an association between smoking and SES in the population of male federal employees.

Concluding Remarks
In the test of association/independence, there is a single sample from a single population. The individuals in the sample are classified according to two categorical variables.
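Example 13.12 reports X² = 18.51 with df = 4. The Python sketch below recomputes both from the observed smoking/SES table, checks the cell-count rule, and finds the P-value. It is an illustration under stated assumptions, not the textbook's procedure: the P-value uses the closed-form upper-tail area that holds only when df is even (here df = 4). For odd df, a general routine such as SciPy's scipy.stats.chi2.sf would be needed.

```python
import math

# Rows = smoking status (Current, Former, Never); columns = SES (High, Middle, Low).
observed = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / table total
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum((o - e) ** 2 / e
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Cell-count rule: here every expected count is at least 5.
counts_ok = min(min(row) for row in expected) >= 5

# Closed form valid only for even df (assumption of this sketch):
# P = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
assert df % 2 == 0
half = x2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(round(x2, 2), df)    # 18.51 4
print(counts_ok)           # True
print(p_value < 0.001)     # True
```

Because the test is one-sided, only this upper-tail area matters; a P-value this small is why the example reports strong evidence of an association.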