Chi-square Test of Independence
Presentation 10.2

Another Significance Test for Proportions
• But this time we want to test multiple variables.
• With this test we can determine if two variables are independent or not.
• This is sometimes called inference for two-way tables.

Chi-square Test of Independence Formulas
The null and alternate hypotheses are always the same with a Test of Independence.
• Null Hypothesis (assumes independent): $H_0$: Observed = Expected
• Alternate Hypothesis (not independent): $H_a$: Observed ≠ Expected
• Test Statistic (that symbol is called "Chi-squared"): $\chi^2 = \sum \frac{(O - E)^2}{E}$
• Degrees of freedom: $df = (\text{rows} - 1)(\text{columns} - 1)$
• P-Value: $\chi^2\text{cdf}(\chi^2, 9999, df)$
Instead of a normal or t distribution, we now have a chi-squared distribution. O is the observed count for each cell in the table and E is the expected count for each cell in the table.

The Titanic
• Look at the data on the passengers: their type of ticket and whether or not they survived.

| Type of Ticket | First Class | Second Class | Third Class |
|----------------|-------------|--------------|-------------|
| Rescued        | 203         | 118          | 528         |
| Died           | 123         | 167          | 178         |

Conditions for the Test of Independence
• None of the expected counts should be less than 1
• No more than 20% of the expected counts should be less than 5
  – Same as for the Goodness of Fit test
• These are simple checks to make sure that the sample size is sufficient.

The Titanic
• Check the conditions
  – Since all counts, observed and expected, are much greater than 5, we are OK to conduct the test
• Write Hypotheses (these are always the same!)
  – Null: $H_0$: Observed = Expected
    • That is, what we observed should be the same as what we expected given the variables are independent
  – Alternate: $H_a$: Observed ≠ Expected
    • That is, the observed data is just too different from what is expected to be attributed to random chance.
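The condition checks and the degrees-of-freedom formula can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper names `conditions_met` and `degrees_of_freedom` are made up for this example, and the checks apply to whichever table of counts is passed in (here the observed Titanic counts; the same helper can be used on the expected counts once they are computed).

```python
# Observed Titanic counts from the slide: one row per ticket class,
# columns are (rescued, died).
observed = [
    [203, 123],  # First Class
    [118, 167],  # Second Class
    [528, 178],  # Third Class
]


def conditions_met(table):
    """Return True if no count is below 1 and at most 20% of counts are below 5."""
    cells = [count for row in table for count in row]
    share_below_5 = sum(count < 5 for count in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


def degrees_of_freedom(table):
    """df = (# of rows - 1) * (# of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


print("Conditions met:", conditions_met(observed))
print("Degrees of freedom:", degrees_of_freedom(observed))
```

For the Titanic table this reports that the conditions are met and that df = 2, matching the check on the slide.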
The Titanic Calculations
• Find the expected values (assume independence)

Observed

| Type of Ticket | Rescued | Died | Totals |
|----------------|---------|------|--------|
| First Class    | 203     | 123  | 326    |
| Second Class   | 118     | 167  | 285    |
| Third Class    | 528     | 178  | 706    |
| Totals         | 849     | 468  | 1317   |

Expected

| Type of Ticket | Rescued            | Died               | Totals |
|----------------|--------------------|--------------------|--------|
| First Class    | 326*849/1317 = 210 | 326*468/1317 = 116 | 326    |
| Second Class   | 285*849/1317 = 184 | 285*468/1317 = 101 | 285    |
| Third Class    | 706*849/1317 = 455 | 706*468/1317 = 251 | 706    |
| Totals         | 849                | 468                | 1317   |

To find an expected count: 849 out of 1317 total passengers were rescued (64.46%), so if ticket type made no difference, 849/1317 or 64.46% of the 326 first class passengers should have been rescued. This logic follows for each cell in the table.

The Titanic Calculations
• Then, compute the sum $\chi^2 = \sum \frac{(O - E)^2}{E}$ over all six cells, just like with the Goodness of Fit Test. Here $\chi^2 = 99.69$.
• Our degrees of freedom are: $df = (\text{rows} - 1)(\text{columns} - 1) = (3 - 1)(2 - 1) = 2$
• Finally, use chi-square cdf: $\chi^2\text{cdf}(99.69, 99999, 2)$

The Titanic Calculations
• Using the calculator
• First go to the Matrix menu (2nd, then x⁻¹)
• Go to Edit and press Enter
• Enter the dimensions as rows x columns
  – Your matrix should fit the look of your table
• Enter the data
  – Make the calculator match the table
• Then go to your stat tests and choose the χ²-Test

The Titanic Calculations
• Using the calculator
• Since you entered the data into matrix [A], you can just go right to:
  – Calculate
  – Draw
• Leave the expected matrix alone, as the calculator will calculate those values for you (see next slide)

The Titanic Calculations
• Using the calculator
• Let's go check out the expected table
  – Go back to Matrix
  – Edit [B] to see the values
• How cool is that!

The Titanic Calculations
• Conclusions
  – The p-value represents the chance of seeing data at least this far from the expected counts, given the variables are independent.
  – For the Titanic, this was a 0.00000000000000000002% chance
  – REJECT THE NULL!
  – There is a ton of evidence to suggest that there is an association between survival rate and the type of ticket.

Chi-square Test of Independence
This concludes this presentation.
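The hand calculation above (expected counts, the chi-square sum, degrees of freedom, and the p-value) can be reproduced with a short Python sketch. It is not the calculator procedure from the slides. The helper `chi2_upper_tail` is a hypothetical name, and it uses a closed-form upper-tail formula that holds only for even degrees of freedom, which is enough here because df = 2. In practice a library routine such as `scipy.stats.chi2_contingency` would likely be used instead.

```python
import math

# Observed Titanic counts: rows are ticket classes, columns are (rescued, died).
observed = [
    [203, 123],  # First Class
    [118, 167],  # Second Class
    [528, 178],  # Third Class
]

# Expected count for each cell under independence:
# row total * column total / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over every cell.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (# of rows - 1) * (# of columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_upper_tail(chi_sq, df)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2e}")
```

The statistic comes out at about 99.69 with 2 degrees of freedom, in line with the slides, and the p-value is so small that the null hypothesis of independence is rejected.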