Chapter 19 – Confidence Intervals for Proportions

Next few chapters
What percentage of adults own smartphones?
What is the average SAT score of Baltimore County students?
Does a higher percentage of women than men vote for Democrats?
Do cars that use a fuel additive get better fuel efficiency?
Is it true that 30% of our students work part-time?
Does the average American eat more than 4 meals out per week?

Confidence Intervals & Hypothesis Tests
For the remainder of the semester we will focus on confidence intervals and hypothesis tests.
Confidence Interval: a range of values that we predict contains the true population parameter.
Hypothesis Test: a procedure for deciding whether the data support a claim made about a population parameter.

Gallup/Harris Polls
Gallup: The percentage of Americans reporting they ate healthy all day "yesterday" declined to 66.1% in 2011 from 67.7% in 2010.
Nielsen: Almost half (49.7%) of U.S. mobile subscribers now own smartphones, as of February 2012.
Harris: Currently one in five U.S. adults has at least one tattoo (21%) which is up from the 16% and 14% who reported having a tattoo when this question was asked in 2003 and 2008, respectively.

Confidence Intervals
We use a sample to make our prediction about the population.
Each sample we take gives a slightly different estimate, so we have to account for the random sampling variation we've been studying.
We can never pin our estimate down exactly, but we can place it within a range of values we feel confident about.

Width of Confidence Interval and Confidence Level
Sample Size: Should we be more or less confident in our estimate as sample size increases?
Confidence Level: Should we expect a wider or narrower interval as our confidence increases?
Each interval has a Margin of Error that takes all of this into account; it is based on the sample size and the confidence level.

Proportion Estimates
We saw in the last chapter that when we use a sample to estimate a proportion, the proportion estimates are distributed Normally with:
$\mu(\hat{p}) = p$ and $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
We know $\hat{p}$ is just an estimate, but we want to know how good an estimate it is.

Standard Error
We can combine the standard deviation formula from our sampling distribution model with our proportion estimate to find the Standard Error:
$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
We can use the standard error to get a sense of how confident we are that our estimate is close to the true value.
The "error" is not a mistake we made; it is a way to measure the random sampling variation. Because we don't know the population proportion $p$, we can't compute the true standard deviation, so we estimate it using $\hat{p}$.

21% of adults have tattoos
The sample was of 2,016 adults, 423 of whom had tattoos.
Since this is just one sample, let's look at the sampling distribution model as we did last chapter:
model mean: $\hat{p} = 0.21$
SE: $\sqrt{\frac{(0.21)(0.79)}{2016}} \approx 0.009$

What can we say?
"21% of all adults have tattoos"? No, this was only one sample of 2,016 people.
"It's likely that 21% of all adults have tattoos"? No; again, with only one sample, we're pretty sure this isn't exactly the actual proportion.
"While we can't be sure of the actual proportion of adults with tattoos, we're sure it's between 19.2% and 22.8%"? We can't know for sure what the actual proportion is, but this at least shows some of the uncertainty we have.

What we can really say
We're pretty sure that the actual proportion of adults who have tattoos is contained in the interval from 19.2% to 22.8%.
We are, in fact, 95% confident that between 19.2% and 22.8% of adults have tattoos.
95% confidence uses 2 SEs, as in our 68-95-99.7 Rule: $0.21 \pm 2(0.009)$.
This is a Confidence Interval, which we will usually write in interval notation: (.192, .228)

Example: Legal Music
A random sample of 168 students was asked about their digital music libraries. Overall, out of 117,709 songs, 23.1% were legal.
Construct a 95% confidence interval for the fraction of legal digital music.

What does the Confidence Interval really mean?
Technically, a 95% confidence interval means that 95% of all samples of the same size will produce confidence intervals that include the true population proportion.
Figure (from DeVeaux, Intro to Stats): confidence intervals from 20 simulated samples for the infected sea fans example in the text. Most of the confidence intervals include the true proportion.

Certainty vs. Precision
If you were going to guess someone's height, would you be more likely to be right with a wider or a narrower range for your guess?
The larger the margin of error you allow, the more likely your interval is to contain the true value.
The more precise we want to be, the less confident we can be that we are correct.

Margin of Error (ME)
Our 95% confidence interval used: $\hat{p} \pm 2\,SE(\hat{p})$
We can always think of a confidence interval as: Estimate ± ME
The Margin of Error depends on the level of confidence.
We used 2 SE for our margin of error based on the 68-95-99.7 Rule.

Critical Values
While 2 is a good approximation for a 95% confidence interval, the Normal probability table shows that z* = 1.96 is more accurate.
What would be the critical value for a 92% confidence interval?

92% Confidence Interval
The middle 92% of the Normal model leaves 4% in each tail.
Use the table in Appendix D to find the appropriate z-score.

Calculating Margin of Error
Using our earlier example involving tattoos, what would the margin of error be for a 92% confidence interval?
$SE(\hat{p}) = \sqrt{\frac{(0.21)(0.79)}{2016}} \approx 0.009$
For a 92% confidence interval, we use z* = 1.75.
ME = 1.75(.009) = .016 (ME for 95% CI: .018)

Assumptions/Conditions
Independence Assumption: Randomization Condition, 10% Condition
Sample Size Assumption: Success/Failure Condition. We will need more data as the proportion gets closer to 0 or 1.

One Proportion Z-Interval
When the conditions are met:
Confidence Interval = $\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Make sure you can interpret your confidence interval.
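
The one-proportion z-interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the course materials: the function names `critical_value` and `one_proportion_z_interval` are hypothetical, the critical value comes from the standard library's `statistics.NormalDist` rather than the table in Appendix D, and the Success/Failure threshold of 10 is an assumed convention. It also uses the unrounded estimate 423/2016 and the exact z*, so results can differ slightly in the third decimal place from the hand calculations.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z* so that the middle `confidence` share of the
    standard Normal model lies between -z* and +z*."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


def one_proportion_z_interval(successes: int, n: int, confidence: float = 0.95):
    """Return (low, high) for p-hat ± z* · sqrt(p-hat · q-hat / n)."""
    # Success/Failure Condition. The cutoff of 10 is an assumption
    # (a common textbook convention), not stated on the slides.
    if min(successes, n - successes) < 10:
        raise ValueError("Success/Failure Condition not met")
    p_hat = successes / n
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat / n)          # standard error of p-hat
    me = critical_value(confidence) * se  # margin of error
    return p_hat - me, p_hat + me


if __name__ == "__main__":
    # Tattoo example: 423 of 2,016 sampled adults had tattoos.
    print("z* for 95%:", round(critical_value(0.95), 3))
    print("z* for 92%:", round(critical_value(0.92), 3))
    low, high = one_proportion_z_interval(423, 2016, 0.95)
    print("95% CI:", (round(low, 3), round(high, 3)))
```

The code checks only the Success/Failure Condition; the Randomization and 10% Conditions depend on how the sample was collected and still have to be judged from the study description.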
Confidence Interval Example
A Gallup poll shows that 62% of Americans would amend the Constitution to use the popular vote for Presidential elections instead of the electoral vote.
They used a random sample of 1,005 adults aged 18+.
Verify that the conditions were met.
Construct a 95% confidence interval.
Interpret your interval.
http://www.gallup.com/file/poll/150272/Americans_Popular_Vote_Not_Electoral_College_111024%20.pdf

Choosing Sample Size
As we pick a larger sample, we should expect our margin of error to go down. Why?
$ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$
If we know our desired Margin of Error, we can solve for n to get our sample size: $n = \frac{(z^*)^2\,\hat{p}\hat{q}}{ME^2}$
Always round up to the next integer.
If we don't have an estimate of p, we use $\hat{p} = 0.5$, which maximizes the margin of error and so gives the most conservative sample size.

Sample Size Example
If we find from a pilot study that 32% of Math 153 students are full-time students, how many students would we have to sample to estimate the proportion of Math 153 full-time students to within 7% with 90% confidence?
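
One way to work through the sample size example is to solve the margin of error formula for n and round up. The sketch below is illustrative only: `required_sample_size` is a hypothetical helper name, and it takes z* from Python's `statistics.NormalDist` instead of a printed table, which can shift the unrounded value slightly compared with a hand calculation using z* = 1.645.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float, p_guess: float = 0.5) -> int:
    """Smallest n with z* · sqrt(p · q / n) <= margin.

    p_guess defaults to 0.5, the conservative choice when no
    pilot estimate of the proportion is available.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q_guess = 1 - p_guess
    n = (z_star ** 2) * p_guess * q_guess / margin ** 2
    return ceil(n)  # always round up to the next integer


if __name__ == "__main__":
    # Pilot study: 32% full-time; target: within 7% at 90% confidence.
    print(required_sample_size(0.07, 0.90, p_guess=0.32))
    # Same target with no pilot estimate (uses p = 0.5).
    print(required_sample_size(0.07, 0.90))
```

Comparing the two calls shows why a pilot estimate helps: the closer the guess is to 0.5, the larger the sample the formula asks for.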