
STT 200 – LECTURE 5, SECTION 23,24
RECITATION 12 (4/2/2013)
TA: Zhen Zhang
Office hour: (C500 WH) 3-4 PM Tuesday (office tel.: 432-3342)
Help-room: (A102 WH) 9:00AM-1:00PM, Monday
Class meets on Tuesday: 12:40 – 1:30PM A224 WH, Section 23; 1:50 – 2:40PM A234 WH, Section 24

Example (sampling distribution)
Recall that the data we had last time contain "yes/no" responses from a population of 400 persons who were asked whether they have wireless internet access at home. The population proportion of "yes" is p = 0.5575. If we draw many random samples of size n = 37, the sampling distribution of the sample proportion $\hat{p}$ can be approximated by
$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right).$$
What if we don't know p?
[Figure: sampling distribution of $\hat{p}$ for samples of size n = 37, with the population proportion p = 0.5575 marked]

Example (confidence interval)
To study p, we draw a sample of size n = 37, compute $\hat{p}$, and construct a 95% confidence interval using
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
We are 95% confident that p lies between the two endpoints of this interval. "95% confident" means that if we drew samples and constructed intervals this way many times, approximately 95% of the intervals would cover p.
[Figure: 95% confidence intervals from repeated samples, with the true p marked]

Example (check conditions)
To validate the confidence interval, we need to check several conditions:
Independence condition: the n = 37 responses in the sample are independent of one another.
Randomness condition: the n = 37 responses in the sample are chosen randomly. We used a table of random digits to select persons numbered 1 to 400.
10% condition: the sample size n = 37 is less than 10% of the population size 400 (37 < 40).
Success/failure condition: the n = 37 responses in the sample contain at least 10 yeses and at least 10 nos.

Construct confidence interval step by step
To construct a confidence interval for p with confidence level C, first determine the critical value $z^*$, either from a Normal table or with R or a calculator. To use R or a calculator, note that the total area below $z^*$ is
$$C + \frac{1-C}{2} = \frac{C+1}{2},$$
so we can find $z^*$ using qnorm((C+1)/2) in R, or invnorm((C+1)/2) in the DISTR menu of a TI-83 Plus calculator.
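The critical-value step can be checked directly in R. This is a minimal sketch: `critical_value` is a hypothetical helper name, not part of the recitation code, and it only wraps the qnorm((C+1)/2) call described above.

```r
# Critical value z* for confidence level C.
# The area below z* is C + (1 - C)/2 = (C + 1)/2, so z* is that quantile of N(0, 1).
critical_value <- function(C) qnorm((C + 1) / 2)

conf_levels <- c(0.90, 0.95, 0.98)
round(critical_value(conf_levels), 3)      # 1.645 1.960 2.326

# Sanity check: the middle area between -z* and z* recovers C
diff(pnorm(c(-1, 1) * critical_value(0.95)))   # 0.95
```

These match the calculator values from invnorm. The last line shows why $(C+1)/2$ is the right argument: exactly C of the area lies between $-z^*$ and $z^*$.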
For example, on a calculator: for 95% confidence, $z^*$ = invnorm((0.95+1)/2) = 1.96; for 90% confidence, $z^*$ = invnorm((0.90+1)/2) = 1.645.
Next, find the standard error
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
The margin of error is
$$ME = z^* \times SE(\hat{p}) = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
and the confidence interval is
$$\hat{p} \pm ME = (\hat{p} - ME,\ \hat{p} + ME).$$

Understand confidence interval backwards
If a 95% confidence interval for p is (0.6184, 0.8616), can you figure out $\hat{p}$, the margin of error, and the sample size?
Ans. $\hat{p}$ is the midpoint of the interval, that is, the average of the two endpoints:
$$\hat{p} = \frac{0.6184 + 0.8616}{2} = 0.74,$$
and the margin of error is half the width, that is, |endpoint − midpoint|:
$$ME = 0.8616 - 0.74 = 0.74 - 0.6184 = 0.1216.$$
Now since $ME = z^* \sqrt{\hat{p}(1-\hat{p})/n}$, solving for n gives
$$n = \frac{(z^*)^2\, \hat{p}(1-\hat{p})}{ME^2} = \frac{1.96^2 \times 0.74 \times 0.26}{0.1216^2} \approx 50.$$

Relationship
The margin of error $ME = z^* \sqrt{\hat{p}(1-\hat{p})/n}$ determines the width of the confidence interval. The following simulation shows how the margin of error depends on each of $\hat{p}$, n, and the confidence level C (through $z^*$) when the other two are fixed. For example, if the confidence level C increases, $z^*$ increases, so ME increases and the confidence interval is wider. Likewise, ME is largest at $\hat{p} = 0.5$ and shrinks as n grows.
[Figure, three panels of margin of error: against $\hat{p}$ (fixed confidence level = 95%, n = 100; p = 0.5 marked), against sample size n (fixed confidence level = 95%, p = 0.5), and against confidence level C (fixed n = 100, p = 0.5)]

Sample size determination
Recall that from $ME = z^* \sqrt{\hat{p}(1-\hat{p})/n}$ we have
$$n = \frac{(z^*)^2\, \hat{p}(1-\hat{p})}{ME^2},$$
which we can use to determine the sample size of the data we will collect. To use it, we need to guess $\hat{p}$. If "it is believed" or some "national study" gives a value for the population proportion p, we can use it in place of $\hat{p}$. We can also use $\hat{p}$ from a pilot sample if we have one. If we have no idea about p at all, we can use a conservative guess based on the "worst" scenario: $\hat{p}(1-\hat{p})$ reaches its maximum of 0.25 at $\hat{p} = 0.5$, which gives the largest required sample size.

NEED SOME COFFEE?

Chapter 19 (Page 504): #7: Which statements are true?
a) For a given sample size, higher confidence means a smaller margin of error.
b) For a specified confidence level, larger samples provide smaller margins of error.
c) For a fixed margin of error, larger samples provide greater confidence.
d) For a given confidence level, halving the margin of error requires a sample twice as large.

Chapter 19 (Page 504): #8: Which statements are true?
a) For a given sample size, reducing the margin of error will mean lower confidence.
b) For a certain confidence level, you can get a smaller margin of error by selecting a bigger sample.
c) For a fixed margin of error, smaller samples will mean lower confidence.
d) For a given confidence level, a sample 9 times as large will make a margin of error one third as big.

Chapter 19 (Page 505): #14: 11% of a random sample of 1003 adults approved of attempts to clone a human.
a) Find the margin of error if we want 95% confidence.
$$ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0194$$
b) Explain what that margin of error means.
The pollsters are 95% confident that the true proportion of adults who approve of attempts to clone humans is within about 1.9% of the estimated 11%.
c) If we only need to be 90% confident, will the margin of error be larger or smaller? Explain.
Smaller, since the critical value $z^*$ decreases as the confidence level decreases.
d) Find that margin of error.
$$ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.645 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0163$$
e) In general, if all other aspects of the situation remain the same, would smaller samples produce smaller or larger margins of error?
Larger.

Chapter 19 (Page 506): #27: In a random survey of 226 college students, 20 reported being "only" children. Estimate the proportion of students nationwide.
a) Check conditions for constructing a confidence interval.
The students' birth orders are likely to be independent. The sample was random and consisted of less than 10% of the population. There were 20 successes and 206 failures (both at least 10).
b) Construct a 95% confidence interval.
$$\hat{p} = \frac{20}{226} = 0.0885, \qquad ME = 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{226}} = 0.0370.$$
Hence the confidence interval is 0.0885 ± 0.0370 = (0.0515, 0.1255).
c) Interpret your interval.
We are 95% confident that between 5.15% and 12.55% of all college students are "only" children.
d) Explain what "95% confidence" means in this context.
If we were to select repeated samples like this, we'd expect about 95% of the confidence intervals we created to contain the true proportion of all college students who are "only" children.

Chapter 19 (Page 506): #28: 74% of 1644 randomly selected college freshmen returned to college the next year. Estimate the national freshman-to-sophomore retention rate.
a) Verify that the conditions are met.
It's a random sample, and both 74% and 26% of 1644 are greater than 10.
b) Construct a 98% confidence interval.
The critical value is $z^*$ = invnorm((1+0.98)/2) = 2.326, hence the margin of error is
$$ME = 2.326 \times \sqrt{\frac{0.74 \times 0.26}{1644}} = 0.0252,$$
and the confidence interval is 0.74 ± 0.0252 = (0.7148, 0.7652).
c) Interpret your interval.
We're 98% confident that between 71.48% and 76.52% of college freshmen return to college for their sophomore year.
d) Explain what "98% confidence" means in this context.
If we were to select repeated samples like this, we'd expect about 98% of the confidence intervals we created to contain the true proportion of all college freshmen who return as sophomores.

Sample size determination
At a university, it's believed that 25% of adults over 30 love Statistics. We wish to see whether this percentage is the same among the 18 to 25 age group.
a) How many of this younger age group must we survey in order to estimate the proportion of those who love Statistics to within 5% with 90% confidence?
With 90% confidence, the critical value is $z^*$ = 1.645. Thus
$$n = \frac{(z^*)^2\, \hat{p}(1-\hat{p})}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.05^2} = 202.9519.$$
Rounding up, the required sample size is 203.
b) If we want to cut the margin of error in half, how many of this younger age group must we survey? Do you have any concerns about this sample? Explain.
$$n = \frac{(z^*)^2\, \hat{p}(1-\hat{p})}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.025^2} = 811.8075$$
So the required sample size is 812. Such a large sample might be more than 10% of the population, which would violate the 10% condition.

APPENDIX 1
R code for the example:
# please import the data we had in the recitation 11 slides, otherwise it won't work
haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No", ... ...)   # remaining responses elided here
p <- mean(haswi=="Yes")   # population proportion of "yes"
N <- length(haswi)        # population size
n <- 37                   # sample size
replica <- 10000          # number of repeated samples
set.seed(241)
phats <- numeric(replica)
interval <- matrix(0, replica, 2)
zstar <- qnorm((1+0.95)/2)   # critical value for 95% confidence
for (t in 1:replica){
  mysamples <- haswi[sample(1:N, size=n)]   # simple random sample without replacement
  ph <- sum(mysamples=="Yes")/n
  moe <- zstar*sqrt(ph*(1-ph)/n)
  phats[t] <- ph
  interval[t,] <- c(ph-moe, ph+moe)
}
phats <- na.omit(phats)
dev.new(width=12, height=6)   # portable replacement for the Windows-only win.graph(w=12,h=6)
par(xaxt='n', mar=c(.8,2,.8,.8))
B <- 100   # plot only the first 100 intervals
plot(1:B, ylim=range(interval[1:B,])+c(-.01,.01), type='n', ylab='', xlab=''); grid(col='gray60')
abline(h=p, col='red', lwd=2)   # true proportion p
for(t in 1:B){
  lines(x=c(t,t), y=interval[t,], col='gray40', lwd=2)
  lines(x=t+c(-.3,.2), y=rep(interval[t,1],2), col='gray40', lwd=2)
  lines(x=t+c(-.3,.2), y=rep(interval[t,2],2), col='gray40', lwd=2)
  points(x=t, y=mean(interval[t,]), pch=16, cex=.8, col='blue2')   # p-hat at the center of each interval
}
# fraction of the plotted intervals that cover p
mean(p>=interval[1:B,1] & p<=interval[1:B,2])

APPENDIX 2
R code for the simulation study of the relationship between the margin of error and the sample proportion, sample size, and confidence level.
# margin of error for sample proportion p, confidence level z (this is C, not z*), and sample size n
a = function(p=0.5, z=0.95, n=100) return(qnorm((1+z)/2)*sqrt(p*(1-p)/n))
ps = seq(0, 1, length=1e3)
dev.new(width=9, height=4)   # portable replacement for the Windows-only win.graph(w=9,h=4)
par(mfrow=c(1,3), mar=c(4,4,0,0)+1, cex.lab=2, yaxt='n', cex.sub=1.3)
# panel 1: margin of error vs. p-hat
plot(a(ps)~ps, type='l', xlab=expression(hat(p)), ylab='margin of error', lwd=2, sub="fixed confidence level = 95%, n = 100"); grid(col='gray70')
abline(v=0.5, col='red2', lwd=2); text(y=0, x=0.65, labels="p = 0.5", col='red2', cex=1.2)
# panel 2: margin of error vs. sample size n
ns <- seq(30, 150, by=1)
plot(a(n=ns)~ns, type='l', xlab='sample size n', ylab='', lwd=2, sub="fixed confidence level = 95%, p = 0.5"); grid(col='gray70')
# panel 3: margin of error vs. confidence level C
zs <- seq(0.8, 0.99, length=1e3)
plot(a(z=zs)~zs, type='l', xlab='confidence level C', ylab='', lwd=2, sub="fixed n = 100, p = 0.5"); grid(col='gray70')

Thank you.
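To check the hand calculations in the worked examples, here is a small R sketch of the one-proportion z-interval and the sample-size formula. The helpers `prop_ci` and `n_needed` are hypothetical names introduced for illustration, not part of the recitation code. They use exact qnorm critical values rather than the rounded 1.96, 1.645, and 2.326, which give the same answers after rounding in these examples.

```r
# Hypothetical helpers for illustration: prop_ci() and n_needed().

# Confidence interval phat +/- z* sqrt(phat (1 - phat) / n) at level C
prop_ci <- function(phat, n, C = 0.95) {
  zstar <- qnorm((C + 1) / 2)                 # area below z* is (C + 1) / 2
  me <- zstar * sqrt(phat * (1 - phat) / n)   # margin of error
  c(phat = phat, me = me, lower = phat - me, upper = phat + me)
}

# Smallest n whose margin of error is at most `me` at level C,
# using a guessed proportion p (0.5 is the conservative worst case)
n_needed <- function(me, p = 0.5, C = 0.95) {
  zstar <- qnorm((C + 1) / 2)
  ceiling(zstar^2 * p * (1 - p) / me^2)       # always round up
}

# Chapter 19 #27: 20 "only" children out of 226 students, 95% confidence
round(prop_ci(20 / 226, 226, C = 0.95), 4)   # interval (0.0515, 0.1255)

# Chapter 19 #28: 74% of 1644 freshmen returned, 98% confidence
round(prop_ci(0.74, 1644, C = 0.98), 4)      # ME 0.0252, interval (0.7148, 0.7652)

# Sample size: guess p = 0.25, 90% confidence, ME = 5% and then 2.5%
n_needed(0.05,  p = 0.25, C = 0.90)          # 203
n_needed(0.025, p = 0.25, C = 0.90)          # 812

# Working backwards from the 95% interval (0.6184, 0.8616)
ends <- c(0.6184, 0.8616)
phat <- mean(ends)                           # midpoint, 0.74
me <- diff(ends) / 2                         # half the width, 0.1216
qnorm(0.975)^2 * phat * (1 - phat) / me^2    # about 50
```

Halving the margin of error in the last sample-size call quadruples the required n, as the formula predicts.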