### Week12slides

```
STT 200 – LECTURE 5, SECTION 23,24
RECITATION 12
(4/2/2013)
TA: Zhen Zhang
[email protected]
Office hour: (C500 WH) 3-4 PM Tuesday
(office tel.: 432-3342)
Help-room: (A102 WH) 9:00AM-1:00PM, Monday
Class meets on Tuesday:
12:40 – 1:30PM A224 WH, Section 23
1:50 – 2:40PM A234 WH, Section 24

Example (sampling distribution)

Recall that the data we had last time contain “yes/no” responses from
a population of 400 persons who were asked if they have wireless
internet access at home. The population proportion of “yes” is $p = 0.5575$.

If we draw many random samples of size $n = 37$, the sampling distribution
of $\hat{p}$ can be approximated by a Normal model with mean $p$ and
standard deviation $\sqrt{p(1-p)/n}$:

$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

What if we don’t know $p$?
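[Figure: distribution of $\hat{p}$, horizontal axis from 0.3 to 0.8, with $p = 0.5575$ marked]

Example (confidence interval)

To study $p$, we draw a sample of size $n = 37$, obtain $\hat{p}$, and construct a
95% confidence interval using

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

We are 95% confident that $p$ lies within this interval.
“95% confident” means that if we draw samples and construct intervals
many times, approximately 95% of the intervals will cover $p$.

[Figure: confidence intervals from repeated samples; axis ticks at 0.2, 0.4, 0.6, 0.8]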

Example (check conditions)
To validate the confidence interval, we need to check several conditions:

Independence condition: the n = 37 responses in the sample are
chosen independently.

Randomness condition: the n = 37 responses in the sample are
chosen randomly. We used the table of random digits from 1 to 400.

10% condition: the sample size n = 37 is less than 10% of the
population size 400.

Success/failure condition: the n = 37 responses in the sample
contain at least 10 yeses and 10 nos.

Construct confidence interval step by step

To construct a confidence interval for $p$ with confidence level $C$:

- Determine the critical value $z^*$, using either the Normal table or R/calculator.
  To use R/calculator, note that the total area below $z^*$ is
  $C + \frac{1-C}{2} = \frac{C+1}{2}$, so we can find $z^*$ using
  qnorm((C+1)/2) in R, or invnorm((C+1)/2) in DISTR on a TI-83 Plus
  calculator. For example, on a calculator:
    for 95% confidence, $z^*$ = invnorm((0.95+1)/2) = 1.96
    for 90% confidence, $z^*$ = invnorm((0.90+1)/2) = 1.645
- Find the standard error $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
- The margin of error is $ME = z^* \times SE(\hat{p}) = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
- The confidence interval is $\hat{p} \pm ME = (\hat{p} - ME, \hat{p} + ME)$.

Understand confidence interval backwards

If a 95% confidence interval for $p$ is (0.6184, 0.8616), can you figure
out what $\hat{p}$ is, what the margin of error is, and what the sample size is?

Ans. $\hat{p}$ is the midpoint of this interval, that is, the average of the two
endpoints, so

$\hat{p} = \frac{0.6184 + 0.8616}{2} = 0.74$,

and the margin of error is half of the width, or |endpoint − midpoint|:

$ME = 0.8616 - 0.74 = 0.74 - 0.6184 = 0.1216$.

Now since $ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, we have

$n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2} = \frac{1.96^2 \times 0.74 \times 0.26}{0.1216^2} \approx 50$.

Relationship

The margin of error $ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ determines the width of the
confidence interval. The following simulation shows the relationship between $ME$ and
each of $z^*$, $\hat{p}$, and $n$ when the other two are fixed.
For example, if the confidence level $C$ increases, $z^*$ will increase, so $ME$ will
increase, and the confidence interval is wider.

[Figure: three panels showing the margin of error against $\hat{p}$ (fixed confidence level = 95%, n = 100; p = 0.5 marked), against sample size n (fixed confidence level = 95%, p = 0.5), and against confidence level C (fixed n = 100, p = 0.5)]

Sample size determination

Recall that from $ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, we have

$n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2}$.

Suppose we fix the margin of error $ME$ and want to determine the sample size $n$
of the data we will collect. We need to guess $\hat{p}$.

If “it is believed” or some “national study” gives a value for the population
proportion $p$, we can use it in place of $\hat{p}$.

We can also use $\hat{p}$ from a pilot sample if we have one.

If we have no idea at all about $\hat{p}$, we can use a conservative guess based
on the “worst” scenario: $\hat{p}(1-\hat{p})$ reaches its maximum when
$\hat{p} = 0.5$, which corresponds to the largest required sample size.
NEED SOME COFFEE?

Chapter 19 (Page 504): #7:
Which statements are true?
a) For a given sample size, higher confidence means a smaller margin of error.
b) For a specified confidence level, larger samples provide smaller margins of error.
c) For a fixed margin of error, larger samples provide greater confidence.
d) For a given confidence level, halving the margin of error requires a sample twice as large.

Chapter 19 (Page 504): #8:
Which statements are true?
a) For a given sample size, reducing the margin of error will mean lower confidence.
b) For a certain confidence level, you can get a smaller margin of error by selecting a bigger sample.
c) For a fixed margin of error, smaller samples will mean lower confidence.
d) For a given confidence level, a sample 9 times as large will make a margin of error one third as big.

Chapter 19 (Page 505): #14:
11% of a random sample of 1003 adults approved of attempts to clone a human.
a) Find the margin of error if we want 95% confidence.
   $ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0194$
b) Explain what that margin of error means.
   The pollsters are 95% confident that the true proportion of adults who approve of
   attempts to clone humans is within about 1.9 percentage points of the estimated 11%.
c) If we only need to be 90% confident, will the margin of error be larger or
   smaller? Explain.
   Smaller, since the critical value $z^*$ decreases as the confidence level decreases.
d) Find that margin of error.
   $ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.645 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0163$
e) In general, if all other aspects of the situation remain the same, would smaller
   samples produce a smaller or larger margin of error?
   Larger.

Chapter 19 (Page 506): #27:
In a random survey of 226 college students, 20 reported being “only” children.
Estimate the proportion of students nationwide who are “only” children.
a) Check conditions for constructing a confidence interval.
   The students’ birth orders are likely to be independent. The sample was random and
   consisted of less than 10% of the population. There were 20 successes and 206 failures
   (both greater than 10).
b) Construct a 95% confidence interval.
   $\hat{p} = \frac{20}{226} = 0.0885$, $ME = 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{226}} = 0.0370$.
   Hence the confidence interval is 0.0885 ± 0.0370 = (0.0515, 0.1255).
c) Interpret your interval.
   We are 95% confident that between 5.15% and 12.55% of all college students are “only”
   children.
d) Explain what “95% confidence” means in this context.
   If we were to select repeated samples like this, we’d expect about 95% of the confidence
   intervals we created to contain the true proportion of all college students who are “only”
   children.

Chapter 19 (Page 506): #28:
74% of 1644 randomly selected college freshmen returned to college the next
year. Estimate the national freshman-to-sophomore retention rate.
a) Verify that the conditions are met.
   It’s a random sample; both 74% and 26% of 1644 are greater than 10.
b) Construct a 98% confidence interval.
   The critical value is invnorm((1+0.98)/2) = 2.326, hence the margin of error =
   2.326*sqrt(0.74*0.26/1644) = 0.0252.
   Hence the confidence interval is 0.74 ± 0.0252 = (0.7148, 0.7652).
c) Interpret your interval.
   We’re 98% confident that between 71.48% and 76.52% of college freshmen
   return to college for their sophomore year.
d) Explain what “98% confidence” means in this context.
   If we were to select repeated samples like this, we’d expect about 98% of the
   confidence intervals we created to contain the true proportion of all college
   freshmen who return to be sophomores.

Sample size determination
In a university, it’s believed that 25% of adults over 30 love Statistics. We wish to
see if this percentage is the same among the 18 to 25 age group.
a) How many of this younger age group must we survey in order to estimate the
   proportion of those who love Statistics to within 5% with 90% confidence?
   With 90% confidence, the critical value is $z^* = 1.645$. Thus
   $n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.05^2} = 202.9519$
   So the required sample size is 203.
b) If we want to cut the margin of error in half, how many of this younger age
   group must we survey? Do you have any concerns about this sample? Explain.
   $n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.025^2} = 811.8075$
   So the required sample size is 812.
   A sample this large might be more than 10% of the population.
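
APPENDIX 1

R code for the example:

# Import the data from the recitation 11 slides first, otherwise this will not run.
# The vector below is truncated in the slides ("... ...").
haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No" ... ...

p <- mean(haswi == "Yes")   # population proportion of "Yes"
N <- length(haswi)          # population size
n <- 37                     # sample size
replica <- 10000            # number of simulated samples
set.seed(241)

phats <- numeric(replica)            # sample proportions
interval <- matrix(0, replica, 2)    # lower and upper limit of each interval
zstar <- qnorm((1 + 0.95) / 2)       # critical value for 95% confidence

for (t in 1:replica) {
  mysamples <- haswi[sample(1:N, size = n)]   # random sample without replacement
  ph <- sum(mysamples == "Yes") / n
  moe <- zstar * sqrt(ph * (1 - ph) / n)      # margin of error
  phats[t] <- ph
  interval[t, ] <- c(ph - moe, ph + moe)
}
phats <- na.omit(phats)

# Plot the first B intervals against the true p (red line)
dev.new(width = 12, height = 6)   # portable replacement for win.graph(w=12,h=6)
par(xaxt = 'n', mar = c(.8, 2, .8, .8))
B <- 100
plot(1:B, ylim = range(interval[1:B, ]) + 1 * c(-.01, .01), type = 'n', ylab = '', xlab = '')
grid(col = 'gray60')
abline(h = p, col = 'red', lwd = 2)
for (t in 1:B) {
  lines(x = c(t, t), y = interval[t, ], col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 1], 2), col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 2], 2), col = 'gray40', lwd = 2)
  points(x = t, y = mean(interval[t, ]), pch = 16, cex = .8, col = 'blue2')
}

# Proportion of the first B intervals that cover p
mean(p >= interval[1:B, 1] & p <= interval[1:B, 2])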

APPENDIX 2

R code for the simulation study of the relationship between the margin of
error and the sample proportion, sample size, and confidence level.

# Margin of error for sample proportion p, confidence level z, and sample size n
a <- function(p = 0.5, z = 0.95, n = 100) qnorm((1 + z) / 2) * sqrt(p * (1 - p) / n)

dev.new(width = 9, height = 4)   # portable replacement for win.graph(w=9,h=4)
par(mfrow = c(1, 3), mar = c(4, 4, 0, 0) + 1, cex.lab = 2, yaxt = 'n', cex.sub = 1.3)

# Panel 1: margin of error against the sample proportion
ps <- seq(0, 1, length = 1e3)
plot(a(ps) ~ ps, type = 'l', xlab = expression(hat(p)), ylab = 'margin of error', lwd = 2,
     sub = "fixed confidence level = 95%, n = 100"); grid(col = 'gray70')
abline(v = 0.5, col = 'red2', lwd = 2); text(y = 0, x = 0.65, labels = "p = 0.5", col = 'red2', cex = 1.2)

# Panel 2: margin of error against the sample size
ns <- seq(30, 150, by = 1)
plot(a(n = ns) ~ ns, type = 'l', xlab = 'sample size n', ylab = '', lwd = 2,
     sub = "fixed confidence level = 95%, p = 0.5"); grid(col = 'gray70')

# Panel 3: margin of error against the confidence level
zs <- seq(0.8, 0.99, length = 1e3)
plot(a(z = zs) ~ zs, type = 'l', xlab = 'confidence level C', ylab = '', lwd = 2,
     sub = "fixed n = 100, p = 0.5"); grid(col = 'gray70')
Thank you.
```
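
The step-by-step construction on the "Construct confidence interval step by step" slide can be sketched in R roughly as follows. This is a minimal sketch rather than code from the slides: the helper name `prop_ci` and its arguments are assumptions introduced here for illustration. It is applied to the counts from exercise #27 (20 "only" children among 226 students).

```r
# Sketch of a one-proportion z-interval, following the four steps on the slide.
# prop_ci is a hypothetical helper name, not a base R function.
prop_ci <- function(x, n, conf = 0.95) {
  phat  <- x / n                         # sample proportion
  zstar <- qnorm((1 + conf) / 2)         # step 1: critical value z*
  se    <- sqrt(phat * (1 - phat) / n)   # step 2: standard error of phat
  me    <- zstar * se                    # step 3: margin of error
  # step 4: interval phat +/- ME
  c(phat = phat, me = me, lower = phat - me, upper = phat + me)
}

# Exercise #27: 20 successes out of 226, 95% confidence
res <- prop_ci(20, 226, conf = 0.95)
print(round(res, 4))
```

Rounded to four decimals, the result should agree with the slide's 0.0885 ± 0.0370 = (0.0515, 0.1255). The function does not check the independence, randomness, 10%, or success/failure conditions; those still need to be verified separately.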
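
The "Understand confidence interval backwards" slide recovers the sample proportion, the margin of error, and the sample size from a reported interval. The same arithmetic can be sketched in R as below. The variable names are illustrative assumptions, and the exact `qnorm` critical value is used in place of the rounded 1.96, so the sample size comes out slightly under 50 before rounding.

```r
# Recover phat, ME and n from a reported 95% confidence interval.
# All variable names here are illustrative, not from the slides.
lower <- 0.6184
upper <- 0.8616
conf  <- 0.95

phat  <- (lower + upper) / 2           # midpoint of the interval
me    <- (upper - lower) / 2           # half of the interval width
zstar <- qnorm((1 + conf) / 2)         # critical value, about 1.96

# Solve ME = z* sqrt(phat (1 - phat) / n) for n
n <- zstar^2 * phat * (1 - phat) / me^2

print(c(phat = phat, me = me, n = round(n)))
```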
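
For the sample size determination slides, the formula n = (z*)^2 p(1 - p) / ME^2 can be wrapped in a small R function. This is a sketch under stated assumptions: `sample_size` and its argument names are hypothetical, the default guess of 0.5 is the conservative "worst case" described on the slide, and the result is rounded up because a fractional respondent is not possible.

```r
# Required sample size for a target margin of error.
# sample_size is a hypothetical helper; p_guess = 0.5 is the conservative choice.
sample_size <- function(me, conf = 0.95, p_guess = 0.5) {
  zstar <- qnorm((1 + conf) / 2)                # critical value z*
  n_raw <- zstar^2 * p_guess * (1 - p_guess) / me^2
  ceiling(n_raw)                                # always round up
}

# Part a): within 5% with 90% confidence, guess p = 0.25
n_a <- sample_size(me = 0.05, conf = 0.90, p_guess = 0.25)

# Part b): margin of error cut in half
n_b <- sample_size(me = 0.025, conf = 0.90, p_guess = 0.25)

# No prior information: conservative guess p = 0.5
n_c <- sample_size(me = 0.05, conf = 0.90)

print(c(a = n_a, b = n_b, conservative = n_c))
```

With these inputs the first two values should match the slide's answers of 203 and 812. The conservative guess gives a larger n than the guess of 0.25 because p(1 - p) is largest at p = 0.5.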
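
The statements in exercises #7 and #8 about how the margin of error changes with sample size and confidence level can be checked numerically from ME = z* sqrt(p(1 - p) / n). The sketch below is an illustration, not part of the slides: `margin_of_error` is a hypothetical helper, and the baseline values n = 100 and p = 0.5 are arbitrary choices.

```r
# Numerical check of how the margin of error scales.
# margin_of_error is a hypothetical helper; n = 100 and phat = 0.5 are arbitrary baselines.
margin_of_error <- function(n, phat = 0.5, conf = 0.95) {
  qnorm((1 + conf) / 2) * sqrt(phat * (1 - phat) / n)
}

base <- margin_of_error(100)

ratio_9x <- margin_of_error(900) / base   # sample 9 times as large
ratio_4x <- margin_of_error(400) / base   # sample 4 times as large
ratio_2x <- margin_of_error(200) / base   # sample twice as large

# Higher confidence with the same n gives a larger margin of error
higher_conf_is_wider <- margin_of_error(100, conf = 0.99) > margin_of_error(100, conf = 0.90)

print(c(ratio_9x = ratio_9x, ratio_4x = ratio_4x, ratio_2x = ratio_2x))
print(higher_conf_is_wider)
```

Because n sits under a square root, the margin of error shrinks by a factor of sqrt(k) when the sample is k times as large: a ninefold sample gives one third of the margin, while halving the margin takes a fourfold sample rather than a twofold one.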