### Chapter 7 - Cerritos College

```Chapter 7
Survey Sampling and Inference

Learning Objectives
- Be able to estimate a population proportion from a sample proportion and quantify how far off the estimate is likely to be.
- Understand that random sampling reduces bias.
- Understand when the Central Limit Theorem for sample proportions applies, and know how to use it to find approximate probabilities for sample proportions.
- Understand how to find, interpret, and use confidence intervals for a single population proportion.

7.1 World through Surveys

Survey Terminology
- The Population is the group of people or objects we wish to study.
- A Parameter is a numerical value that characterizes some aspect of the population.
- A Census is a survey in which every member of the population is measured.
- A Sample is a collection of people or objects taken from the population.
- A Statistic, also called an Estimator, is a number derived from the sample data.

Statistical Inference
- Statistical inference is the art and science of drawing conclusions about a population on the basis of observing only a small subset of that population.
- Statistical inference always involves uncertainty, so an important component of this science is measuring our uncertainty.

A survey asked 1000 US college students whether they preferred to study alone or with others. 420 preferred to study alone.
- The population is all US college students.
- The sample is the 1000 students who were surveyed.
- The parameter of interest is $p$, the proportion of all US college students who prefer to study alone.
- The statistic $\hat{p} = 420/1000 = 0.42$ is the proportion of the 1000 surveyed students who prefer to study alone.
- Statistical inference: we estimate that 42% of all US college students prefer to study alone.

Bias
- A method is Biased if it tends to produce values that are systematically different from the true value.
- Sampling Bias results from taking a sample that is not representative of the population.
  Examples: convenience sampling and voluntary response sampling.
- Measurement Bias comes from asking questions that do not produce a true answer.
  Example: confusing questions.

Some Ways to Avoid Bias
- What percentage of people who were asked to participate actually did so?
- Did the researchers choose people to participate in the survey, or did the people themselves choose to participate?
- Did the researchers leave out whole segments of the population who are likely to answer the question differently from the rest of the population?

Identify the Possible Biases (Population: All Americans)
- A student asked all 2500 of her Facebook friends if …
- A researcher asked 500 randomly selected people, “Are you in favor of the unfair tax burden that the hard working successful business people have so that the lazy unemployed can receive a paycheck without working?”
- On July 4, CNN posted a question on its website asking readers whether they supported the current US military operations. 18,943 people responded.
- 100 randomly selected Americans were asked by a researcher, “Do you currently have a sexually transmitted disease?”
- Gallup randomly selected 1000 phone numbers from the yellow pages and then called to ask whether they supported government funding of high-speed rail.
- A researcher stood outside a grocery store and asked 250 shoppers, “Do you eat out at a restaurant at least three times per week?”

Simple Random Sampling
- Simple Random Sampling (SRS) involves randomly drawing people from the population without replacement.
- If a scientific sampling technique is not used, we cannot reliably learn anything about the population by looking at the sample data.

StatCrunch and SRS
Use StatCrunch to select a simple random sample of size n = 3 from: Miguel, Jen, Emily, Joe, and Anna.
- Number the names: Miguel: 1, Jen: 2, Emily: 3, Joe: 4, Anna: 5.
- StatCrunch: Data → Sample Columns
- Convert the numbers back to names: 4, 2, 1 → Joe, Jen, Miguel.
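
# A minimal Python sketch of the same SRS steps, offered as an alternative to
# StatCrunch (assumption: Python's standard `random` module; the helper name
# `simple_random_sample` is hypothetical). Random.sample draws distinct items,
# i.e. it samples without replacement, as an SRS requires.
import random

names = ["Miguel", "Jen", "Emily", "Joe", "Anna"]
numbered = {i: name for i, name in enumerate(names, start=1)}  # Miguel: 1, ..., Anna: 5

def simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels without replacement; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(labels, n)

picked_numbers = simple_random_sample(list(numbered), 3)
picked_names = [numbered[k] for k in picked_numbers]  # convert the numbers back to names
print(picked_numbers, "->", picked_names)
# Each run gives a different sample unless a seed is supplied.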

7.2 Measuring the Quality of a Survey

Accuracy and Precision
(Figure: four panels, labeled a to d)
a. Both accurate and precise.
b. Precise but not accurate.
c. Accurate but not precise.
d. Neither accurate nor precise.

Accuracy and Precision, Bias and Standard Error
- Bias is a measure of accuracy. If only basketball players are measured to estimate the proportion of Americans who are taller than 6 feet, the estimate will be biased toward a larger proportion.
- Standard Error is a measure of precision. If the sample size is only three, the sample estimate of the proportion of tall people is likely to be far from the proportion of tall people in the US. The standard error will be large.

Simulation: Small Sample Size
- Present: Mike, Rick, Sue, Mary, Rose. Percent male: $p = 40\%$.
- Take random samples of n = 3 people and record the percent male:
  Rick, Sue, Rose: $\hat{p} = 33\%$
  Mike, Rick, Rose: $\hat{p} = 67\%$
  Sue, Mary, Rose: $\hat{p} = 0\%$
- One population proportion, many sample proportions.

Distribution of the Sample Proportions
- This distribution has mean 40%.
- This distribution has a standard deviation, called the standard error, which measures how much the sample proportions vary from sample to sample.
- Notice that the mean of the collection of all the sample proportions equals the population proportion of 40%.
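
# A minimal sketch that lists every possible SRS of size 3 from the five
# people above and summarizes the resulting sample proportions. Assumption:
# Mike and Rick are the two males, as implied by p = 40% and the three sample
# proportions shown. itertools.combinations yields each sample exactly once.
from itertools import combinations
from statistics import mean, pstdev

population = {"Mike": 1, "Rick": 1, "Sue": 0, "Mary": 0, "Rose": 0}  # 1 = male

samples = list(combinations(population, 3))                      # all samples without replacement
p_hats = [sum(population[name] for name in s) / 3 for s in samples]

print(len(samples))              # 10
print(round(mean(p_hats), 4))    # 0.4
print(round(pstdev(p_hats), 4))  # 0.2
# The mean of all 10 sample proportions equals the population proportion, 0.40.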

Larger Sample Size
- Consider a population of 1,000,000 people, 40% male. Randomly select n = 300 of them.
- If we look at all possible samples with n = 300, then the distribution of the sample proportions would have:
  Mean = 40%
  Standard Error ≈ 2.8%
- Notice that the mean sample proportion equals the population proportion.
- The standard error is 10 times smaller when the sample size is 100 times larger.
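
# A minimal simulation sketch (assumptions: Python's `random` module, 2,000
# repeated samples as a stand-in for "all possible samples", and a fixed seed;
# none of these choices come from the slides). People are labeled 0..N-1 and
# the first 400,000 labels are treated as male.
import math
import random
from statistics import mean, pstdev

N, n, p = 1_000_000, 300, 0.40
males = int(N * p)
rng = random.Random(7)

p_hats = []
for _ in range(2_000):
    sample = rng.sample(range(N), n)                        # one SRS of 300 people
    p_hats.append(sum(person < males for person in sample) / n)

print(f"mean of sample proportions ≈ {mean(p_hats):.3f}")
print(f"standard error ≈ {pstdev(p_hats):.3f}")

# Formula check: 100 times the sample size gives 1/10 of the standard error.
se_3 = math.sqrt(p * (1 - p) / 3)
se_300 = math.sqrt(p * (1 - p) / 300)
print(f"ratio of standard errors: {se_3 / se_300:.1f}")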

Sample Sizes, Mean and Standard Error
- The mean of all sample proportions always equals the population proportion.
- The standard error will be smaller for larger sample sizes.
- The size of the population has no effect on the distribution of all sample proportions as long as the population size is at least 10 times larger than the sample size.

Bias, Precision, Mean, and Standard Error
- For an SRS, the bias is 0. This is equivalent to the statement that the mean of all the sample proportions equals the population proportion.
- For an SRS, the precision is better for larger sample sizes. This is equivalent to the statement that the standard error is smaller for larger sample sizes.
- The precision and bias are independent of the population size as long as the population size is at least 10 times larger than the sample size.

Formulae for the Mean and the Standard Error
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
- The mean of the sampling distribution is equal to the population proportion.
- If the sample size is multiplied by a factor, then the standard error is divided by the square root of that factor. For example, quadrupling n halves the standard error.
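
# A minimal sketch of the two formulas above. The helper name
# `sampling_distribution` is hypothetical. For illustration it uses p = 0.65
# and n = 500 (the Pap-test values), then quadruples n to show the
# square-root rule: 4 times the sample size gives half the standard error.
import math

def sampling_distribution(p, n):
    """Return (mean, standard error) of the sample proportion for an SRS of size n."""
    return p, math.sqrt(p * (1 - p) / n)

mean_500, se_500 = sampling_distribution(0.65, 500)
mean_2000, se_2000 = sampling_distribution(0.65, 2000)

print(f"n = 500:  mean = {mean_500:.0%}, SE = {se_500:.1%}")    # n = 500:  mean = 65%, SE = 2.1%
print(f"n = 2000: mean = {mean_2000:.0%}, SE = {se_2000:.1%}")  # n = 2000: mean = 65%, SE = 1.1%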

Example
Only 65% of insured women get annual Pap tests. Find the mean and standard error for the sampling distribution with sample size 500.
- Mean: $\mu_{\hat{p}} = p = 65\%$
- Standard Error: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.65(1-0.65)}{500}} \approx 2.1\%$

The Trouble With p
- The main application of sampling distributions is to address the bias and precision of a sample that is taken.
- The purpose of taking a sample is to estimate the population proportion, so p is unknown.
- Therefore the standard error $SE = \sqrt{\frac{p(1-p)}{n}}$ cannot be calculated.
- Use $\hat{p}$ as an estimate for p. This gives an estimate for the standard error: $SE_{est} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

7.3 The Central Limit Theorem for Sample Proportions

Probability Distributions for Sample Proportions

Requirements for the Central Limit Theorem for Sample Proportions
- Random and Independent: The sample is collected randomly, and the trials are independent of each other.
- Large Sample: The sample has at least 10 successes, $np \ge 10$, and at least 10 failures, $n(1-p) \ge 10$.
- Large Population: If the sample is collected without replacement, then the population size is at least 10 times the sample size.

The Central Limit Theorem for Sample Proportions
If the trials are random and independent and the sample and population sizes are large, then the sampling distribution of $\hat{p}$ is approximately normal and follows
$$N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$
If you don't know p, $\hat{p}$ can be substituted to find the standard error.

The Central Limit Theorem
200 randomly selected American drivers were asked whether they text while driving. 48 of them admitted that they did.
- The drivers were randomly selected.
- Successes: 48 ≥ 10; Failures: 152 ≥ 10.
- Population size (number of American drivers) is very large.
- Conclusion: The distribution is approximately normal.
- Estimated mean: $\hat{p} = 48/200 = 0.24$
- $SE_{est} = \sqrt{\frac{0.24 \times 0.76}{200}} \approx 0.03$
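
# A minimal sketch of the checks above for the texting-while-driving survey.
# The helper `clt_conditions_met` is hypothetical and encodes only the
# large-sample rule (at least 10 successes and 10 failures); randomness and the
# 10x population rule still have to be judged from how the data were collected.
import math

def clt_conditions_met(successes, n):
    failures = n - successes
    return successes >= 10 and failures >= 10

successes, n = 48, 200
p_hat = successes / n
se_est = math.sqrt(p_hat * (1 - p_hat) / n)   # p-hat stands in for the unknown p

print(clt_conditions_met(successes, n))  # True
print(p_hat)                             # 0.24
print(round(se_est, 2))                  # 0.03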

Notes About the Requirements
- Since random sampling is usually impossible to do, other sampling techniques are often used.
- A large sample size is absolutely necessary.
- Typically the population of interest is very large, but one should still be aware of this requirement.

Finding Probabilities with the Central Limit Theorem
78% of all laboratory mice can make it through a maze. If 600 randomly selected mice attempt the maze, what is the probability that more than 80% of them will make it through the maze?
Note that all requirements are met:
- Random sample.
- Number of successes: $np = 600 \times 0.78 = 468 \ge 10$
- Number of failures: $n(1-p) = 600 \times 0.22 = 132 \ge 10$
- Large population size: all laboratory mice.
By the CLT, the distribution of all possible sample proportions (the sampling distribution) is approximately normal:
- Mean = 0.78
- $SE = \sqrt{\frac{0.78 \times 0.22}{600}} \approx 0.017$
- Sampling distribution: N(0.78, 0.017)
- $P(\hat{p} > 0.80) \approx 0.12$
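
# A minimal sketch using Python's statistics.NormalDist (standard library,
# Python 3.8+) in place of a normal table or StatCrunch for the maze example.
import math
from statistics import NormalDist

p, n = 0.78, 600
se = math.sqrt(p * (1 - p) / n)            # about 0.017
sampling_dist = NormalDist(mu=p, sigma=se)  # N(0.78, 0.017)

prob = 1 - sampling_dist.cdf(0.80)          # area to the right of 0.80
print(f"SE = {se:.3f}, P(p-hat > 0.80) = {prob:.2f}")  # SE = 0.017, P(p-hat > 0.80) = 0.12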

Failure of the CLT
About half a percent of all people in the world are living with HIV. You want to find the probability that, out of 1000 randomly selected people, at least 1% of them are living with HIV.
- $np = 1000 \times 0.005 = 5 < 10$
- The CLT does not apply.
- Do not use the Normal Distribution to calculate this probability.
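
# A minimal sketch contrasting an exact binomial probability with the normal
# approximation the CLT would suggest, for the HIV example. Assumption: the
# number of people with HIV in the sample is Binomial(n = 1000, p = 0.005);
# "at least 1% of 1000" means at least 10 people. Uses math.comb (Python 3.8+).
import math
from statistics import NormalDist

n, p = 1000, 0.005

# Exact: P(X >= 10) = 1 - P(X <= 9).
exact = 1 - sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(10))

# Normal approximation for p-hat, used here only to show how far off it is.
se = math.sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(mu=p, sigma=se).cdf(0.01)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
# With np = 5 the two answers differ noticeably, which is why the np >= 10 check matters.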

7.4 Estimating the Population Proportion with Confidence Intervals

Confidence Intervals
- A Confidence Interval for a Population Proportion is an interval where the unknown population proportion is likely to lie.
- Example: Suppose the CLT applies and one wants to estimate p. Let $\hat{p} = 0.24$ and $SE = 0.03$.
- Go 1.96 SE below and above $\hat{p}$:
  $0.24 - 1.96 \times 0.03 \approx 0.18$
  $0.24 + 1.96 \times 0.03 \approx 0.30$
- We can be 95% confident that the population proportion is between 0.18 and 0.30.
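
# A minimal sketch of the "1.96 SE on either side" calculation above.
# z = 1.96 is the standard normal value that leaves 2.5% in each tail;
# statistics.NormalDist().inv_cdf(0.975) reproduces it.
from statistics import NormalDist

p_hat, se = 0.24, 0.03
z = NormalDist().inv_cdf(0.975)  # about 1.96

lower, upper = p_hat - z * se, p_hat + z * se
print(f"z = {z:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")  # z = 1.96, 95% CI = (0.18, 0.30)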

Confidence Interval Interpretation
- The proportion of green M&M's is 0.16. You take several samples of 80 M&M's each and come up with the following 95% confidence intervals:
  (.14, .18), (.12, .17), (.15, .19), (.11, .15), (.12, .17), (.15, .20), (.13, .17), (.14, .19), (.13, .18), (.15, .19), (.15, .20)
- All of the above confidence intervals except (.11, .15) successfully contain the population proportion.
- For every random sample that can be taken from a population, there corresponds a 95% confidence interval. About 95% of these confidence intervals will successfully contain the population proportion, and about 5% will not.
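
# A minimal simulation sketch of the interpretation above, assuming p = 0.16
# and samples of 80 candies (the M&M setting). Each sample gives a 95% interval
# p-hat ± 1.96·SE_est, and we count how often that interval contains p.
# The seed and the 10,000 repetitions are arbitrary illustrative choices.
import math
import random

p, n, reps = 0.16, 80, 10_000
rng = random.Random(2024)

covered = 0
for _ in range(reps):
    greens = sum(rng.random() < p for _ in range(n))  # one sample of 80 candies
    p_hat = greens / n
    se_est = math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - 1.96 * se_est <= p <= p_hat + 1.96 * se_est:
        covered += 1

print(f"fraction of intervals containing p: {covered / reps:.3f}")
# The fraction should land near 0.95; it may come out somewhat lower, because
# the interval relies on a normal approximation and an estimated standard error.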

Computing Confidence Intervals
Of 500 randomly selected people surveyed, 72 were smokers. Find the 95% confidence interval.
- $\hat{p} = \frac{72}{500} = 0.144$
- $SE_{est} = \sqrt{\frac{0.144(1-0.144)}{500}} \approx 0.0157$
- Margin of error: $1.96 \times 0.0157 \approx 0.031$
- 0.144 - 0.031 = 0.113
- 0.144 + 0.031 = 0.175
We are 95% confident that between 11.3% and 17.5% of all people are smokers.
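
# A minimal sketch of the steps above as a reusable function. The name
# `one_proportion_ci` is hypothetical. It computes p-hat ± z·SE_est, with z
# taken from the standard normal distribution for the chosen confidence level.
import math
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence=0.95):
    p_hat = successes / n
    se_est = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    margin = z * se_est
    return p_hat - margin, p_hat + margin

low, high = one_proportion_ci(72, 500)
print(f"({low:.3f}, {high:.3f})")  # (0.113, 0.175)
# The same call with (176, 200, confidence=0.90) reproduces the treatment example's interval.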

Confidence vs. Margin of Error
- Increasing the level of confidence increases the margin of error.
- Decreasing the level of confidence decreases the margin of error.
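
# A minimal sketch of the trade-off above, reusing the smoker survey
# (72 smokers out of 500). Only the confidence level changes; the estimated
# standard error stays the same, so the margin of error grows with z.
import math
from statistics import NormalDist

p_hat, n = 72 / 500, 500
se_est = math.sqrt(p_hat * (1 - p_hat) / n)

margins = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z * se_est
    print(f"{confidence:.0%} confidence: z = {z:.3f}, margin of error = {margins[confidence]:.3f}")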

Computing Confidence Intervals
176 of 200 patients randomly selected to receive treatment survived. Find the 90% confidence interval.
- $\hat{p} = \frac{176}{200} = 0.88$
- $SE_{est} = \sqrt{\frac{0.88(1-0.88)}{200}} \approx 0.023$
- Margin of error: $1.645 \times 0.023 \approx 0.038$
- 0.88 - 0.038 = 0.842
- 0.88 + 0.038 = 0.918
We are 90% confident that between 84.2% and 91.8% of all people who receive the treatment survive.

Interpreting Confidence Intervals
- 300 randomly chosen voters were asked whether they favored the bond initiative to fund a new college sports arena. 120 supported it. The 95% confidence interval is (0.34, 0.46).
- Since a bond initiative requires over 50% of the votes to pass and 0.50 is above the confidence interval, it is unlikely that the bond initiative will pass.

StatCrunch and Confidence Intervals
395 of the 600 randomly surveyed students purchased an e-text. Find a 99% confidence interval.
StatCrunch: Stat → Proportions → One sample → with Summary
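
# A minimal sketch of the same 99% interval computed by hand, as a cross-check
# on the StatCrunch output (which is not reproduced here). It applies the
# p-hat ± z·SE_est formula used in the earlier examples.
import math
from statistics import NormalDist

successes, n, confidence = 395, 600, 0.99
p_hat = successes / n

# The interval is only trustworthy with at least 10 successes and 10 failures.
if successes < 10 or n - successes < 10:
    raise ValueError("too few successes or failures for the normal approximation")

se_est = math.sqrt(p_hat * (1 - p_hat) / n)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576
low, high = p_hat - z * se_est, p_hat + z * se_est
print(f"p-hat = {p_hat:.4f}, 99% CI = ({low:.4f}, {high:.4f})")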

Confidence Intervals Summary
- Use a confidence interval to get plausible bounds on a population proportion.
- Do not use one if $n\hat{p} < 10$ or $n(1-\hat{p}) < 10$.
- The confidence level of 95% is standard.
- A lower level, e.g. 90%, can be used if you need a smaller margin of error.
- A higher level, e.g. 99%, can be used at the expense of a larger margin of error.

Chapter 7 Case Study

The Study
- In 2006, the AMA issued the press release: “Sex and intoxication among women more common on spring break according to AMA poll”
- “Eighty-three percent of the [female, college-attending] respondents agreed spring break trips involve more or heavier drinking than occurs on college campuses and 74 percent said spring break trips result in increased sexual activity.”

The Design
- Posted a survey on the AMA website.
- 644 women chose to respond.
- Cited a margin of error of ±4%.
- Cited a 95% confidence interval.

The Issues
- Not a scientific study.
- Voluntary response bias.
- One should never cite a margin of error or confidence interval from a biased study.

The Conclusion
- The AMA changed its posting to state that the results were not based on a random sample.
- It removed all remarks about the margin of error from the paper.

Chapter 7 Guided Exercise

The Oregon Bar Exam
- According to the Oregon Bar Association, approximately 65% of the people who take the bar exam to practice law in Oregon pass the exam.
- Find the approximate probability that at least 67% of 200 randomly sampled people who take the Oregon bar exam will pass it.

The Oregon Bar Exam: Population Proportion
- The sample proportion in question is 0.67. What is the population proportion?
- $p = 0.65$

The Oregon Bar Exam: Checking Assumptions
- Randomly sampled? Yes.
- Large enough sample size? $np = 200(0.65) = 130$ and $n(1-p) = 200(0.35) = 70$. Both are at least 10, so yes.
- Population large enough? Since more than 10 × 200 = 2000 people take the Oregon bar exam, the population is large enough.

Calculate the Standard Error
$$SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.65(1-0.65)}{200}} \approx 0.034$$

Calculate the z-Score
$$z = \frac{0.67 - 0.65}{0.034} \approx 0.59$$

The Normal Curve
P(z > 0.59) is the area to the right of 0.59 under the standard normal curve.
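
# A minimal sketch that finishes the guided exercise with statistics.NormalDist.
# Dividing by the rounded SE (0.034) mirrors the slides; using the unrounded SE
# gives nearly the same probability.
import math
from statistics import NormalDist

p, n, cutoff = 0.65, 200, 0.67
se = math.sqrt(p * (1 - p) / n)   # about 0.034
z = (cutoff - p) / round(se, 3)   # about 0.59, as on the slides
prob = 1 - NormalDist().cdf(z)    # area to the right of z

print(f"SE = {se:.3f}, z = {z:.2f}, P(p-hat >= 0.67) = {prob:.2f}")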