Review #2

Chapter 9
Chapter 10
Chapters 11 and 12
Chapter 9
Sampling Distributions
• A statistic is a random variable describing a
characteristic of a random sample.
– Sample mean
– Sample variance
• We use the values of statistics in inferential
statistics (to make inferences about population
characteristics from sample characteristics).
• Statistics have distributions of their own.
The Central Limit Theorem
• The distribution of the sample mean is normal if the parent
distribution is normal.
• The distribution of the sample mean approaches the normal
distribution for sufficiently large samples
($n \ge 30$), even if the parent distribution is not normal.
• The parameters of the sampling distribution of the mean are:
– Mean: $\mu_{\bar{x}} = \mu$
– Standard deviation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
(Assumption: The population is sufficiently large. No correction is needed in the calculation of the variance.)
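A quick simulation can make the central limit theorem concrete. The sketch below is a minimal Python example (standard library only) that assumes a hypothetical, strongly skewed parent population: an exponential distribution with mean 2, so $\sigma = 2$. The helper name `sample_means` is ours, not part of any library.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from a random sample of size n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical skewed parent: exponential with mean 2 (rate 0.5), so sigma = 2.
means = sample_means(lambda rng: rng.expovariate(0.5), n=30, reps=5000)

print(statistics.fmean(means))   # should be close to mu = 2
print(statistics.stdev(means))   # should be close to sigma / sqrt(n) = 2 / sqrt(30), about 0.365
```

A histogram of `means` would look roughly bell-shaped even though the parent distribution is far from normal.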
The Central Limit Theorem
• Problem 1 (Using Excel)
Given a normal population whose mean is 50 and
whose standard deviation is 5,
– Question 1: Find the probability that a random
sample of 4 has a mean between 49 and 52
$P(49 < \bar{X} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8)$
[In an Excel worksheet type: =NORMSDIST(.8) - NORMSDIST(-.4)]
The answer: .443566
The Central Limit Theorem
• Problem 1 (Using the table)
Given a normal population whose mean is
50 and whose standard deviation is 5,
– Question 1: Find the probability that a random
sample of 4 has a mean between 49 and 52
$P(49 < \bar{X} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right)$
$P(-.4 < Z < .8) = .7881 - .3446 = .4435$
The Central Limit Theorem
• Problem 1
– Question 2: Find the probability that a random
sample of 16 has a mean between 49 and 52.
$P(49 < \bar{X} < 52) = P\left(\frac{49-50}{5/\sqrt{16}} < Z < \frac{52-50}{5/\sqrt{16}}\right)$
$P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
The Central Limit Theorem
• Problem 2: The amount of time per day spent by
adults watching TV is normally distributed with
$\mu = 6$ and $\sigma = 1.5$ hours.
– Question 1: What is the probability that a randomly
selected adult watches TV for more than 7 hours a day?
$P(X > 7)$: [In Excel type: =1 - NORMDIST(7,6,1.5,TRUE), then click anywhere.]
The answer: .252492
– Question 2: What is the probability that the average
daily TV time of 5 randomly selected adults is 7 hours or more?
$P(\bar{X} \ge 7) = P\left(Z \ge \frac{7-6}{1.5/\sqrt{5}}\right) = P(Z \ge 1.49) = 1 - .9319 = .0681$
The Central Limit Theorem
• Problem 2:
– Question 3: What is the probability that the total time of
watching TV of the five adults will not exceed 28 hours?
$P(\sum X \le 28) = P(\bar{X} \le 28/5) = P\left(Z \le \frac{5.6-6}{1.5/\sqrt{5}}\right) = P(Z \le -.60) = .2743$
– Question 4: What total TV watching time is exceeded by
only 3% of the population for samples of 5 adults?
$P(\text{Total time} > x_0) = P(\text{Average time} > x_0/5) = .03$
[In Excel type: =NORMINV(.97,6,.670822), then click anywhere.] The answer: $x_0/5 \approx 7.2617$
Notes: 1. Excel returns the value of $\bar{X}$ for a given left-hand tail probability (here .97).
2. $.670822 = 1.5/5^{.5}$
Thus, $x_0 = 5(7.2617) \approx 36.31$
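All four parts of Problem 2 can be reproduced with `statistics.NormalDist`; `inv_cdf` plays the role of Excel's NORMINV (it takes a left-tail probability). A minimal sketch, assuming the slide's parameters $\mu = 6$, $\sigma = 1.5$, $n = 5$:

```python
from statistics import NormalDist
import math

mu, sigma, n = 6, 1.5, 5
x = NormalDist(mu, sigma)                    # one adult's daily TV time
xbar = NormalDist(mu, sigma / math.sqrt(n))  # average of 5 adults

q1 = 1 - x.cdf(7)             # P(X > 7), like =1-NORMDIST(7,6,1.5,TRUE)
q2 = 1 - xbar.cdf(7)          # P(X-bar >= 7)
q3 = xbar.cdf(28 / n)         # P(total <= 28) = P(X-bar <= 5.6)
x0 = n * xbar.inv_cdf(0.97)   # total exceeded by only 3% of samples
print(round(q1, 4), round(q2, 4), round(q3, 4), round(x0, 2))
```

Small differences from the table-based answers come from rounding z to two decimals before the table lookup.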
The Central Limit Theorem
• Problem 3:
Assume that the mean monthly rent paid by students in a
particular town is \$350, with a standard deviation of \$40. A
random sample of 100 students who rented apartments was
taken.
Question 1: What is the probability that the sample mean
of the monthly rent exceeds \$355?
$P(\bar{X} > 355) = P\left(Z > \frac{355-350}{40/\sqrt{100}}\right) = P(Z > 1.25)$
$P(Z > 1.25) = 1 - .8944 = .1056$
The Central Limit Theorem
• Problem 3 - continued
Question 2: What is the probability that the total revenue
from renting 10 randomly selected apartments falls
between 3300 and 3700 dollars?
$P(3300 < \text{Total rental revenue} < 3700) = P(330 < \text{Average rent} < 370)$, where $\sigma_{\bar{x}} = 40/10^{.5} = 12.64911$
[In Excel type: =NORMDIST(370,350,12.64911,TRUE) - NORMDIST(330,350,12.64911,TRUE)]
The answer: 0.886154
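A minimal Python sketch of Questions 1 and 2 of Problem 3, using only `statistics.NormalDist`; the totals are converted to averages exactly as on the slide.

```python
from statistics import NormalDist
import math

mu, sigma = 350, 40
q1 = 1 - NormalDist(mu, sigma / math.sqrt(100)).cdf(355)   # P(X-bar > 355), n = 100

avg10 = NormalDist(mu, sigma / math.sqrt(10))              # mean rent of 10 apartments
q2 = avg10.cdf(3700 / 10) - avg10.cdf(3300 / 10)           # P(3300 < total < 3700)
print(round(q1, 4), round(q2, 6))
```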
The Central Limit Theorem
• Problem 3 - continued
Question 3: Let’s assume the population mean was
unknown, but the standard deviation was known to be \$40.
A sample of 100 rentals was selected in order to estimate
the mean monthly rent paid by the whole student
population. What is the probability that the sample mean
differs from the actual mean by more than \$5? How about
more than \$10?
The Central Limit Theorem
• Problem 3
– continued
(i) $P(|\bar{X}-\mu| > 5) = P(\bar{X}-\mu > 5) + P(\bar{X}-\mu < -5)$
$= P\left(\frac{\bar{X}-\mu}{\sigma_{\bar{x}}} > \frac{5}{40/\sqrt{100}}\right) + P\left(\frac{\bar{X}-\mu}{\sigma_{\bar{x}}} < \frac{-5}{40/\sqrt{100}}\right)$
$= P(Z > 1.25) + P(Z < -1.25) = .1056 + .1056 = .2112$
(ii) $P(|\bar{X}-\mu| > 10) = P(\bar{X}-\mu > 10) + P(\bar{X}-\mu < -10)$
$= P\left(\frac{\bar{X}-\mu}{\sigma_{\bar{x}}} > \frac{10}{40/\sqrt{100}}\right) + P\left(\frac{\bar{X}-\mu}{\sigma_{\bar{x}}} < \frac{-10}{40/\sqrt{100}}\right)$
$= P(Z > 2.5) + P(Z < -2.5) = 2(1 - .9938) = .0124$
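The key point of Question 3 is that $P(|\bar{X}-\mu| > d)$ does not depend on the unknown $\mu$, only on $d/\sigma_{\bar{x}}$. A minimal sketch; `prob_error_exceeds` is a hypothetical helper name.

```python
from statistics import NormalDist
import math

def prob_error_exceeds(d, sigma, n):
    """P(|X-bar - mu| > d) for a normal sampling distribution; mu cancels out."""
    z = d / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(z))   # two symmetric tails

p5 = prob_error_exceeds(5, sigma=40, n=100)     # about .2113
p10 = prob_error_exceeds(10, sigma=40, n=100)   # about .0124
```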
Sampling distribution of the sample
proportion
In a sample of size n, if np > 5 and n(1-p) > 5, then the
sample proportion $\hat{p} = x/n$ is approximately normally
distributed with the following parameters:
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, therefore,
$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
(Assumption: The population is sufficiently large. No correction is needed in the calculation of the variance.)
Sampling distribution of the sample
proportion
• Problem 4:
– A commercial for a household appliance
manufacturer claims that less than 5% of all of
its products require a service call in the first
year.
– A survey of 400 households that recently
purchased the manufacturer's products was
conducted to check the claim.
Sampling distribution of the sample
proportion
Problem 4 - Continued:
Assuming the manufacturer is right, what is the
probability that more than 10% of the surveyed
households require a service call within the first
year?
$P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1-.05)/400}}\right) = P(Z > 4.59) \approx 0$
If indeed 10% of the sampled households reported
a call for service within the first year, what does it
tell you about the manufacturer's claim?
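A minimal Python sketch of Problem 4, assuming the manufacturer's figure $p = .05$ is true; it first checks the slide's conditions for the normal approximation.

```python
from statistics import NormalDist
import math

p, n, phat = 0.05, 400, 0.10
assert n * p > 5 and n * (1 - p) > 5        # conditions for the normal approximation
z = (phat - p) / math.sqrt(p * (1 - p) / n)  # standard error uses the claimed p
prob = 1 - NormalDist().cdf(z)               # P(p-hat > .10)
print(round(z, 2), prob)
```

The probability is a few in a million, so a sample proportion of 10% would be strong evidence against the claim.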
Sampling Distribution of the
Difference Between two Means
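• If two independent random samples of sizes $n_1$ and $n_2$ are
drawn from normally distributed populations with means and variances
$\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ respectively, then $\bar{x}_1 - \bar{x}_2$
is also normally distributed with:
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$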
Sampling Distribution of the
Difference Between two Means
• When at least one of the populations is not
normally distributed but the sample sizes
are both at least 30, $\bar{x}_1 - \bar{x}_2$ is approximately
normally distributed, with the mean and
variance indicated above.
Sampling Distribution of the
Difference Between two Means
• Example: A national TV telethon committee is interested
in determining whether donations made by males are, on
average, \$4 larger than those made by females. Two
samples of 25 males and 25 females were selected, and the
donations made were recorded. If the standard deviations of the
male and female populations are \$2.4 and \$1.8
respectively, what is the probability that the sample mean of
the male donations exceeds the sample mean of the female
donations by at least \$5? Assume donations for the two
populations are normally distributed.
Sampling Distribution of the
Difference Between two Means
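• Solution (males: $n_1 = 25$, $\sigma_1 = 2.4$; females: $n_2 = 25$, $\sigma_2 = 1.8$)
$P(\bar{x}_1 - \bar{x}_2 \ge 5) = P\left(Z \ge \frac{5 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) = P\left(Z \ge \frac{5-4}{\sqrt{2.4^2/25 + 1.8^2/25}}\right)$
$= P(Z \ge 1.67) = 1 - .9525 = .0475$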
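A minimal Python sketch of the telethon example, assuming (as the example does) that $\mu_1 - \mu_2 = 4$ and that both populations are normal.

```python
from statistics import NormalDist
import math

d0 = 4                      # mu1 - mu2 assumed in the example
s1, n1 = 2.4, 25            # males
s2, n2 = 1.8, 25            # females
se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # std. dev. of X1-bar - X2-bar, = 0.6
prob = 1 - NormalDist(d0, se).cdf(5)       # P(X1-bar - X2-bar >= 5)
print(round(se, 3), round(prob, 4))
```

The exact value is about .0478; the table answer .0475 uses z rounded to 1.67.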
Chapter 10
Introduction to Estimation
• A population’s parameter can be estimated
by a point estimator and by an interval
estimator.
• A confidence interval with confidence level $1-\alpha$
is an interval estimator that, in repeated sampling, covers the
estimated parameter $100(1-\alpha)\%$ of the time.
• Confidence intervals are constructed using
sampling distributions.
Confidence interval of the mean –
Known Variance
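• We use the central limit theorem to build
the following confidence interval:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve; central area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, area $\alpha/2$ in each tail]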
Confidence interval of the mean –
Known Variance
• Problem 5: How many classes do university
students miss each semester? A survey of
100 students was conducted.
(See Data next)
• Assuming the standard deviation of the
number of classes missed is 2.2, estimate
the mean number of classes missed per
student. Use a 99% confidence level.
Confidence interval of the mean –
Known Variance
– Solution
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$
$1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$
LCL = 9.64, UCL = 10.78
You can also use Data Analysis Plus > Z-Estimate: Mean
Confidence interval of the mean –
Known Variance
– Solution (using Data Analysis Plus):
• Shade the data set (you may include the title label)
• Select Data Analysis Plus, then “Z-Estimate: Mean”
• Type in the sigma (2.2), check Labels (if
appropriate), type in alpha (.01), click OK.
z-Estimate: Mean
                        Classes
Mean                    10.21
Standard Deviation      2.1756
Observations            100
SIGMA                   2.2
LCL                     9.643316
UCL                     10.77668
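The same interval can be sketched in Python, with `NormalDist().inv_cdf` giving the exact $z_{\alpha/2}$ instead of the table value 2.575. This assumes only the summary values on the slide ($\bar{x} = 10.21$, $\sigma = 2.2$, $n = 100$); `z_interval` is a hypothetical helper name.

```python
from statistics import NormalDist
import math

def z_interval(xbar, sigma, n, conf):
    """Confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sigma / math.sqrt(n)                # half-width of the interval
    return xbar - half, xbar + half

lcl, ucl = z_interval(10.21, sigma=2.2, n=100, conf=0.99)
print(round(lcl, 4), round(ucl, 4))
```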
Selecting the sample size
• The shorter the confidence interval, the
more precise the estimate.
• We can, therefore, limit the width of the
interval to 2W, and get
$\bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, or $W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• From here we have
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$
W is called the “margin of error”, or the
“bound on the error of estimation”.
Selecting the sample size
• Problem 6
An operations manager wants to estimate the
average amount of time needed by a worker to
assemble a new electronic component.
• Sigma is known to be 6 minutes.
• The estimate must be accurate to within 20
seconds.
• Use confidence levels of 90% and 95%.
• Find the sample size in each case.
Selecting the sample size
– Solution
$\sigma = 6$ min; $W = 20$ sec $= 1/3$ min
• $1-\alpha = .90$, $z_{\alpha/2} = z_{.05} = 1.645$
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{z_{.05}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75$. Take $n = 877$.
• $1-\alpha = .95$, $z_{\alpha/2} = z_{.025} = 1.96$
$n = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.67$. Take $n = 1245$.
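The sample-size formula always rounds up, since rounding down would leave the margin of error slightly above W. A minimal sketch; `sample_size_mean` is a hypothetical helper, and passing `z` explicitly reproduces the table values 1.645 and 1.96.

```python
from statistics import NormalDist
import math

def sample_size_mean(sigma, w, conf, z=None):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= w."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # exact z_{alpha/2}
    return math.ceil((z * sigma / w) ** 2)

n90 = sample_size_mean(6, 1 / 3, 0.90, z=1.645)   # table value z_.05
n95 = sample_size_mean(6, 1 / 3, 0.95, z=1.96)    # table value z_.025
print(n90, n95)
```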
Chapter 11
Hypotheses tests
– In hypothesis tests we hypothesize a value of
a population parameter, and test to see if there
is sufficient evidence to support our belief.
– The structure of a hypothesis test:
• Formulate two hypotheses.
– H0: the null hypothesis, the one we try to reject in favor of H1.
– H1: the alternative hypothesis, the one we try to prove.
• Define a significance level α.
Hypotheses tests
– The significance level is the probability of
erroneously rejecting the null hypothesis:
α = P(reject H0 when H0 is true)
– Sample from the population and calculate a
statistic that indicates whether or
not the parameter value under H1 is more likely
to be true.
– We shall test the population mean assuming the
standard deviation is known.
Hypotheses tests of the Mean –
Known Variance
• Problem 7:
A machine is set so that the average
diameter of ball bearings it produces is .50
inch. In a sample of 100 ball bearings the
mean diameter was .51 inch. Assuming the
standard deviation is .05 inch, can we
conclude at 5% significance level that the
mean diameter is not .50 inch?
Hypotheses tests of the Mean –
Known Variance
• Solution:
The population studied is the ball-bearing
diameters.
– We hypothesize on the population mean.
– A good point estimator for the population mean
is the sample mean.
– We use the distribution of the sample mean to
build a sample statistic to test whether
$\mu = .50$ inch.
Hypotheses tests of the Mean –
Known Variance
Solution – (A Two Tail rejection region)
– Define the hypotheses:
• H0: $\mu = .50$
• H1: $\mu \ne .50$
The probability of committing a Type I error:
$P(\bar{X} < \bar{X}_{L1} \text{ or } \bar{X} > \bar{X}_{L2} \mid \mu = .50) = .05$, or
$P(Z < Z_{L1} \text{ or } Z > Z_{L2} \mid \mu = .50) = .05$
If $\bar{X}_{L1}$ and $\bar{X}_{L2}$ are symmetric around $\mu$,
then $Z_{L1}$ and $Z_{L2}$ are symmetric around zero; therefore,
$Z_{L1} = -Z_{\alpha/2}$ and $Z_{L2} = Z_{\alpha/2}$.
Hypotheses tests of the Mean –
Known Variance
Solution - A Two Tail rejection region
Critical Z:
$P(Z < -Z_{.025} \text{ or } Z > Z_{.025} \mid \mu = .50) = .05$
$Z_{.025} = 1.96$ (obtained from the Z-table)
Build a rejection region: $Z_{sample} > Z_{\alpha/2}$, or
$Z_{sample} < -Z_{\alpha/2}$ (here, below -1.96 or above 1.96).
Calculate the value of the sample Z statistic
and compare it to the critical value:
$Z_{sample} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$
Since 2 > 1.96, there is
sufficient evidence to reject
H0 in favor of H1 at 5%
significance level.
Hypotheses tests of the Mean –
Known Variance
Solution - A Two Tail rejection region
• We can perform the test in terms of the mean value.
• Let us find the critical mean values for rejection
$\bar{X}_{L2} = \mu_0 + Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96(.05)/(100)^{1/2} = .5098$
$\bar{X}_{L1} = \mu_0 - Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96(.05)/(100)^{1/2} = .4902$
Since .51 > .5098, there is sufficient evidence to
reject the null hypothesis at 5% significance level.
Hypotheses tests of the Mean –
Known Variance
• Calculate the p value of this test
• Solution
p-value = $P(Z > Z_{sample}) + P(Z < -Z_{sample})$ =
$P(Z > 2) + P(Z < -2) = 2P(Z > 2)$ =
$2[1 - .9772] = .0456$
• Since .0456 < .05, H0 is rejected.
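Both decision methods for Problem 7 (critical value and p-value) fit in a few lines of Python. A minimal sketch; `z_test_two_tail` is a hypothetical helper name, not a library function.

```python
from statistics import NormalDist
import math

def z_test_two_tail(xbar, mu0, sigma, n, alpha):
    """Two-tail z-test of H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails
    return z, z_crit, p_value, abs(z) > z_crit

z, z_crit, p_value, reject = z_test_two_tail(0.51, 0.50, 0.05, 100, 0.05)
print(round(z, 2), round(z_crit, 2), round(p_value, 4), reject)   # 2.0 1.96 0.0455 True
```

The exact p-value (.0455) differs from .0456 only because the table rounds $\Phi(2)$ to .9772.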
Hypotheses tests of the Mean –
Known Variance
• Problem 8
– The average annual return on investment for American
banks was found to be 10.2% with standard deviation
of 0.8%.
– It is believed that banks that exercise comprehensive
planning do better.
– A sample of 26 banks that exercise comprehensive
planning provided the following result: Mean return =
10.5%
– Can we infer at 10% significance level that this sample
result supports the belief about bank performance?
Hypotheses tests of the Mean –
Known Variance
• Solution: (A Right Hand Tail Rejection Region)
The population tested is the “annual rate of
return”.
– H0: $\mu = 10.2$
– H1: $\mu > 10.2$
• Let us perform the test with the standardized
rejection region approach:
$Z_{sample} > Z_{.10}$ (right hand tail rejection region)
$Z_{.10} = 1.28$. Reject H0 if $Z_{sample} > 1.28$.
Hypotheses tests of the Mean –
Known Variance
$Z_{sample} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{10.5 - 10.2}{.8/\sqrt{26}} = 1.91$
• Conclusion
– At 10% significance level there is sufficient evidence in
the data to reject H0 in favor of H1, since the sample
statistic falls inside the rejection region.
• Interpretation:
– If we are willing to accept a 10% chance of making the
wrong conclusion, we can conclude that banks conducting
comprehensive planning perform better than banks that
do not.
Hypotheses tests of the Mean –
Known Variance
• Let us perform the test with the p-value
method:
$P(\bar{X} > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/(26)^{1/2}}\right)$ =
$P(Z > 1.91) = .5 - .4719 = .0281$
• Since .0281 < .10 we reject the null
hypothesis at 10% significance level.
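A minimal Python sketch of Problem 8 that runs both the rejection-region rule and the p-value rule, showing that they lead to the same decision.

```python
from statistics import NormalDist
import math

xbar, mu0, sigma, n, alpha = 10.5, 10.2, 0.8, 26, 0.10
z = (xbar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)   # z_.10, about 1.2816 (table: 1.28)
p_value = 1 - NormalDist().cdf(z)          # right-tail p-value
# The two decision rules always agree: z > z_crit exactly when p_value < alpha.
print(round(z, 2), z > z_crit, p_value < alpha)
```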
Hypotheses tests of the Mean –
Known Variance
• Note the equivalence between the
standardized rejection region method and the
p-value method.
[Figure: standard normal curve; the rejection region $Z > Z_{.10} = 1.28$ has area .10, and the area to the right of $Z_{sample} = 1.91$ (the p-value) is .0281]
The statement “the p-value is smaller
than alpha” is equivalent to the
statement “the test statistic falls
in the rejection region”.
Hypotheses tests of the Mean –
Known Variance
• Problem 9
– In the midst of labor-management negotiations, the president
of a company argues that the company’s blue collar workers,
who are paid an average of \$30K a year, are well-paid
because the mean annual pay for blue-collar workers in the
country is less than \$30K.
– This figure is disputed by the union. To test the president’s
belief, an arbitrator draws a random sample of 350 blue-collar
workers from across the country and records their incomes
(see file Salaries).
– If the arbitrator assumes that income is normally distributed
with a standard deviation of \$8,000, can it be inferred at 5%
significance level that the company’s president is correct?
Hypotheses tests of the Mean –
Known Variance
• Solution (A Left Hand Tail Rejection Region)
The population tested is the annual salary.
– H0: $\mu = 30$K
H1: $\mu < 30$K
– Left hand tail rejection region: $Z < -Z_{.05}$, or $Z < -1.645$
$Z_{sample} = (29{,}119.5 - 30{,}000)/(8{,}000/350^{.5}) = -2.059$
Since -2.059 < -1.645 there is sufficient evidence to
infer that on the average blue-collar workers’ income is
lower than \$30K at 5% significance level.
Hypotheses tests of the Mean –
Known Variance
• Calculate the p-value of this test:
• Solution: p-value = $P(Z < Z_{sample}) = P(Z < -2.059) = .0197$
z-Test: Mean
                          Incomes
Mean                      29119.52
Standard Deviation        8460.491
Observations              350
Hypothesized Mean         30000
SIGMA                     8000
z Stat                    -2.059
P(Z<=z) one-tail          0.0197
z Critical one-tail       1.6449
P(Z<=z) two-tail          0.0394
z Critical two-tail       1.96
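The Salaries file is not reproduced here, so this minimal sketch of Problem 9 starts from the summary mean in the z-Test output above rather than the raw data.

```python
from statistics import NormalDist
import math

# Summary values taken from the z-Test: Mean output (raw data not included).
xbar, mu0, sigma, n, alpha = 29119.52, 30000, 8000, 350, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_one = NormalDist().cdf(z)                     # left-tail p-value, P(Z <= z)
p_two = 2 * min(p_one, 1 - p_one)               # two-tail p-value, for comparison
reject = z < -NormalDist().inv_cdf(1 - alpha)   # compare with -z_.05
print(round(z, 3), round(p_one, 4), reject)
```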
Type II Error
• Problem 7a
Calculate β for the two-tail hypothesis test performed in
Problem 7, when the actual mean diameter is .515 inch.
• Solution
– The rejection region in terms of the critical values of the sample
mean was found before: $\bar{X}_{L1} = .4902$; $\bar{X}_{L2} = .5098$.
H0: $\mu = .500$
H1: $\mu = .515$
β = P(do not reject H0 when H1 is true) =
$P(.4902 < \bar{x} < .5098 \mid \mu = .515)$ =
$P\left(\frac{.4902 - .515}{.05/\sqrt{100}} < Z < \frac{.5098 - .515}{.05/\sqrt{100}}\right)$ =
$P(-4.96 < Z < -1.04) = P(1.04 < Z < 4.96)$ =
$P(Z < 4.96) - P(Z < 1.04) \approx 1 - P(Z < 1.04) = 1 - .8508 = .1492$
– This large probability may be reduced by taking larger samples.
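β can be computed directly: build the acceptance region for $\bar{X}$ under H0, then measure its probability under the alternative mean. A minimal sketch; `beta_two_tail` is a hypothetical helper name.

```python
from statistics import NormalDist
import math

def beta_two_tail(mu0, mu1, sigma, n, alpha):
    """P(do not reject H0 | mu = mu1) for the two-tail z-test of mu = mu0."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - z * se, mu0 + z * se     # acceptance region for X-bar
    xbar_h1 = NormalDist(mu1, se)           # sampling distribution under H1
    return xbar_h1.cdf(hi) - xbar_h1.cdf(lo)

beta = beta_two_tail(0.50, 0.515, 0.05, 100, 0.05)
beta_400 = beta_two_tail(0.50, 0.515, 0.05, 400, 0.05)   # larger sample
print(round(beta, 4), beta_400)
```

Raising n from 100 to 400 shrinks the standard error and drives β close to zero, which is the point of the slide's last remark.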
Ch 12: Inference when the
Variance is Unknown
• In general, the population variance is unknown.
• In this case we change the test statistic from
“Z” to “t” when testing the population
mean.
• To test the population proportion we’ll use
the normal distribution (under certain
conditions).
Testing the mean –
unknown variance
• Replace the statistic Z with “t”, which has n - 1 degrees of freedom:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
The parent population must be normal (or at
least mound-shaped).
Testing the mean –
unknown variance
• Problem 10
– A federal agency inspects packages to determine whether the
contents are at least as large as advertised.
– A random sample of (i) 5 and (ii) 50 containers whose
packaging states a weight of 8.04 ounces was
drawn. (Data are provided later.)
– From the sample results:
• Can we conclude that the average weight does not meet the
weight stated? (Use α = .05.)
• Estimate the mean weight of all containers with 99%
confidence.
• What assumption must be met?
Testing the mean –
unknown variance
• Solution
– We hypothesize on the mean weight.
• H0: $\mu = 8.04$
• H1: $\mu < 8.04$
• (i) n = 5. For small samples let us solve manually.
Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94
– The rejection region: $t < -t_{\alpha,n-1} = -t_{.05,5-1} = -2.132$
The $t_{sample}$ = ?
– Mean = (8.07 + … + 7.94)/5 = 7.996
Std. Dev. = $\{[(8.07-7.996)^2 + … + (7.94-7.996)^2]/4\}^{1/2} = 0.0546$
Testing the mean –
unknown variance
– The $t_{sample}$ is calculated as follows:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{0.0546/\sqrt{5}} = -1.80$
– Since -1.80 > -2.132, the sample statistic does
not fall in the rejection region. There is
insufficient evidence to conclude that the mean
weight is smaller than 8.04, at 5% significance
level.
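The manual calculation for part (i) can be checked with Python's `statistics` module, whose `stdev` uses the n - 1 divisor of the sample standard deviation. The standard library has no t distribution, so this sketch copies the critical value -2.132 from the t-table.

```python
import math
import statistics

weights = [8.07, 8.03, 7.99, 7.95, 7.94]
mu0 = 8.04
t_crit = -2.132          # -t_{.05,4}, read from a t-table

xbar = statistics.mean(weights)
s = statistics.stdev(weights)                         # sample std. dev., divisor n - 1
t = (xbar - mu0) / (s / math.sqrt(len(weights)))
reject = t < t_crit                                   # left-tail rejection region
print(round(xbar, 3), round(s, 4), round(t, 2), reject)
```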
Testing the mean –
unknown variance
– (ii) n = 50. To calculate the sample statistics we use
Excel, “Descriptive Statistics” from the Tools > Data
Analysis menu. From the sample we obtain:
Mean = 8.02; Std. Dev. = .04
– The confidence interval is calculated by
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015$
($1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$; $t_{.005,50-1} \approx 2.678$ from the t-table)
LCL = 8.005, UCL = 8.035
Testing the mean –
unknown variance
– Check whether it appears that the distribution is
normal:
[Figure: histogram of the 50 sample weights; vertical axis Frequency (0 to 20), bins 7.93, 7.97, 8.01, 8.05, 8.09, More]
Using Excel:
– To obtain an exact value for t use the TINV
function:
=TINV(0.01,49)
The exact value: 2.6799535
(49 is the number of degrees of freedom; .01 is the two-tail probability = .005*2.)
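A minimal Python sketch of the 99% t-interval for part (ii). It assumes the summary statistics above and hard-codes the TINV value, because Python's standard library does not provide t quantiles.

```python
import math

xbar, s, n = 8.02, 0.04, 50
t_half = 2.6799535              # =TINV(0.01,49), i.e. t_{.005,49}
half = t_half * s / math.sqrt(n)
lcl, ucl = xbar - half, xbar + half
print(round(lcl, 3), round(ucl, 3))   # 8.005 8.035
```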
Testing the mean –
unknown variance
• Problem 11
– Engineers in charge of the production of car seats are
concerned about the compliance of the springs used
with design specifications.
– Springs are designed to be 500mm long.
• Springs too long or too short must be reworked.
• A standard deviation of 2mm in spring length will result in an
acceptable number of reworked springs.
– A sample of 100 springs was taken and measured.
Testing the mean –
unknown variance
• Problem 11 – continued
– Can we infer at 10% significance level that the mean
spring length is not 500mm?
Solution
H0: $\mu = 500$
H1: $\mu \ne 500$
Since the standard deviation is unknown,
we need to run a t-test, assuming the
spring length is normally distributed.
Rejection region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, with d.f. = 99:
t < -1.6604 or t > +1.6604
t-Test of a Mean
Sample mean                  499.9697
Sample standard deviation    2.55247
Sample size                  100
Hypothesized mean            500
Alpha                        0.1
t Stat                       -0.12
P(T<=t) one-tail             0.4529
t Critical one-tail          1.2902
P(T<=t) two-tail             0.9057
t Critical two-tail          1.6604
Since -1.6604 < -0.12 < 1.6604, the test statistic does not fall in the rejection region: there is insufficient evidence at 10% significance level to infer that the mean spring length is not 500mm.
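The t statistic in the output can be recomputed from its summary values. A minimal sketch; the critical value 1.6604 is copied from the output because the standard library has no t quantiles.

```python
import math

xbar, s, n, mu0 = 499.9697, 2.55247, 100, 500
t_crit = 1.6604                       # t_{.05,99} (two-tail test, alpha = .10)
t = (xbar - mu0) / (s / math.sqrt(n))
reject = abs(t) > t_crit              # two-tail rule: |t| > t_{alpha/2}
print(round(t, 2), reject)
```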
Inference about a population
proportion
• The test and the confidence interval are based on
the approximate normal distribution of the
sample proportion, if np > 5 and n(1-p) > 5.
• For the confidence interval of p we have:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = x/n$
• For the hypothesis test, we use a Z test.
Inference about a population
proportion
• Problem 12 (problem 11 continued). The
engineers were interested in the percentage of
springs that are the correct length. They marked
each spring in the sample as
– Correct – 1;
– Too long – 2;
– Too short – 3;
Can we infer that less than
90% of the springs are the
correct length, at 10% sig.
level?
Inference about a population
proportion
• Problem 12 - Solution
– H0: $p = .9$
H1: $p < .9$
– Rejection region: $Z < -Z_\alpha$, or $Z < -1.28$
$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.86 - .9}{\sqrt{.9(1-.9)/100}} = -1.33$
z-Test of a Proportion
Sample proportion          0.86
Sample size                100
Hypothesized proportion    0.9
Alpha                      0.1
z Stat                     -1.33
P(Z<=z) one-tail           0.0912
z Critical one-tail        1.2816
P(Z<=z) two-tail           0.1824
z Critical two-tail        1.6449
Conclusion: Since -1.33 < -1.28 we can infer
that less than 90% of the springs
do not need reworking.
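A minimal Python sketch of the Problem 12 test. As in the z-Test of a Proportion output, the standard error is computed from the hypothesized proportion $p = .9$, not from $\hat{p}$.

```python
from statistics import NormalDist
import math

phat, p0, n, alpha = 0.86, 0.9, 100, 0.10
z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)   # standard error uses p0 (the H0 value)
p_value = NormalDist().cdf(z)                    # left-tail p-value
reject = z < -NormalDist().inv_cdf(1 - alpha)    # compare with -z_.10
print(round(z, 2), round(p_value, 4), reject)
```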
Inference about a population
proportion
• Problem 12 – solution continued
– Let us estimate the proportion of good springs
at 99% confidence level.
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .86 \pm 2.575\sqrt{\frac{.86(1-.86)}{100}} = .86 \pm .0894$
z-Estimate of a Proportion
Sample proportion               0.86
Sample size                     100
Confidence level                0.99
Confidence Interval Estimate    0.86 ± 0.0894
Lower confidence limit          0.7706
Upper confidence limit          0.9494
Inference about a population
proportion
• Problem 12 – solution continued
– Find the sample size if the proportion of good
springs is to be estimated to within .035.
Consider the given sample an initial sample.
$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2 = \left(\frac{2.575\sqrt{.86(1-.86)}}{.035}\right)^2 \approx 652$
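A minimal sketch of the interval and sample-size calculations for the proportion of good springs, using the table value $z_{.005} = 2.575$ and the initial sample's $\hat{p} = .86$ as the planning value; both helper names are hypothetical.

```python
import math

def prop_interval(phat, n, z):
    """phat +/- z * sqrt(phat(1 - phat)/n)."""
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

def prop_sample_size(phat, w, z):
    """Smallest n giving a margin of error of at most w, with phat as the planning value."""
    return math.ceil((z * math.sqrt(phat * (1 - phat)) / w) ** 2)

lcl, ucl = prop_interval(0.86, 100, z=2.575)
n_needed = prop_sample_size(0.86, 0.035, z=2.575)
```

Excel's limits (.7706, .9494) use the exact $z_{.005} \approx 2.5758$, so the fourth decimal can differ slightly from this sketch.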
Inference about a population
proportion
• Problem 13
– A consumer protection group runs a survey of
400 dentists to check a claim that more than 4
out of 5 dentists recommend ingredients
included in a certain toothpaste.
– The survey results are as follows:
71 – No; 329 – Yes
– At 5% significance level, can the consumer
group infer that the claim is true?
Inference about a population
proportion
• Problem 13 - Solution
– The two hypotheses are:
H0: $p = .8$
H1: $p > .8$
The rejection region: $Z > Z_\alpha$, with $Z_{.05} = 1.645$
$\hat{p} = 329/400 = .8225$
$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.8225 - .8}{\sqrt{.8(1-.8)/400}} = 1.125$
Conclusion: Since 1.125 < 1.645 the consumer group
cannot confirm the claim at 5% significance level.
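A minimal Python sketch of Problem 13, computing $\hat{p}$ from the survey counts and adding the p-value that the slide does not report.

```python
from statistics import NormalDist
import math

yes, n, p0, alpha = 329, 400, 0.8, 0.05
phat = yes / n                                   # .8225
z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)                # right-tail p-value
reject = z > NormalDist().inv_cdf(1 - alpha)     # compare with z_.05
print(round(z, 3), round(p_value, 3), reject)
```

The p-value of about .13 exceeds .05, matching the slide's conclusion that the claim cannot be confirmed.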
Summary Example
• An automotive expert claims that the large number
of self-serve gas stations has resulted in poor
automobile maintenance, and that the average tire
pressure is more than 4.5 psi below its
manufacturer's specifications.
• A random sample of 50 tires revealed the results
stored in the file TirePressure.
• Assume the tire pressure is normally distributed
with $\sigma = 1.5$ psi, and answer the following
questions:
Summary Example
• At 10% significance level can we infer that the expert is
correct? What is the p-value?
Solution
– The hypotheses:
H0: $\mu = 4.5$
H1: $\mu > 4.5$
The rejection region: $Z > Z_{.10}$, or $Z > 1.28$.
From the data we have: mean = 5.04, so
$Z = (5.04 - 4.5)/(1.5/50^{.5}) = 2.545$
– Since 2.545 > 1.28, there is sufficient evidence to infer that the
expert is correct.
The p-value = $P(\text{sample mean} > 5.04 \mid \mu = 4.5) = P(Z > 2.545) = 1 - .9945 = .0055$
Summary Example
• Find the probability of making a Type II error when
the actual tire under-inflation is 5 psi on the average.
Solution
The rejection region in terms of the sample means is found first:
$Z_L = 1.28 = (\bar{X}_L - 4.5)/(1.5/50^{.5})$
$\bar{X}_L = 4.5 + 1.28(1.5/50^{.5}) = 4.77$
So, the rejection region is: sample mean > 4.77.
β = P(do not reject H0 when H1 is true) =
P(sample mean does not fall in the RR when $\mu = 5$) =
$P(\bar{x} < 4.77 \mid \mu = 5)$ =
$P(Z < (4.77 - 5)/(1.5/50^{.5})) = P(Z < -1.08)$ =
From Excel: [=NORMSDIST(-1.077)] = .1407
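The whole summary example (test statistic, p-value, critical sample mean and β) can be sketched in a few lines. This assumes the sample mean 5.04 reported on the slide, since the TirePressure file is not included.

```python
from statistics import NormalDist
import math

mu0, mu1, sigma, n, alpha = 4.5, 5.0, 1.5, 50, 0.10
xbar = 5.04                                   # sample mean reported on the slide
se = sigma / math.sqrt(n)
z = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z)             # right-tail p-value
z_crit = NormalDist().inv_cdf(1 - alpha)      # about 1.2816 (table: 1.28)
x_crit = mu0 + z_crit * se                    # reject H0 if X-bar > x_crit
beta = NormalDist(mu1, se).cdf(x_crit)        # P(do not reject | mu = 5)
print(round(z, 3), round(p_value, 4), round(x_crit, 2), round(beta, 4))
```

With the exact $z_{.10}$ instead of 1.28, β comes out near .141, essentially the slide's .1407.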
Inference about the population
Variance
• The following statistic is $\chi^2$ (chi-squared)
distributed with n - 1 degrees of freedom:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• We use this relationship to test and estimate
the variance.
Inference about the population
Variance
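• The hypotheses tested are:
H0: $\sigma^2 = \sigma_0^2$
H1: $\sigma^2 \ne \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 < \sigma_0^2$
• The rejection region is:
$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha,n-1}$ (right tail), or $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,n-1}$ (left tail)
For the two-tail test replace α with α/2.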
Testing the Variance
• Problem 15
• Engineers in charge of the production of car seats are
concerned about the compliance of the springs used with
design specifications.
• Springs are designed to be 500mm long.
– Springs too long or too short must be reworked.
– A standard deviation of 2mm in spring length will
result in an acceptable number of reworked springs.
• A sample of 100 springs was taken and measured.
Testing the Variance
• Problem 15 - continued
Can we infer at 10% significance level that the
number of springs requiring reworking is
unacceptably large?
H0: $\sigma^2 = 4$
H1: $\sigma^2 > 4$
The number of springs requiring reworking
depends on the standard deviation, or the
variance.
Rejection region: $\chi^2_{sample} > \chi^2_\alpha$ with d.f. = 99, i.e., $\chi^2_{sample} > 117.4069$
Chi-squared Test of a Variance
Sample variance                  6.515104
Sample size                      100
Hypothesized variance            4
Alpha                            0.1
Chi-squared Stat                 161.25
P(CHI<=chi) one-tail             0.0001
chi-squared Critical one-tail    117.4069
P(CHI<=chi) two-tail             0.0002
chi-squared Critical two-tail    77.0463   123.2252
Testing the Variance
• Problem 15 - conclusion
Since 161.25 > 117.4069, we can infer at
10% significance level that the standard
deviation is greater than 2, thus the number
of springs that require reworking is
unacceptably large.
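The Problem 15 statistic is simple arithmetic; this minimal sketch copies the critical value from the output table rather than computing a chi-squared quantile.

```python
n, s2, sigma2_0 = 100, 6.515104, 4
chi2_stat = (n - 1) * s2 / sigma2_0   # (n - 1) s^2 / sigma_0^2
chi2_crit = 117.4069                  # chi^2_{.10,99}, from the output table
reject = chi2_stat > chi2_crit        # right-tail rejection region
print(round(chi2_stat, 2), reject)    # 161.25 True
```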
Testing the Variance
• Problem 16
• A random sample of 100 observations was taken
from a normal population. The sample variance
was 29.76.
• Can we infer at 2.5% significance level that the
population variance is less than 30?
• Estimate the population variance with 90%
confidence.
Testing the Variance
• Problem 16: Solution:
• H0: $\sigma^2 = 30$
• H1: $\sigma^2 < 30$
Rejection region: $\chi^2 < \chi^2_{1-\alpha,n-1}$, i.e., $\chi^2 < 73.36$
$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)29.76}{30} = 98.21$
Chi-squared Test of a Variance
Sample variance                  29.76
Sample size                      100
Hypothesized variance            30
Alpha                            0.975 (!)
Chi-squared Stat                 98.21
P(CHI<=chi) one-tail             0.4964
chi-squared Critical one-tail    73.3611
P(CHI<=chi) two-tail             0.9928
chi-squared Critical two-tail    97.8956   98.7740
Testing the Variance
• Problem 16 - conclusion
Since 98.208 > 73.36 we conclude that there
is insufficient evidence at 2.5% significance
level to infer that the variance is smaller
than 30.
Using Excel
– We can get an exact value of the probability
$P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f., and then
determine the p-value.
– Use the CHIDIST function: =CHIDIST(χ², d.f.)
For example: =CHIDIST(98.208,99) = .50359
That is: $P(\chi^2_{99} > 98.208) = .50359$
– In our example we had a left hand tail rejection region,
and therefore the p-value is $P(\chi^2_{99} < 98.208) = 1 - .50359 = .49641 > .025$
Using Excel
– We can get the exact $\chi^2$ value for which
$P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability α and
known d.f., and then define the rejection region:
– Use the CHIINV function: =CHIINV(α, d.f.)
For example: =CHIINV(.975,99) = 73.36
That is: $P(\chi^2_{99} > \chi^2) = .975$ gives $\chi^2 = 73.36$.
The rejection region is: $\chi^2 < 73.36$.
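Python's standard library has no chi-squared distribution, but CHIDIST and CHIINV can be approximated from first principles. The sketch below is our own construction, not Excel's algorithm. It uses the standard power series for the regularized lower incomplete gamma function (a chi-squared variable with d.f. = k is a gamma variable with shape k/2 and scale 2) and inverts it by bisection. The names `chi2_sf` and `chi2_isf` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x), the quantity CHIDIST(x, df) reports."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    # Series: P(a, y) = y^a e^(-y) / Gamma(a + 1) * sum_{k>=0} y^k / ((a + 1)(a + 2)...(a + k))
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= y / (a + k)
        total += term
    lower = math.exp(a * math.log(y) - y - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def chi2_isf(p, df):
    """Value x with P(chi2_df > x) = p, like CHIINV(p, df), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:     # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > p:   # chi2_sf decreases in x, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

stat16 = (100 - 1) * 29.76 / 30      # Problem 16 statistic, about 98.21
p_left = 1 - chi2_sf(stat16, 99)     # left-tail p-value
crit = chi2_isf(0.975, 99)           # left-tail critical value for alpha = .025
print(round(chi2_sf(98.208, 99), 5), round(p_left, 4), round(crit, 2))
```

The series converges for any positive x, although it needs more terms when x is far above the degrees of freedom; for the values used in these problems it is fast and accurate.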