Hypothesis Testing

An Inference Procedure
We will study procedures for both the unknown
population mean on a quantitative variable and
the unknown population proportion on a
qualitative variable. This section is for the
proportion.
Analogy
Here is a story about hypothesis tests. It is not really stats, but it is an idea to
consider. Say I have two decks of cards. One deck is a regular deck –
spades, hearts, diamonds and clubs. The other deck is special – 4
sets of hearts.
Now, I take out one of the decks, but you do not know which one. In
the language of statistics the null hypothesis will be that I took out
the regular deck. You will stay with the null hypothesis unless an event
occurs that has a really low probability under it. If a really low probability
event occurs you will reject the null hypothesis and go with the
alternative hypothesis.
So, I take out a deck and deal you five cards – a royal flush in hearts!
You would reject the null hypothesis of a regular deck and go with
the alternative that the deck I pulled out is the special one because a
royal flush in hearts has a very low probability in a regular deck.
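To put a number on "low probability", here is a small Python sketch that counts five-card hands. It assumes the special deck also has 52 cards (4 copies of each of the 13 hearts), which the story suggests but does not state outright.

```python
from math import comb

# Number of distinct five-card hands that can be dealt from 52 cards.
total_hands = comb(52, 5)

# Regular deck: exactly one hand is the royal flush in hearts
# (10, J, Q, K, A of hearts).
p_regular = 1 / total_hands

# Special deck (assumption: 4 copies of each heart): each of the 5 ranks
# can come from any of its 4 copies, so 4**5 hands qualify.
p_special = 4**5 / total_hands

print(f"regular deck: {p_regular:.8f}")
print(f"special deck: {p_special:.8f}")
print(f"the hand is {p_special / p_regular:.0f} times more likely with the special deck")
```

The hand is rare with either deck, but it is far rarer with the regular deck, and that contrast is what pushes you toward the alternative.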
So, in my deck of cards example I have introduced 2 ideas you may not have
heard of at this point in your life:
The null hypothesis, which we typically call by the shorthand Ho:, and
The alternative or research hypothesis, H1:.
The tradition in statistics is to make the null hypothesis the focal point of our
test. When we do the test we will either
a) Reject the null hypothesis and go with the alternative, or
b) Not reject the null hypothesis (which is a way of saying stay with the null
hypothesis)
So, later we will spend some time doing the mechanics of the test, but I have a
few more ideas to consider.
On the next slide you will see a table. The columns relate to the idea that there
is truth in the world. We really consider 1 column at a time and while looking at
a column we call that column the real truth.
The rows relate to how we decide. Maybe this is a little theoretical. So, let’s
get to the table!
                               Ho: is true      Ho: is false
Reject Ho: (and go with alt)   Type I error     Good job!
Do not reject Ho:              Good job!        Type II error
So, let’s look at the first column. Here we say Ho: is true.
In the first row if we reject Ho: when it is true, that would be bad because we
have rejected the truth and we say a type I error has been made.
In the second row we do not reject Ho: when it is true. That is good because we
have not rejected the truth.
Now, let’s look at the second column. Here we say Ho: is false.
In the first row if we reject Ho: when it is false, that would be good because we
have rejected something that is not the truth.
In the second row we do not reject Ho: when it is false. That is bad because we
have not rejected something that is NOT the truth. This is called making a type
II error.
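The four cells of the table can also be written as a tiny lookup. This is only an illustrative sketch; the function name `classify_outcome` is made up for this example.

```python
def classify_outcome(null_is_true: bool, reject_null: bool) -> str:
    """Return the cell of the truth-versus-decision table."""
    if null_is_true and reject_null:
        return "Type I error"    # rejected the truth
    if not null_is_true and not reject_null:
        return "Type II error"   # failed to reject something false
    return "Good job!"           # the decision matches the truth

# Walk through the table one column (one version of the truth) at a time.
for null_is_true in (True, False):
    for reject_null in (True, False):
        print(f"Ho true: {null_is_true}, reject Ho: {reject_null} -> "
              f"{classify_outcome(null_is_true, reject_null)}")
```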
Type I Error
A Type I error is a situation where you reject the null
hypothesis, Ho, when it is true and should not be rejected.
The probability of making a type I error is called alpha (α) and is
often referred to as the level of significance.
There is a consequence to rejecting a true null hypothesis.
Depending on the nature of the consequence we pick the
value of alpha. Traditional values of alpha are .01, .05 and
.1.
Type II Error
A type II error is a situation where the null hypothesis is not
rejected when it should be because the null is false.
The probability of making a type II error is called beta, β.
In an introductory statistics class such as ours we typically
focus on the type I error.
Background
There are times we would like to know about the unknown
population proportion. But, it is often too expensive and too time
consuming to investigate the whole population. So, a sample is
taken. The method of confidence intervals is based on the idea that a
point estimate would vary from sample to sample in theory, and
so from the one sample we do take we build in the variability and
then are a certain percent confident our interval contains the
unknown value.
Hypothesis testing will rely on some of the same ideas used in
confidence intervals, but here there is at least a starting point for
the unknown value. The starting point can be from past work or
a belief one has in a process.
Note we never know the truth for sure because we do not look at
the whole population. We live in a probabilistic world!
As an example, let’s say a daily newspaper is concerned about readers continuing
to buy the paper. One particular area of concern is the coverage of local, state
level and national/international sports. Maybe the people who run the paper think
about how satisfied the readers are. Readers might be satisfied, not interested, or
not satisfied with the sports coverage.
Let’s say a population proportion of more than .8 being satisfied is critical for the
business in keeping sales of the paper at an agreeable level. If the proportion is
not that high they will change the section.
The null hypothesis and alternative hypotheses might be stated
Ho: p ≤ .8 (which would mean change the section)
H1: p > .8 (which would mean no change is needed)
Now, if the null hypothesis is really the truth and it is rejected (a type I error) the
business will think the proportion satisfied with the sports section is more than .8
and they probably will not change the section. But, the population proportion
satisfied is really .8 or less and the business should change the sports section.
They will likely start losing customers. They will take no action about the section
when they should.
If the null hypothesis is true and they do not reject it they will make changes to the
section to keep people buying the paper. They will take needed action.
If the null hypothesis is not true and the business rejects the null then they will go
with the alternative hypothesis and they can keep the sports section the way it is.
If the null hypothesis is not true and the business does not reject the null
hypothesis a type II error has occurred. In this case they will change the
section even though no change was needed. Changing a section that readers
already like costs money and could lead to fewer sales.
So making a type I or a type II error could lead to problems with their business.
Again, here we will pay attention to the type I error.
Let’s recall that if we go out into the population of interest, collect a sample of data
where the sample size is n, and calculate P hat, then the distribution of all possible
P hat values is approximately a normal distribution (when n is large enough) with
mean equal to the population proportion p and
standard error = the square root of ((p)(1 – p)/n).
In our newspaper example the value that was critical to the paper was .8. In this
context we can work with a population proportion of .8.
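As a quick check of the standard error formula, here is a minimal Python sketch. The sample sizes 100 and 400 are illustrative assumptions, chosen only to show how the standard error shrinks as n grows.

```python
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

# With p = .8, quadrupling the sample size cuts the standard error in half.
print(round(standard_error(0.8, 100), 4))  # 0.04
print(round(standard_error(0.8, 400), 4))  # 0.02
```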
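[Figure: normal curve of P hat centered at .8. The critical P hat sits to the right of .8, and the rejection region, with area alpha, is the upper tail to the right of the critical P hat.]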
Let’s remember here that we do not know if the population proportion
is .8, but it is an important idea for the business so we make it the
hypothesized value. If it was the population proportion then taking a
sample and calculating P hat could lead to any value.
Any P hat value from a sample less than .8 would lead us to
not reject the null. But, if the population proportion truly is .8,
could a sample proportion have a value more than .8? Yes! But, as we
move to the right of .8 at some point we put a dividing line where if we
are right of the dividing line we will reject Ho and go with the alt.
Notice when we create the dividing line we could be wrong and we would make
a type I error. So, we will control the probability of making a type I error by
making alpha low. Let’s say that alpha = .05 is our definition of low.
Then, the Z associated with an upper tail area = .05 is 1.645 and this Z is
associated with the critical P hat you see in the picture.
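The table lookup can be reproduced with Python's standard library, as a rough sketch. The sample size n = 100 is an assumption made here only so the critical z can be translated back to the P hat scale.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
p_null = 0.8
n = 100  # illustrative sample size (an assumption for this sketch)

# z with an upper tail area of alpha, i.e. area 1 - alpha to its left.
z_critical = NormalDist().inv_cdf(1 - alpha)

# Translate the critical z back to the P hat scale.
se = sqrt(p_null * (1 - p_null) / n)
p_hat_critical = p_null + z_critical * se

print(round(z_critical, 3))      # 1.645
print(round(p_hat_critical, 4))  # 0.8658
```

Under these assumptions, any sample proportion above about .866 falls in the rejection region.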
In this context our hypothesis testing will follow a normal
distribution and we will calculate a Z statistic called the
standardized test statistic or z test stat.
The z test stat is (P hat – p under Ho)/standard error.
The standard error of P hat = sqrt[(p under Ho)(1 – p under Ho)/sample
size].
Now, p under Ho is the hypothesized value of the proportion.
Say that in a sample of size 100, 87 folks say they are
satisfied with the sports page.
P hat = 87/100 = .87
The Z test stat = (.87 - .8)/square root((.8)(.2)/100) = .07/.04 =
1.75.
Thus 1.75 > 1.645, so we would reject the null!
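The same arithmetic can be sketched in Python. The function name `z_test_stat` is just a label chosen for this example.

```python
from math import sqrt

def z_test_stat(p_hat: float, p_null: float, n: int) -> float:
    """(P hat - p under Ho) / standard error, with the standard error computed under Ho."""
    se = sqrt(p_null * (1 - p_null) / n)
    return (p_hat - p_null) / se

z = z_test_stat(p_hat=87 / 100, p_null=0.8, n=100)
z_critical = 1.645  # upper tail critical value for alpha = .05

print(round(z, 2))     # 1.75
print(z > z_critical)  # True, so reject Ho
```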
Could a sample proportion of .87 happen when the population
proportion is .8? Yes, but the chance of getting .87, or more, is in
the low probability area. If Ho is true, rejecting it is a type I error,
and with alpha = .05 this type of error would only happen, in this
case, about 5% of the time.
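One way to see the "5% of the time" idea is to simulate many samples from a population where Ho really is true (p = .8) and count how often the test rejects. This is a sketch only; the seed and the number of trials are arbitrary choices made for this example.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

p_true, n, z_critical = 0.8, 100, 1.645
se = sqrt(p_true * (1 - p_true) / n)

def sample_rejects_null() -> bool:
    """Draw one sample of n readers and apply the upper tail z test."""
    satisfied = sum(random.random() < p_true for _ in range(n))
    z = (satisfied / n - p_true) / se
    return z > z_critical

trials = 20_000
type_one_rate = sum(sample_rejects_null() for _ in range(trials)) / trials
print(f"share of samples that reject a true Ho: {type_one_rate:.3f}")
```

The share should land close to .05 but not exactly on it, because P hat can only move in steps of 1/n and the normal curve is an approximation.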
You may have noticed the alternative hypothesis uses a > sign. In
this context we say we have a one-tailed test.
When we have an inequality (< or >) in the alternative hypothesis we have
a one-tailed test and we concentrate the whole probability of a
type I error in one tail. When H1: had a > sign the tail was on the
right side of the distribution. When we have a < sign the tail will
be on the left side of the distribution.
Two tailed test
Sometimes the alternative hypothesis will use a not equal sign ≠.
This means we need to look at both sides of the distribution for
dividing lines to reject the null hypothesis.
On the next slide I have an example where we will make alpha =
.1. Another thing I will do is just show a picture of the Z
distribution that we have on pages 309 and 310 of the book.
[Figure: Z distribution with a reject region of area alpha/2 = .05 in each tail, one below the lower critical z and one above the upper critical z.]
Let’s do a problem. Say a magazine claims that 25% of its readers
are college students.
Ho: p = .25 (.25 is p under null, here)
H1: p ≠ .25
With a level of significance of .1 and a two-tailed test each tail will
have .05. From the z table the critical z’s are -1.645 and 1.645.
A sample of 200 readers were asked if they are college students
and the sample proportion that said yes was .21. The z
statistic from the sample is
(.21 - .25)/sqrt[(.25)(.75)/200] = -1.31 and thus we cannot reject
the null.
To reject the null we need a z stat of less than -1.645 or greater
than 1.645.
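Here is the magazine problem as a short Python sketch, using the critical values from the z table.

```python
from math import sqrt

p_null, p_hat, n = 0.25, 0.21, 200
z = (p_hat - p_null) / sqrt(p_null * (1 - p_null) / n)

# alpha = .1 split into two tails of .05 each.
lower_critical, upper_critical = -1.645, 1.645
reject = z < lower_critical or z > upper_critical

print(round(z, 2))  # -1.31
print(reject)       # False, so do not reject Ho
```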
Critical Value approach - two tailed
[Figure: normal curve with a reject region of area alpha/2 in each tail, below the lower critical value and above the upper critical value, and the do not reject region between the two critical values.]
Critical value approach
When the alternative hypothesis uses a not equal sign we
have what is called a two tailed test because if we are off
in either direction we are concerned. In this case we
divide the alpha value in half so that the areas of our two
rejection regions add up to alpha. If alpha = .05 we
would have .025 in each tail of the distribution, for
example.
As we said earlier, if the alternative uses a < or > sign we
have a one-tailed test and put all of alpha in that 1 tail.
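The rule for splitting alpha can be sketched as one small function. The name `critical_values` and the strings used for the alternative are choices made for this example, not a standard interface.

```python
from statistics import NormalDist

def critical_values(alpha: float, alternative: str) -> tuple:
    """Critical z value(s) for an alternative of '>', '<', or '!='."""
    z = NormalDist()
    if alternative == ">":   # all of alpha in the right tail
        return (z.inv_cdf(1 - alpha),)
    if alternative == "<":   # all of alpha in the left tail
        return (z.inv_cdf(alpha),)
    if alternative == "!=":  # alpha/2 in each tail
        return (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))
    raise ValueError("alternative must be '>', '<', or '!='")

print([round(c, 3) for c in critical_values(0.05, "!=")])  # [-1.96, 1.96]
print([round(c, 3) for c in critical_values(0.10, "!=")])  # [-1.645, 1.645]
print([round(c, 3) for c in critical_values(0.05, ">")])   # [1.645]
```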
There is another approach to hypothesis testing.
p – value approach
The critical value approach had you set up rejection regions and
in the end work with a sample. In the p – value approach you
will work with the sample almost as soon as you can.
Remember we had: A sample of 200 readers were asked
if they are college students and the sample proportion that said yes
was .21. The z statistic from the sample is
(.21 - .25)/sqrt[(.25)(.75)/200] = -1.31
Since the z from the sample is -1.31 we see in the z table the
area to the left of -1.31 = .0951. On that side alpha/2 = .05,
because alpha = .1 is split between the two tails.
When the area from the sample value > alpha/2, then
2 times the area from the sample value > alpha.
Here .0951 > .05, so 2(.0951) = .1902 > .1.
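The z table lookup can be mimicked in Python, as a small sketch. The z statistic is rounded to two decimals here, as it would be when reading a table.

```python
from statistics import NormalDist

z_sample = -1.31  # z statistic rounded to two decimals, as in a z table
alpha = 0.10

tail_area = NormalDist().cdf(z_sample)  # area to the left of z_sample

print(round(tail_area, 4))    # 0.0951
print(tail_area > alpha / 2)  # True
print(2 * tail_area > alpha)  # True, the same comparison doubled
```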
p – value approach
The p – value for a sample proportion is the probability, given
the null hypothesis is true, of getting a Z stat at least as
extreme as the one from the sample. It is the area in the tail
beyond the sample value. If we have a two tail test we just
double the one tail value to get the p – value.
Then if p – value > alpha we do not reject the null,
but if the p – value < alpha we reject the null because we
know the Z stat is more extreme than the critical values.
If the p – value is low, then Ho must go. Note in our work a
“low” p – value will be defined from problem to problem.
The cutoff for low in each problem is the level of
significance, alpha.
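A minimal sketch of the p-value rule in Python follows. The function names `p_value` and `reject_null` and the strings for the alternative are assumptions of this example.

```python
from statistics import NormalDist

def p_value(z_stat: float, alternative: str) -> float:
    """Tail probability under Ho for an alternative of '>', '<', or '!='."""
    z = NormalDist()
    if alternative == ">":
        return 1 - z.cdf(z_stat)        # area to the right
    if alternative == "<":
        return z.cdf(z_stat)            # area to the left
    if alternative == "!=":
        return 2 * z.cdf(-abs(z_stat))  # double the one tail area
    raise ValueError("alternative must be '>', '<', or '!='")

def reject_null(p: float, alpha: float) -> bool:
    """If the p-value is low, then Ho must go."""
    return p < alpha

p = p_value(-1.31, "!=")     # the magazine example, two tailed
print(round(p, 4))           # 0.1902
print(reject_null(p, 0.10))  # False
```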
A sample of 200 readers were asked if they are college students
and the sample proportion that said yes was .21. The z
statistic from the sample is
(.21 - .25)/sqrt[(.25)(.75)/200] = -1.31 and thus we cannot reject
the null by the critical value approach.
Since the z from the sample is -1.31 and we have a two-tailed
test the p-value is 2(.0951) = .1902.
Note the .0951 is the tail area in the z table for a z = -1.31.
Since .1902 > .1 we do not reject Ho.
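P - Value approach - two tailed
[Figure: the same curve as the critical value approach, with a reject region of area alpha/2 below the lower critical value and above the upper critical value. The sample P hat is marked inside the do not reject region.]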
You may have noticed on the previous slide that I
reproduced the critical value approach slide and re-labeled
it.
Notice how the P hat value is in the do not reject Ho
region. The area to the left of the P hat has to be bigger
than the alpha divided by 2 area because P hat is between
the critical values that were picked by the alpha divided by 2
value.
I think we should compare the area to the left of P hat and
the value alpha divided by 2. BUT, that is not what folks
do. They double the area to the left of P hat, call it a
p-value and compare it to alpha!
One last point, not to confuse, but try to clear things up for
you in hypothesis testing.
With the critical value approach when you have a two tailed test
you set up rejection regions by splitting alpha in half
because being away from the center in either direction
leads to doubt about the Ho:. If P hat is more extreme than
the critical values you reject Ho.
With the p-value approach you find P hat and calculate the
area more extreme (or away from the center) and then
double that area to compare with alpha.
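To see that the two approaches always lead to the same decision, here is a small Python sketch that works on the z scale rather than the P hat scale. The z values 1.75 and 2.40 are made up for illustration, and -1.31 is the magazine example.

```python
from statistics import NormalDist

def decide_both_ways(z_stat: float, alpha: float) -> tuple:
    """Two tailed decision by critical values and by p-value."""
    z = NormalDist()
    upper_critical = z.inv_cdf(1 - alpha / 2)
    reject_by_critical = abs(z_stat) > upper_critical
    reject_by_p_value = 2 * z.cdf(-abs(z_stat)) < alpha
    return reject_by_critical, reject_by_p_value

for z_stat in (-1.31, 1.75, 2.40):
    print(z_stat, decide_both_ways(z_stat, alpha=0.10))
```

Each printed pair holds two matching answers, because comparing the tail area with alpha/2 and comparing the doubled area with alpha are the same comparison.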
Remember, alpha controls for the probability of a type I
error.