Anne_Ryan_Short_Course 2-4

LISA Short Course:
A Tutorial in t-tests and ANOVA using JMP
Anne Ryan
Assistant Professor of Practice
Department of Statistics, VT
[email protected]
Laboratory for Interdisciplinary Statistical Analysis
LISA helps VT researchers benefit
from the use of Statistics
Designing Experiments • Analyzing Data • Interpreting Results
Grant Proposals • Using Software (R, SAS, JMP, Minitab...)
Collaboration
From our website request a meeting for personalized statistical advice.
Great advice right now: Meet with LISA before collecting your data.
Walk-In Consulting
Available 1-3 PM, Mon-Fri, in the GLC Video Conference Room for questions requiring <30 mins.
See our website for additional times and locations.
Short Courses
Designed to help graduate students
apply statistics in their research
All services are FREE for VT researchers. We assist with research—not class projects or homework.
www.lisa.stat.vt.edu
What’s the Assumed Conclusion?
Defense: Represents the accused (defendant).
Prosecution: Holds the “burden of proof”: the obligation to shift the assumed conclusion from an oppositional opinion to one’s own position through evidence.
ANSWER: The accused is innocent until proven guilty.
• The prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt.
Burden of Proof: the obligation to shift the conclusion using evidence
Trial: Innocent until proven guilty.
Hypothesis Test: Accept the status quo (what is believed before) until the data suggest otherwise.
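Decision Criteria
Trial: Evidence has to be convincing beyond a reasonable doubt.
Hypothesis Test: Reject the status quo only if results at least as extreme as those observed would occur by chance less than 100α% of the time (ex: 5%) when the status quo is true.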
Inferential statistics: a procedure that allows us to make statements about a general population using the results of a random sample from that population.
Two Types of Inferential Statistics:
• Hypothesis Testing
• Estimation
  ◦ Point estimates
  ◦ Confidence intervals
Hypothesis testing is a detailed protocol for
decision-making concerning a population by
examining a sample from that population.
1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion
The one sample t-test is used to test whether the population mean is different from a specified value.

In a glaucoma study, the following intraocular pressure
(mm Hg) values were recorded from a sample of 21
elderly subjects. Based on this data, can we conclude
that the mean intraocular pressure of the population
from which the sample was drawn differs from 14 mm
Hg?*
Intraocular Pressure (mm Hg)
14.5  12.9  14    16.1  12    17.5  14.1
12.9  17.9  12    16.4  24.2  12.2  14.4
17    10    18.5  20.8  16.2  14.9  19.6
x̄ = 15.6238, s = 3.383
*Daniel, W. W. Biostatistics: A Foundation for Analysis in the Health Sciences. 5th ed. New York: John Wiley & Sons, 1991.
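The summary values quoted for Example 1 can be checked directly from the 21 readings. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and are not part of the course materials.

```python
import statistics

# Intraocular pressure readings (mm Hg) from Example 1.
pressure = [14.5, 12.9, 14, 16.1, 12, 17.5, 14.1, 12.9, 17.9, 12, 16.4,
            24.2, 12.2, 14.4, 17, 10, 18.5, 20.8, 16.2, 14.9, 19.6]

n = len(pressure)
x_bar = statistics.mean(pressure)   # sample mean
s = statistics.stdev(pressure)      # sample standard deviation (n - 1 denominator)

print(f"n = {n}, mean = {x_bar:.4f}, sd = {s:.3f}")
```

The printed mean and standard deviation should agree with the values given on the slide, up to rounding.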
State the name of the testing method to be used. It is important not to be off track at the very beginning.
Example 1:
1. Test: One sample t test for μ
List all the assumptions required for your test to be valid.
All tests have assumptions.
Even if assumptions are not met you should still comment on how this affects your results.
Example 1:
2. Assumptions
• Simple random sample (SRS) was used to collect data.
• The population distribution from which the sample is drawn is normal or approximately normal.
Claims versus suspicions:
The “null hypothesis” is a statement describing a claim about a population constant.
- The null hypothesis is denoted as H0.
The “alternative hypothesis” is a statement describing the researcher’s suspicions about the claim. Also called the “research hypothesis”.
- The alternative hypothesis is denoted as Ha.
Examples of possible hypotheses:
H0: μ = 13 vs. Ha: μ ≠ 13

For hypothesis testing there are three versions of the test, determined by the context of the research question.
◦ Left Tailed Hypothesis Test (less than)
◦ Right Tailed Hypothesis Test (greater than)
◦ Two Tailed or Two Sided Hypothesis Test (not equal to)
Left Tailed Hypothesis Test:
• Researchers are only interested in whether the true value is below the hypothesized value.
• Example: Administrators of a health care center want to know if the mean time spent by patients in the waiting room is less than 20 minutes.
H0: μ ≥ 20 vs. Ha: μ < 20
Right Tailed Hypothesis Test:
• Researchers are only interested in whether the true value is above the hypothesized value.
• Example: Administrators of a health care center want to know if the mean time spent by patients in the waiting room is greater than 20 minutes.
H0: μ ≤ 20 vs. Ha: μ > 20
Two Tailed or Two Sided Hypothesis Test:
• The researcher is interested in looking above and below their hypothesized value.
• Example: Administrators of a health care center want to know if the mean time spent by patients in the waiting room differs from 20 minutes.
H0: μ = 20 vs. Ha: μ ≠ 20
◦ Note: The direction of the alternative hypothesis will be used when determining the p-value at a later step.
Example 1:
3. Hypotheses
• In a glaucoma study, the following intraocular pressure (mm Hg) values were recorded from a sample of 21 elderly subjects. Based on this data, can we conclude that the mean intraocular pressure of the population from which the sample was drawn differs from 14 mm Hg?*
What are the hypotheses for Example 1?
H0: μ = 14 vs. Ha: μ ≠ 14
where μ is the true mean intraocular pressure
Computational Part of the Test
Parts of the Mechanics Step
◦ Stating the Significance Level
◦ Finding the Rejection Rule
◦ Computing the Test Statistic
◦ Computing the p-value
Significance Level: Here we choose a value to use as the significance level, which is the level at which we are willing to start rejecting the null hypothesis.
Denoted by α, which is the probability of a Type I error for the test.
A Type I error is the error committed when a true null hypothesis is rejected. Ex: You reject H0 when H0 is true.
* Default value is α = .05; use α = .05 unless otherwise noted!
Example 1:
4. Mechanics:
H0: μ = 14 vs. Ha: μ ≠ 14
Significance Level: α = 0.05
*We use α = 0.05 here because the significance level was not given in the problem.
*Note: The Type I error would be concluding that the true mean intraocular pressure differs from 14 mm Hg, when in fact the pressure is 14 mm Hg.
Rejection Rule: State our criteria for rejecting the null hypothesis.
• Reject the null hypothesis (H0) if the p-value ≤ α.
p-value: The chance of observing your sample results or more extreme results assuming that the null hypothesis is true. If this chance is “small” then you may decide the claim in the null hypothesis is false.
Example 1:
4. Mechanics:
Rejection Rule:
Reject H0 if p-value ≤ 0.05
Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate.
This translates your point estimate, a statistic, to follow a known distribution so that it can be used for a test.
A point estimate is a single numerical value used to estimate the corresponding population parameter.
• x̄ is the point estimate for μ.
In many cases, including Example 1, the population standard deviation σ is unknown because it is a parameter from the population that must be estimated. The best estimate for σ is s.
• Our standardized value becomes the test statistic for a one sample t-test:
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$$
μ0: hypothesized mean
x̄: sample mean
s: sample standard deviation
n: sample size
t0: observed t test statistic
This t observed (t0) test statistic follows a t distribution with n − 1 degrees of freedom.
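As a rough illustration of the one sample t statistic, the sketch below evaluates it from summary statistics. The function name `one_sample_t` is hypothetical; the inputs are the Example 1 values given earlier (x̄ = 15.6238, s = 3.383, n = 21) with a hypothesized mean of 14 mm Hg.

```python
import math

def one_sample_t(x_bar, s, n, mu0):
    """Return the observed t statistic and its degrees of freedom."""
    standard_error = s / math.sqrt(n)      # estimated standard deviation of the sample mean
    t0 = (x_bar - mu0) / standard_error    # standardized distance from the hypothesized mean
    return t0, n - 1

t0, df = one_sample_t(x_bar=15.6238, s=3.383, n=21, mu0=14)
print(f"t0 = {t0:.2f} with {df} degrees of freedom")
```

Working from summary statistics keeps the arithmetic visible; with raw data, the mean and standard deviation would be computed first.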
Example 1:
4. Mechanics
Test Statistic:
*In the example it was given that x̄ = 15.6238 and s = 3.383.
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{15.6238 - 14}{3.383/\sqrt{21}} = 2.20$$
p-value: After computing the test statistic, now you can compute the p-value.
A p-value is the probability of obtaining a point estimate at least as “extreme” as the current value, assuming the null hypothesis is true, where the definition of “extreme” is taken from the alternative hypothesis.
The p-value depends on the alternative hypothesis, so there are three ways to compute p-values.
Example 1:
4. Mechanics:
P-value (in words): The probability of observing a sample mean of 15.6238 mm Hg or a value more extreme, assuming the true mean pressure is 14 mm Hg.
1. The p-value is determined based on the sign of the alternative hypothesis.
Ha: μ ≠ μ0. If this is the case, then the p-value is the area in both tails of the t distribution.
[Figure: t density with 1/2 p-value shaded in each tail, to the left of -t_obs and to the right of t_obs]
2. The p-value is determined based on the sign of the alternative hypothesis.
Ha: μ < μ0. If this is the case, then the p-value is the area to the left of the observed test statistic.
[Figure: t density with the p-value shaded to the left of t_obs]
3. The p-value is determined based on the sign of the alternative hypothesis.
Ha: μ > μ0. If this is the case, then the p-value is the area to the right of the observed test statistic.
[Figure: t density with the p-value shaded to the right of t_obs]
Example 1:
4. Mechanics
p-value: In the example the hypotheses are H0: μ = 14 vs. Ha: μ ≠ 14, so the p-value is the area in both tails.
[Figure: t density with an area of 0.01986 shaded in each tail, below -2.2 and above 2.2]
p-value = P(T < −2.20) + P(T > 2.20) = 0.0398
JMP will give the 3 p-values and you must select the correct p-value based on your alternative hypothesis:
Ha: μ ≠ 14 (Prob>|t|)
Ha: μ > 14 (Prob>t)
Ha: μ < 14 (Prob<t)
Conclusion: Last step of the hypothesis test.
Conclusions should always include:
◦ Decision: reject or fail to reject H0 (not accept H0).
  • When conducting hypothesis tests, we assume that H0 is true; therefore the decision cannot be to accept the null hypothesis.
◦ Context: what your decision means in context of the problem.
Example 1:
5. Conclusion:
With a p-value = 0.0398, which is less than 0.05, we reject H0. There is sufficient sample evidence to conclude that the true mean intraocular pressure differs from 14 mm Hg.
Note: The significance level can be thought of as a tolerance for things happening by chance. If we set α = .05, then we are saying that what we observe may be out of the ordinary, but unless it is something that occurs less than 5% of the time we will attribute it to chance.


Possible Hypotheses:
                         2-Tailed Test    Right-Tailed    Left-Tailed
Null hypothesis          H0: μ = μ0       H0: μ ≤ μ0      H0: μ ≥ μ0
Alternative hypothesis   Ha: μ ≠ μ0       Ha: μ > μ0      Ha: μ < μ0
Test Statistic:
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Degrees of Freedom: n − 1
Assumption: The population from which the sample is drawn is normal or approximately normal.
Let T be a t random variable with d.f. = n − 1 and
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
• Left-tailed test
H0: μ ≥ μ0 vs. Ha: μ < μ0
p-value = P(T ≤ t0)
*written as Prob<t in JMP
• Right-tailed test
H0: μ ≤ μ0 vs. Ha: μ > μ0
p-value = P(T ≥ t0)
*written as Prob>t in JMP
• Two-tailed test
H0: μ = μ0 vs. Ha: μ ≠ μ0
p-value = 2P(T ≥ |t0|)
*written as Prob>|t| in JMP
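The three p-value definitions can be sketched numerically without statistical software. The code below is an illustration only, not how JMP computes its values: it integrates the t density with Simpson's rule, and the helper names (`t_pdf`, `t_cdf`, `p_values`) are hypothetical. The dictionary keys mirror the JMP labels.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def p_values(t0, df):
    """Left-tailed, right-tailed, and two-tailed p-values for an observed t0."""
    left = t_cdf(t0, df)
    return {
        "Prob < t": left,
        "Prob > t": 1 - left,
        "Prob > |t|": 2 * (1 - t_cdf(abs(t0), df)),
    }

# Example 1: t0 from the summary statistics, with n - 1 = 20 degrees of freedom.
t0 = (15.6238 - 14) / (3.383 / math.sqrt(21))
for label, p in p_values(t0, df=20).items():
    print(f"{label}: {p:.4f}")
```

For Example 1 the two-tailed value is the relevant one, because the alternative hypothesis is μ ≠ 14.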

In a glaucoma study, the following intraocular pressure (mm Hg)
values were recorded from a sample of 21 elderly subjects. Based
on this data, can we conclude that the mean intraocular pressure
of the population from which the sample was drawn differs from
14 mm Hg?*
x̄ = 15.6238, s = 3.383
T: One sample t-test for μ
A: i) SRS was used ii) The population from which the sample is drawn is normal or approximately normal.
H: H0: μ = 14 vs. Ha: μ ≠ 14; μ is the true mean intraocular pressure
M: α = 0.05
Reject H0 if p-value ≤ 0.05
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{15.6238 - 14}{3.383/\sqrt{21}} = 2.20$$
p-value = P(T > 2.20) + P(T < −2.20) = 0.0398 (calculated using JMP: Prob>|t|)
C: With a p-value less than 0.05, we reject H0. There is sufficient sample evidence to conclude that the true mean intraocular pressure differs from 14 mm Hg.
JMP Demonstration
• Open Pressure.jmp
• Analyze->Distribution
• Complete the dialog box as shown and select OK.
• Select the red arrow next to “Pressure” and select Test Mean.
• Complete the dialog box as shown and select OK.
• Select the red arrow next to “Pressure” and select Confidence Interval->0.95.

The normal quantile plot may also be
created in JMP to check the normality
assumption. The assumption is met
if the points fall close to the red line.
Two sample t-tests are used to
determine whether the population
mean of one group is equal to,
larger than or smaller than the
population mean of another group.
The major goal is to determine whether a difference exists between two populations.
Examples:
◦ Compare blood pressure for males and females.
◦ Compare the proportion of smokers and nonsmokers with lung cancer.
◦ Compare weight before and after treatment.
◦ Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?
• The population means of the two groups are not equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
• The population mean of group 1 is greater than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2
• The population mean of group 1 is less than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2



• The two samples are random and independent.
• The populations from which the samples are drawn are either normal or the sample sizes are large.
• The populations have the same standard deviation.

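Step 3: Calculate the test statistic
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
Step 4: Calculate the appropriate p-value.
Step 5: Write a Conclusion.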
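The pooled two sample t statistic in Step 3 can be sketched as follows. This is an illustration under the equal standard deviation assumption stated above; the function name `pooled_t` is hypothetical and the two small samples are made up for the example (they are not the iris measurements).

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Pooled two sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)   # sample variance of group 1
    s2_sq = statistics.variance(sample2)   # sample variance of group 2
    # Pooled standard deviation: weighted average of the two variances.
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t0 = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

# Made-up measurements for illustration only.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
t0, df = pooled_t(group1, group2)
print(f"t0 = {t0:.3f} with {df} degrees of freedom")
```

The p-value in Step 4 would then come from a t distribution with n1 + n2 − 2 degrees of freedom, using the tail or tails indicated by the alternative hypothesis.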
Possible Hypotheses:
              2-Tailed Test        Right-Tailed         Left-Tailed
Null          H0: μ1 − μ2 = 0      H0: μ1 − μ2 ≤ 0      H0: μ1 − μ2 ≥ 0
Alternative   Ha: μ1 − μ2 ≠ 0      Ha: μ1 − μ2 > 0      Ha: μ1 − μ2 < 0
Test Statistic:
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
Degrees of Freedom: n1 + n2 − 2
Assumption: The populations from which both samples are drawn are normal or approximately normal.



A researcher would like to know whether the
mean sepal width of setosa irises is different
from the mean sepal width of versicolor irises.
The researcher randomly selects 50 setosa irises
and 50 versicolor irises and measures their sepal
widths.
Step 1 Hypotheses:
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor
http://en.wikipedia.org/wiki/Iris_flower_data_set
http://en.wikipedia.org/wiki/Iris_versicolor

Steps 2-4:
JMP Demonstration:
Analyze -> Fit Y By X
Y, Response: Sepal Width
X, Factor: Species
Means/ANOVA/Pooled t
Normal Quantile Plot -> Plot Actual by Quantile
[Figure: normal quantile plot by species (setosa, versicolor); axis: Normal Quantile]
Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.
The paired t-test is used to
compare the population
means of two groups when
the samples are dependent.
The objective of paired comparisons is to minimize sources of variation that are not of interest in the study by pairing observations with similar characteristics.
Example:
A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.
• The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0
• The population mean difference is greater than zero.
H0: μdifference = 0
Ha: μdifference > 0
• The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0

• The sample is random.
• The data are matched pairs.
• The differences have a normal distribution or the sample size is large.

Step 3: Calculate the test statistic:
$$t_0 = \frac{\bar{d}}{s_d/\sqrt{n}}$$
where d̄ is the mean of the differences, s_d is the standard deviation of the differences, and n is the number of pairs.

Step 4: Calculate the p-value.

Step 5: Write a conclusion.
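The paired t statistic reduces to a one sample t statistic on the differences. A minimal sketch follows; the function name `paired_t` is hypothetical and the before/after values are made up for illustration (they are not the flexibility data).

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic for the differences second - first, with its degrees of freedom."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]   # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)                   # mean of the differences
    s_d = statistics.stdev(diffs)                    # standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Made-up measurements for illustration only.
before = [10, 12, 9, 11]
after = [12, 13, 9, 14]
t0, df = paired_t(before, after)
print(f"t0 = {t0:.3f} with {df} degrees of freedom")
```

Computing the differences first is also what the JMP analysis does when a new After − Before column is created and analyzed with Test Mean.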
Possible Hypotheses:
              2-Tailed                 Right-Tailed             Left-Tailed
Null          H0: μdifference = 0      H0: μdifference ≤ 0      H0: μdifference ≥ 0
Alternative   Ha: μdifference ≠ 0      Ha: μdifference > 0      Ha: μdifference < 0
Test Statistic:
$$t_0 = \frac{\bar{d}}{s_d/\sqrt{n}}$$
Degrees of Freedom: n − 1
Assumption: The population of differences is normal or approximately normal.


A researcher would like to determine whether
a fitness program increases flexibility. The
researcher measures the flexibility (in inches)
of 12 randomly selected participants before
and after the fitness program.
Step 1: Formulate a Hypothesis
H0: μAfter − Before = 0
Ha: μAfter − Before > 0
http://office.microsoft.com/en-us/images

Steps 2-4:
JMP Analysis:
Create a new column of After − Before
Analyze -> Distribution
Y, Columns: After − Before
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 0
Step 5 Conclusion: There is not sufficient evidence that the fitness program increases flexibility.
ANOVA is used to determine
whether three or more
populations have different
distributions.
[Figure: responses for medical treatments A, B, and C]
The first step is to use the ANOVA F test to determine if there are any significant differences among the population means.

If the ANOVA F test shows that the population means are not all the same, then follow up tests can be performed to see which pairs of population means differ.
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
where
$y_{ij}$ is the response of the jth trial on the ith factor level
$\mu_i$ is the mean of the ith group
$\varepsilon_{ij} \sim N(0, \sigma^2)$
$i = 1, \dots, r$
$j = 1, \dots, n_i$
In other words, for each group the observed value is the group mean plus some random variation.

Step 1: We test whether there is a difference in the population means.
H0: μ1 = μ2 = … = μr
Ha: The μi are not all equal.




• The samples are random and independent of each other.
• The populations are normally distributed.
• The populations all have the same standard deviations.
• The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.
[Figure: two plots of responses for medical treatments A, B, and C]
Compare the variation within the samples to the variation between the samples.
$$F = \frac{\text{Variation between Groups}}{\text{Variation within Groups}} = \frac{MSG}{MSE}$$
Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F

The mean square for groups, MSG, measures the variability of the sample averages.

SSG stands for sums of squares groups.
$$MSG = \frac{SSG}{r-1} = \frac{n_1(\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2 + n_2(\bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot})^2 + \cdots + n_r(\bar{y}_{r\cdot} - \bar{y}_{\cdot\cdot})^2}{r-1}$$
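Mean square error, MSE, measures the variability within the groups.
SSE stands for sums of squares error.
$$MSE = \frac{SSE}{n-r} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_r - 1)s_r^2}{n-r}$$
where n = n1 + n2 + … + nr is the total sample size and
$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2}{n_i - 1}}$$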
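The F statistic can be assembled from the MSG and MSE definitions above. The sketch below is illustrative only: the function name `anova_f` is hypothetical and the three small groups are made up (they are not the pain data).

```python
import statistics

def anova_f(groups):
    """Return (MSG, MSE, F) for a one-way layout given a list of samples."""
    r = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: variability of the group averages around the overall average.
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variability of the observations within each group.
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg, mse, msg / mse

# Made-up responses for three groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
msg, mse, f_stat = anova_f(groups)
print(f"MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```

A large F means the group averages vary more than the within-group variation would explain; the p-value comes from an F distribution with r − 1 and n − r degrees of freedom.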
Step 4: Calculate the p-value.

Step 5: Write a conclusion.



A researcher would like to determine if three
drugs provide the same relief from pain.
60 patients are randomly assigned to a
treatment (20 people in each treatment).
Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha : The μi are not all equal.
http://office.microsoft.com/en-us/images

JMP demonstration
Analyze -> Fit Y By X
Y, Response: Pain
X, Factor: Drug
Normal Quantile Plot -> Plot Actual by Quantile
Means/ANOVA
[Figure: normal quantile plot of Pain by Drug (Drug A, Drug B, Drug C); axes: Pain, Normal Quantile]
Step 5 Conclusion: There is strong evidence that the mean pain levels for the three drugs are not all the same.



The p-value of the overall F test indicates
that the level of pain is not the same for
patients taking drugs A, B and C.
We would like to know which pairs of
treatments are different.
One method is to use Tukey’s HSD (honestly
significant differences).
Tukey’s test simultaneously tests
H0: μi = μi'
Ha: μi ≠ μi'
for all pairs of factor levels. Tukey’s HSD controls the overall type I error.
JMP demonstration
Oneway Analysis of Pain By Drug -> Compare Means -> All Pairs, Tukey HSD
Level    - Level   Difference   Std Err Dif   Lower CL   Upper CL   p-Value
Drug C   Drug A    5.850000     1.677665       1.81283   9.887173   0.0027 *
Drug C   Drug B    3.600000     1.677665      -0.43717   7.637173   0.0897
Drug B   Drug A    2.250000     1.677665      -1.78717   6.287173   0.3786
The JMP output shows that drugs A and C
are significantly different.
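One way to read the Tukey output is to check which confidence intervals exclude zero. The sketch below does this for the three rows of the JMP table above; the tuple layout and variable names are illustrative choices, not JMP conventions.

```python
# Rows copied from the JMP Tukey HSD output:
# (level, minus_level, difference, lower_cl, upper_cl, p_value)
rows = [
    ("Drug C", "Drug A", 5.85, 1.81283, 9.887173, 0.0027),
    ("Drug C", "Drug B", 3.60, -0.43717, 7.637173, 0.0897),
    ("Drug B", "Drug A", 2.25, -1.78717, 6.287173, 0.3786),
]
alpha = 0.05

# A pair differs significantly when its confidence interval excludes zero.
by_interval = [(a, b) for a, b, diff, lo, hi, p in rows if lo > 0 or hi < 0]

# The same decision read from the adjusted p-values.
by_p_value = [(a, b) for a, b, diff, lo, hi, p in rows if p < alpha]

print("Pairs that differ:", by_interval)
```

Both readings single out the same pair, which matches the conclusion drawn from the output.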



We are interested in the effect of two categorical factors on the response.
We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
◦ An interaction effect means that the effect on the response of one factor depends on the level of the other factor.
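To make the idea of an interaction concrete, the sketch below compares the effect of one factor at each level of the other using cell means. All numbers, factor names, and the helper `dosage_effect` are hypothetical and chosen only for illustration; a formal conclusion would still require the two-way ANOVA test of the interaction term.

```python
# Hypothetical mean improvement for each (drug, dosage) cell; not from the course data.
cell_means = {
    ("Drug A", "Low"): 10.0, ("Drug A", "High"): 20.0,
    ("Drug B", "Low"): 12.0, ("Drug B", "High"): 14.0,
}

def dosage_effect(drug):
    """Change in mean response when moving from low to high dosage for one drug."""
    return cell_means[(drug, "High")] - cell_means[(drug, "Low")]

effect_a = dosage_effect("Drug A")
effect_b = dosage_effect("Drug B")

# If the dosage effect were the same for both drugs, this contrast would be zero.
interaction_contrast = effect_a - effect_b

print(f"Dosage effect for Drug A: {effect_a}")
print(f"Dosage effect for Drug B: {effect_b}")
print(f"Difference of differences: {interaction_contrast}")
```

In a plot of these cell means, equal dosage effects would appear as parallel lines, while a nonzero difference of differences appears as lines that are not parallel.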
[Figure: two plots of Improvement versus Dosage (Low, High) for Drug A and Drug B, one titled No Interaction and one titled Interaction]
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level
$\mu$ is the overall mean
$\alpha_i$ is the main effect of the ith level of factor A
$\beta_j$ is the main effect of the jth level of factor B
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B
$\varepsilon_{ijk} \sim N(0, \sigma^2)$
$i = 1, \dots, a$
$j = 1, \dots, b$
$k = 1, \dots, n_{ij}$


We would like to determine the effect of two
alloys (low, high) and three cooling
temperatures (low, medium, high) on the
strength of a wire.
JMP demonstration
Analyze -> Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros -> Factorial to Degree
Run Model
http://office.microsoft.com/en-us/images
Conclusion: There is strong evidence of an
interaction between alloy and temperature.
• The one sample t-test allows us to test whether the population mean of a group is equal to a specified value.
• The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different.
• ANOVA allows us to determine whether the population means of several groups are different.

For information about using SAS, SPSS and R to do ANOVA:
http://www.ats.ucla.edu/stat/sas/topics/anova.htm
http://www.ats.ucla.edu/stat/spss/topics/anova.htm
http://www.ats.ucla.edu/stat/r/sk/books_pra.htm


Fisher’s Irises Data (used in one sample and two sample t-test examples).
Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.