Introduction to Biostatistics for Clinical
Researchers
University of Kansas
Department of Biostatistics
&
University of Kansas Medical Center
Department of Internal Medicine
Schedule
5th lecture, TBD
Materials
 PowerPoint files can be downloaded from the Department of
Biostatistics website at http://biostatistics.kumc.edu
 A link to the recorded lectures will be posted in the same location
Topics
 Comparing Two (or more) Population Means (continued)
 Simple Linear Regression
 Comparing Two (or more) Independent Proportions
Comparing Two (or More) Population Means
(Continued)
Sampling Distribution Detail
 What exactly is the sampling distribution of the difference in
sample means?
 A Student’s t distribution is used with n1 + n2 - 2 degrees of
freedom (total sample size minus two)
Two-Sample t-test
 In a randomized design, 23 patients with hyperlipidemia were
randomized to either treatment A or treatment B for 12 weeks
 12 to A
 11 to B
 LDL cholesterol levels (mmol/L) measured on each subject at
baseline and 12 weeks
 The 12-week change in LDL cholesterol was computed for each
subject
Treatment Group                        A        B
N                                      12       11
Mean LDL change (mmol/L)               -1.41    -0.32
Standard deviation of LDL changes      0.55     0.65
Two-Sample t-test
 Is there a difference in LDL change between the two treatment
groups?
 Methods of inference
 CI for the difference in mean LDL cholesterol change between
the two groups
 Statistical hypothesis test
95% CI for Difference in Means

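$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,n_1+n_2-2}\,\mathrm{SE}(\bar{x}_1 - \bar{x}_2)$$
$$\bigl(-1.41 - (-0.32)\bigr) \pm t_{1-\alpha/2,\,n_1+n_2-2}\sqrt{\frac{0.55^2}{12} + \frac{0.65^2}{11}}$$
$$-1.09 \pm t_{1-\alpha/2,\,n_1+n_2-2}\,(0.25)$$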
95% CI for Difference in Means
 How many standard errors to add and subtract (i.e., what is the
correct multiplier)?
 The number we need comes from a t with 12 + 11 - 2 = 21 degrees
of freedom
 From a t table or Excel, this value is 2.08
 The 95% CI for true mean difference in change in LDL cholesterol,
drug A to drug B is:
$$-1.09 \pm 2.08\,(0.25)$$
$$(-1.61,\ -0.57)$$
Hypothesis Test to Compare Two Independent Groups
 Two-sample (unpaired) t-test: getting a p-value
 Is the change in LDL cholesterol the same in the two treatment
groups?
 HO: μ1 = μ2 ⇔ HO: μ1 - μ2 = 0
 HA: μ1 ≠ μ2 ⇔ HA: μ1 - μ2 ≠ 0
Hypothesis Test to Compare Two Independent Groups
 Recall the general “recipe” for hypothesis testing:
1. Assume HO is true
2. Measure the distance of the sample result from the
hypothesized result (here, it’s 0)
3. Compare the test statistic (distance) to the appropriate
distribution to get the p-value
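$$t = \frac{\text{observed difference} - \text{null difference}}{\mathrm{SE}(\text{observed difference})}$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_O}{\mathrm{SE}(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$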
Hyperlipidemia Example
 In the hyperlipidemia study, recall:
$$\bar{x}_1 - \bar{x}_2 = -1.09, \qquad \mathrm{SE}(\bar{x}_1 - \bar{x}_2) = 0.25$$
 In this study:
$$t = \frac{-1.09}{0.25} = -4.4$$
 This study result was 4.4 standard errors below the null mean of 0
How are p-values Calculated?
 Is a result 4.4 standard errors below 0 unusual?
 It depends on what kind of distribution we are dealing with
 The p-value is the probability of getting a result as extreme or
more extreme than what was observed (-4.4) by chance, if the
null hypothesis were true
 The p-value comes from the sampling distribution of the difference
in two sample means
 What is the sampling distribution of the difference in sample
means?
 $t_{12+11-2} = t_{21}$
Hyperlipidemia Example
 To compute a p-value, we need to compute the probability of
being 4.4 or more SE away from 0 on the t with 21 degrees of
freedom
P = 0.0003
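The tail probability can be approximated without a t table. The sketch below is an illustration under stated assumptions, not the lecture's method: it integrates the Student's t density numerically with Simpson's rule using only the Python standard library, and the function names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: area outside (-|t|, +|t|), via Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # the density is symmetric about 0
    return 1 - central_area

t_obs = -1.09 / 0.25                     # rounded values from the slide
p_value = two_sided_p(t_obs, df=21)
print(f"t = {t_obs:.2f}, two-sided p = {p_value:.5f}")
```

The result is a few ten-thousandths, in line with the reported P = 0.0003. As a sanity check, the same function applied to the multiplier 2.08 returns a value very close to 0.05.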
Summary: Hyperlipidemia Example
 Statistical Methods
 Twenty-three patients with hyperlipidemia were randomly
assigned to one of two treatment groups: A or B
 12 patients were assigned to receive A
 11 patients were assigned to receive B
 Baseline LDL cholesterol measurements were taken on each
subject and LDL was again measured after 12 weeks of
treatment
 The change in LDL cholesterol was computed for each subject
 The mean LDL changes in the two treatment groups were
compared using an unpaired t-test and a 95% confidence
interval was constructed for the difference in mean LDL
changes
Summary: Hyperlipidemia Example
 Result
 Patients on treatment A showed a mean decrease in LDL cholesterol of 1.41
mmol/L and patients on treatment B showed a mean decrease of
0.32 mmol/L (a difference of 1.09 mmol/L, 95% CI: 0.57 to
1.61 mmol/L)
 The difference in LDL changes was statistically significant (p <
0.001)
FYI: Equal Variances Assumption
 The “traditional” t-test assumes equal variances in the two groups
 This can be formally tested using another hypothesis test
 But why not just compare observed values of s1 to s2?
 There is a slight modification to allow for unequal variances: it
adjusts the degrees of freedom for the test and uses a
slightly different SE computation
 If you want to be truly ‘safe’, it is more conservative to use the
test that allows for unequal variances
 Makes little to no difference in large samples
FYI: Equal Variances Assumption
 If underlying population level standard deviations are equal, both
approaches give valid confidence intervals, but intervals assuming
unequal standard deviations are slightly wider (p-values slightly
larger)
 If underlying population level standard deviations are unequal, the
approach assuming equal variances does not give valid confidence
intervals and can severely under-cover the goal of 95%
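To make the two approaches concrete, the sketch below applies both to the LDL summary statistics from the hyperlipidemia example. It assumes the unequal-variance modification is the one commonly called Welch's t-test with the Satterthwaite degrees-of-freedom approximation; the slides do not name the method, so treat that as an assumption.

```python
import math

# Summary statistics from the hyperlipidemia example
n1, s1 = 12, 0.55
n2, s2 = 11, 0.65

v1, v2 = s1**2 / n1, s2**2 / n2

# Unequal-variance version (assumed here to be Welch-Satterthwaite)
se_unequal = math.sqrt(v1 + v2)
df_unequal = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal-variance (pooled) version with n1 + n2 - 2 degrees of freedom
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

print(f"unequal variances: SE = {se_unequal:.3f}, df = {df_unequal:.1f}")
print(f"equal variances:   SE = {se_pooled:.3f}, df = {df_pooled}")
```

In this example the two standard errors agree to two decimal places, and the unequal-variance degrees of freedom come out somewhat below 21, which makes the multiplier (and hence the interval) slightly larger.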
Non-Parametric Analogue to the Two-Sample t
Alternative to the Two Sample t-test
 “Non-parametric” refers to a class of tests that do not assume
a particular distribution (such as the normal) for the data
 Nonparametric tests for comparing two groups
 Mann-Whitney Rank-Sum test (Wilcoxon Rank Sum Test)
 Also called Wilcoxon-Mann-Whitney Test
 Attempts to answer: “Are the two population distributions
different?”
 Advantages: does not assume populations being compared are
normally distributed, uses only ranks, and is not sensitive to
outliers
Alternative to the Two Sample t-test
 Disadvantages:
 often less sensitive (powerful) for finding true differences
because they throw away information (by using only ranks
rather than the raw data)
 need the full data set, not just summary statistics
 results do not include any CI quantifying range of possibility for
true difference between populations
Health Education Study
 Evaluate an intervention to educate high school students about
health and lifestyle over a two-month period
 10 students randomized to intervention or control group
 X = post-test score - pre-test score
 Compare between the two groups
Health Education Study
• Only five individuals in each sample
• We want to compare the control and intervention groups to assess
whether the ‘improvements’ in scores are different, taking random
sampling error into account
Intervention     5     0     7     2    19
Control         -5    -6     1     4     6
• With such a small sample size, we need to be sure score
improvements are normally distributed if we want to use the t test
(BIG assumption)
• Possible approach: Wilcoxon-Mann-Whitney test
Health Education Study
 Step 1: rank the pooled data, ignoring groups
Scores
Intervention     5     0     7     2    19
Control         -5    -6     1     4     6
Ranks
Intervention     7     3     9     5    10
Control          2     1     4     6     8
 Step 2: reattach group status
 Step 3: find the average rank in each of the two groups
$$\text{Intervention: } \frac{3 + 5 + 7 + 9 + 10}{5} = 6.8$$
$$\text{Control: } \frac{1 + 2 + 4 + 6 + 8}{5} = 4.2$$
Health Education Study
 Statisticians have developed formulas and tables to determine the
probability of observing such an extreme discrepancy in ranks (6.8
versus 4.2) by chance alone (p)
 The p-value here is 0.17
 The interpretation is that the Mann-Whitney test did not show
any significant difference in test score ‘improvement’ between
the intervention and control group (p = 0.17)
 The two-sample t test would give a different answer (p = 0.14)
 Different statistical methods give different p-values
 If the largest observation (19) were changed to any other value that
remained the largest, the MW p-value would not change but the t-test p-value would
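The ranking steps for this example can be sketched in Python as follows. The slides do not say how the p-value of 0.17 was obtained; the sketch assumes the large-sample normal approximation to the rank-sum statistic without a continuity correction, which happens to reproduce that value. An exact test or a continuity-corrected version would give a somewhat different p-value.

```python
import math
from statistics import NormalDist

intervention = [5, 0, 7, 2, 19]
control = [-5, -6, 1, 4, 6]

# Step 1: rank the pooled data, ignoring groups (no ties in these data)
pooled = sorted(intervention + control)
rank = {value: i + 1 for i, value in enumerate(pooled)}

# Step 2: reattach group status
ranks_int = [rank[v] for v in intervention]
ranks_ctl = [rank[v] for v in control]

# Step 3: average rank in each group
mean_rank_int = sum(ranks_int) / len(ranks_int)
mean_rank_ctl = sum(ranks_ctl) / len(ranks_ctl)

# Normal approximation to the control group's rank sum under the null (an assumption)
n1, n2 = len(control), len(intervention)
n_total = n1 + n2
w = sum(ranks_ctl)
mu = n1 * (n_total + 1) / 2
sigma = math.sqrt(n1 * n2 * (n_total + 1) / 12)
z = (w - mu) / sigma
p_value = 2 * NormalDist().cdf(-abs(z))

print(mean_rank_int, mean_rank_ctl)   # 6.8 4.2
print(round(p_value, 2))              # 0.17
```

Replacing 19 with any larger number leaves every rank, and therefore this p-value, unchanged.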
Notes
 The t or the nonparametric test?
 Statisticians will not always agree, but there are some
guidelines
 Use the nonparametric test if the sample size is small and you
have no reason to believe the data are ‘well-behaved’ (normally
distributed)
 Use the nonparametric test if only ranks are available
Summary: Educational Intervention Example
 Statistical methods
 10 high school students were randomized to either receive a
two-month health and lifestyle education program or no
program
 Each student was administered a test regarding health and
lifestyle issues prior to randomization and after the two-month
period
 Differences in the two test scores were computed for each
student
 Mean and median test score changes were computed for each
of the two study groups
 A Mann-Whitney rank sum test was used to determine if there
was a statistically significant difference in test score change
between the intervention and control groups at the end of the
two-month study period
Summary: Educational Intervention Example
 Results
 Participants randomized to the educational intervention scored
a median five points higher on the test given at the end of the
two-month study period, as compared to the test administered
prior to the intervention
 Participants randomized to receive no educational intervention
scored a median one point higher on the test given at the end
of the two-month study period
 The difference in test score improvements between the
intervention and control groups was not statistically significant
(p = 0.17)
Comparing Means between More than Two Independent
Populations
Motivating Example
 Suppose you are interested in the relationship between smoking
and mid-expiratory flow (FEF), a measure of pulmonary health
 Suppose you recruit study subjects and classify them into one of
six smoking categories
 Nonsmokers (NS)
 Passive smokers (PS)
 Non-inhaling smokers (NI)
 Light smokers (LS)
 Moderate smokers (MS)
 Heavy smokers (HS)
Motivating Example
 You are interested in whether differences exist in mean FEF among
the six groups
 Main outcome variable is FEF in liters per second
Motivating Example
 One strategy is to perform lots of two-sample t-tests (for each
possible two-group comparison)
 In this example, there would be 15 comparisons you would need to
do:
 NS-PS
 NS-NI
 ...
 MS-HS
 It would be nice to have one “catch-all” test
 Something that would tell you whether there were any
differences among the six groups
 If so, you could then do group-to-group comparisons
to look for specific differences
Extension of the Two-Sample t-test
 Analysis of Variance (ANOVA)
 The t-test compares means in two populations
 ANOVA compares means among more than two populations with
one test
 The p-value from ANOVA answers the question:
 “Are there any differences in the means among the
populations?”
Extension of the Two-Sample t-test
 General idea behind ANOVA, comparing means for k > 2 groups:
 HO: μ1 = μ2 = . . . = μk
 HA: At least one μj is different
Example
 Smoking and FEF (Forced Mid-Expiratory Flow Rate) [1]
 A sample of over 3,000 persons was classified into one of six
smoking categorizations based on responses to smoking related
questions
[1] White, J.R., Froeb, H.F. (1980). Small-airways dysfunction in non-smokers chronically exposed
to tobacco smoke. NEJM 302:13.
Example
 Nonsmokers (NS)
 Passive smokers (PS)
 Non-inhaling smokers (NI)
 Light smokers (LS)
 Moderate smokers (MS)
 Heavy smokers (HS)
Example
 Smoking and FEF
 From each smoking group, a random sample of 200 men was
drawn (except for the non-inhalers, as there were only 50 male
non-inhalers in the entire sample of 3,000)
 FEF measurements were taken on each of the subjects
Data Summary
 Based on a one-way analysis of variance, there are statistically
significant differences in FEF levels among the six smoking groups
(p < 0.001)
What’s the Rationale?
 In the simplest case, the variation in subject responses is broken
down into parts: variation in response attributed to the treatment
(group/sample), to error (subject characteristics + everything else
not controlled for)
 The variation in the treatment (group/sample) means is compared
to the variation within a treatment (group/sample)
 If the between treatment variation is a lot bigger than the within
treatment variation, that suggests there are some different effects
among the treatments
Example: Scenarios
[Figure: boxplots for Scenarios 1, 2, and 3]
Example: Scenarios
 There is an obvious difference between scenarios 1 and 2. What is
it?
 Just looking at the boxplots, which of the two scenarios (1 or 2) do
you think would provide more evidence that at least one of the
populations is different from the others? Why?
F Distribution
Properties, F(dfnum, dfden)
 The total area under the curve is one.
 The distribution is skewed to the right.
 The values are non-negative, start at
zero and extend to the right,
approaching but never touching the
horizontal axis.
 The distribution of F changes as the
degrees of freedom change.
$$F = \frac{\text{Variation between the sample means}}{\text{Natural variation within the samples}}$$
F Statistic
$$F = \frac{\text{Variation between the sample means}}{\text{Natural variation within the samples}}$$
 Case A: If all the sample means were exactly the same, what
would be the value of the numerator of the F statistic?
 Case B: If all the sample means were spread out and very
different, how would the variation between sample means
compare to the value in A?
F Statistic
$$F = \frac{\text{Variation between the sample means}}{\text{Natural variation within the samples}}$$
 So what values could the F statistic take on?
 Could you get an F that is negative? Why not?
 What type of values of F would support the alternative hypothesis?
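One way to explore these questions is to compute F directly. The sketch below is a plain-Python illustration of the usual one-way ANOVA F statistic (between-group mean square divided by within-group mean square). The function name `one_way_f` and both data sets are hypothetical, invented only to mimic Case A and Case B.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: distance of each sample mean from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: distance of each observation from its own sample mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data, invented for illustration only
same_means = [[64, 65, 66], [63, 65, 67], [64, 65, 66]]      # Case A: identical sample means
spread_means = [[59, 60, 61], [64, 65, 66], [69, 70, 71]]    # Case B: very different sample means

f_same = one_way_f(same_means)
f_spread = one_way_f(spread_means)
print(f_same, f_spread)   # 0.0 75.0
```

Because both the numerator and the denominator are built from squared distances, F cannot be negative, and large values of F are the ones that support the alternative hypothesis.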
Example: F Statistic
Three independent random samples
 Scenario 1: means 60, 65, 70; s = 1.5
 Scenario 2: means 60, 65, 70; s = 3
 Scenario 3: means 65, 65, 65; s = 3
Scenario          F      P
1: HA is true     129    0
2: HA is true     45     0
3: HO is true     0      0.48
Summary: Smoking and FEF
 Statistical Methods
 200 men were randomly selected from each of five smoking
classification groups (non-smoker, passive smokers, light
smokers, moderate smokers, and heavy smokers), as well as 50
men classified as non-inhaling smokers for a study designed to
analyze the relationship between smoking and respiratory
function
Summary: Smoking and FEF
 Statistical Methods
 Analysis of variance was used to test for any differences in FEF
levels among the six groups of men
 Individual group comparisons were performed with a series of
two-sample t-tests and 95% confidence intervals were
constructed for the mean difference in FEF between each
combination of groups
Summary: Smoking and FEF
 Results
 Analysis of variance showed statistically significant (p < 0.001)
differences in FEF between the six groups of smokers
 Non-smokers had the highest mean FEF value (3.78 L/s), and
this was statistically significantly larger than the means of the five other
smoking-classification groups
 The mean FEF value for non-smokers was 1.19 L/s higher than
the mean FEF for heavy smokers (95% CI: 1.03-1.35 L/s), the
largest mean difference between any two smoking groups
 Confidence intervals for all smoking group FEF comparisons are
in Table 1
Example
 FEV1 and three medical centers [1]
 Data were collected on 63 patients with coronary artery disease
at 3 different medical centers (Johns Hopkins, Rancho Los
Amigos Medical Center, and St. Louis University School of Medicine)
 Purpose of the study was to investigate the effects of carbon monoxide
exposure on these patients
 Prior to analyzing the CO effects data, researchers wished to
compare the respiratory health of these patients across the
three medical centers
[1] Pagano, M., Gauvreau, K. (2000). Principles of Biostatistics. Duxbury Press.
Boxplots of Data
ANOVA Table
Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)   F      P
Between Groups        1.5828                2                         0.791418           3.12   0.052
Within Groups         14.48                 57                        0.254
Total                 16.063                59                        0.2723
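The entries of the ANOVA table are linked by simple arithmetic, which the following Python sketch retraces from the sums of squares and degrees of freedom. The p-value line relies on a closed form for the upper tail of the F distribution that holds only when the numerator has exactly 2 degrees of freedom; that shortcut is an addition here, not something stated in the lecture.

```python
ss_between, df_between = 1.5828, 2
ss_within, df_within = 14.48, 57

ms_between = ss_between / df_between     # mean square between groups
ms_within = ss_within / df_within        # mean square within groups
f_stat = ms_between / ms_within          # F = MS between / MS within

# Upper-tail probability of F(2, df_within); this closed form is valid
# only because the numerator degrees of freedom equal 2
p_value = (df_within / (df_within + df_between * f_stat)) ** (df_within / 2)

print(f"MS between = {ms_between:.4f}, MS within = {ms_within:.3f}")
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

With three groups, the between-groups degrees of freedom are 3 - 1 = 2, which is why the shortcut applies to this table.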
Simple Linear Regression
The Equation of a Line
 Recall (from Algebra) that there are two values which uniquely
define any line
 Y-intercept—where the line crosses the y-axis (when x = 0)
 Slope—the “rise over run”—how much y changes for every unit
change in x
 The equation of a line is given by: y = mx + b
 where m is the slope and b is the y-intercept
The Equation of a Line
 Statisticians have their own notation:
 y = b0 + b1x
 b0 = y-intercept
 b1 = slope
 y = β0 + β1x
 β0 = y-intercept
 β1 = slope
The Intercept, β0
 The intercept, β0, is the value of y when x = 0
 It is the point on the graph where the line crosses the y axis at
the coordinate (0, β0)
The Slope, β1
 The slope, β1, is the change in y corresponding to a unit increase in
x
The Slope, β1
 This change is the same across the entire line
The Slope, β1
 All information about the difference in the y-value for two
differing values of x is contained in the slope
 For example: two values of x three units apart will have a
difference in y values of 3(β1)
The Slope, β1
 The slope is the change in y corresponding to a unit increase in x:
it is the difference in y-values for x + 1 compared to x
 If β1 = 0, this indicates that there is no association between x
and y (i.e., the values of y are the same regardless of the
values of x)
 If β1 > 0, this indicates that there is a positive association
between x and y (i.e., the values of y increase with increasing
values of x)
 If β1 < 0, this indicates that there is a negative association
between x and y (i.e., the values of y decrease with increasing
values of x)
The Slope, β1
The Equation of a Line
 In linear regression, points don’t fit exactly to a line
[Figure: plot of y = 2x + 1]
The Equation of a Line
 In linear regression, points don’t fit exactly to a line
[Figure: scatterplot of y = 2x + 1 + error]
The Equation of a Line
 In linear regression, points don’t fit exactly to a line
[Figure: scatterplot of y = 2x + 1 + more error]
Linear Regression
 Deterministic Model: model for an exact relationship between
variables (y = Ax)
 For example: Centimeters = 2.54·Inches
 Probabilistic Model: model that accounts for unexplained variation
in the relationship between two or more variables
 General Form: y = [Deterministic component] + [Random error]
 We estimate a line that relates the mean of an outcome y to a
predictor x:
$$E[y] = \hat{\beta}_0 + \hat{\beta}_1 x$$
where E[y] is the expected (mean) value of y and $\hat{\beta}_0$, $\hat{\beta}_1$ are the
estimated y-intercept and slope, respectively
The Equation of a Line
 $\hat{\beta}_0$, $\hat{\beta}_1$ are estimated using the data
 The resulting estimated line is the one that “best fits the data”
Example: Arm Circumference and Height
 Data on anthropometric measures from a random sample of 150
Nepali children up to 12 months old
 What is the relationship between average arm circumference and
height?
 Data:
 Arm circumference: mean = 12.4 cm; s = 1.5 cm; min = 7.3 cm; max = 15.6 cm
 Height: mean = 61.6 cm; s = 6.3 cm; min = 40.9 cm; max = 73.3 cm
Approach 1: t-Test
 Dichotomize height at median, compare mean arm circumference
with t-test and 95% CI
Approach 1: t-Test
 Potential advantages
 Gives a single summary measure for quantifying the arm
circumference/height association (a sample mean difference)
 Potential disadvantages
 Throws away a lot of valuable information in the height data
that was originally continuous
 Only allows for a single comparison between two crudely (and
arbitrarily?) defined height categories
Approach 2: ANOVA
 Categorize height into four categories by quartile, compare mean
arm circumferences with ANOVA and 95% CIs
Approach 2: ANOVA
 Potential advantages:
 Uses a less crude categorization of height than the previous
example
 Potential disadvantages:
 Still throws away a lot of information in the height data that
was originally measured as continuous
 Requires multiple summary measures to quantify arm
circumference/height relationship
 Does not exploit the structure we see in the previous boxplot: as height increases, so does arm circumference
Approach 3: Linear Regression
 Treat height as continuous when estimating the relationship
 Linear regression is a potential option: it allows us to associate a
continuous outcome with a continuous predictor via a line
 The line estimates the mean value of the outcome for each
continuous value of height in the sample used
 Makes a lot of sense, but only if a line reasonably describes the
relationship
Visualizing the Relationship
 Scatterplot
Visualizing the Relationship
 Does a line reasonably describe the general shape of the
relationship?
 We can estimate a line using a statistical software package
 The line we estimate will be of the form:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
 Here, $\hat{y}$ is the average arm circumference for a group of children all
of the same height, x
Arm Circumference and Height
$$\hat{y} = 2.7 + 0.16x$$
 Here, $\hat{y}$ is the estimated average arm circumference, x = height,
$\hat{\beta}_0 = 2.7$, and $\hat{\beta}_1 = 0.16$
 This is the estimated line from the sample of 150 Nepali children
Arm Circumference and Height
 How do we interpret the estimated slope?
 The average change in arm circumference for a one-unit (1 cm)
increase in height
 The mean difference in arm circumference for two groups of
children who differ by one unit (1 cm) in height
 These results estimate that the mean difference in arm
circumferences for a one centimeter difference in height is 0.16
cm, with taller children having greater average arm circumference
Arm Circumference and Height
 This mean difference estimate is constant across the entire height
range in the sample
Arm Circumference and Height
 What is the estimated mean difference in arm circumference for:
 60 versus 59 cm?
 25 versus 24 cm?
 72 versus 71 cm?
 Answer: 0.16 cm
Arm Circumference and Height
 What is the estimated mean difference in arm circumference for
children 60 cm versus 50 cm tall?
Arm Circumference and Height
 What is the estimated mean difference in arm circumference for:
 90 versus 89 cm?
 34 versus 33 cm?
 110 versus 109 cm?
 Answer: We don’t know!
Arm Circumference and Height
 Our regression results only apply to the range of observed data
Arm Circumference and Height
 How do we interpret the estimated intercept?
 The estimated y when x = 0: the estimated mean arm
circumference for children 0 cm tall
 Does this make sense given our sample?
 Frequently, the scientific interpretation of the intercept is
meaningless
 It is necessary for fully specifying the equation of a line
Arm Circumference and Height
 X = 0 isn’t even on the graph
Notes
 Linear regression performed with a single predictor (one x) is
called simple linear regression
 Linear regression with more than one predictor is called multiple
linear regression
Example: Arm Circumference and Gender
 Data on anthropometric measures from a random sample of 150
Nepali children up to 12 months old
 What is the relationship between average arm circumference and
sex of a child?
Visualizing the Relationship
 Scatterplot Display
Visualizing the Relationship
 Boxplot display
Arm Circumference and Gender
 Here, y is arm circumference (continuous) and x is gender (binary)
 How do we handle gender as a predictor in regression?
 One possibility is to let x = 0 for male children and x = 1 for
female children
 How would we interpret the regression coefficients?
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Arm Circumference and Gender
 The resulting equation is $\hat{y} = 12.5 - 0.13x$
 $\hat{\beta}_1 = -0.13$: the estimated mean difference in arm circumference
for female children compared to male children is -0.13 cm; female
children have lower arm circumference by 0.13 cm on average
 $\hat{\beta}_0 = 12.5$: the estimated mean arm circumference for male children is 12.5
cm
Visualizing the Relationship
Estimating the Regression Equation
 How do we estimate the regression equation?
 There must be some algorithm that will always yield the same
results for the same data
Estimating the Regression Equation
 The algorithm to estimate the equation of the line is called
least squares estimation
 The idea is to find the line that gets closest to all of the points in
the sample
 How do we define “closeness” to multiple points?
 In regression, it is the cumulative squared distance between
each point’s y-value and the corresponding value of $\hat{y}$ for x
Estimating the Regression Equation
 Each distance is computed for each data point in the sample
Estimating the Regression Equation
 The values chosen for $\hat{\beta}_0$, $\hat{\beta}_1$ are the values that minimize the
cumulative distances squared:
$$\min \sum_{i=1}^{n} \left( y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 x_i \right) \right)^2$$
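For a single predictor, the minimizing values have a well-known closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below is a minimal Python illustration; the function name `least_squares` and the example data are hypothetical, not taken from the Nepali sample.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical points lying exactly on y = 2x + 1
xs = [0, 2, 4, 6, 8, 10]
ys = [2 * x + 1 for x in xs]

b0, b1 = least_squares(xs, ys)
print(b0, b1)   # 1.0 2.0
```

When the points fall exactly on a line, the procedure recovers that line; with scatter around the line, it returns the line with the smallest cumulative squared distance.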



Estimating the Regression Equation
 The values are just estimates based on a single sample
 If you were to have a different random sample of 150 Nepali
children from the same population of < 12 month olds, the
resulting estimates would likely be different (i.e., the values
that minimized the cumulative squared distance from this
second sample of points would likely be different)
 As such, all regression coefficients have an associated standard
error that can be used to make statements about the true
relationship between mean y and x based on a single sample
Arm Circumference and Height
 The estimated regression equation relating arm circumference to
height using a random sample of 150 Nepali children less than 12
months old was
$$\hat{y} = 2.7 + 0.16x$$
$$\hat{\beta}_0 = 2.70, \qquad \mathrm{SE}(\hat{\beta}_0) = 0.88$$
$$\hat{\beta}_1 = 0.16, \qquad \mathrm{SE}(\hat{\beta}_1) = 0.014$$
Arm Circumference and Height
 The random sampling behavior of estimated regression coefficients
is approximately normal for large samples, centered at the true,
unknown population values
 We can use the same ideas to create 95% CIs and get p-values
Arm Circumference and Height
 The estimated regression equation relating arm circumference to
height using a random sample of 150 Nepali children < 12 months
old was
$$\hat{y} = 2.7 + 0.16x$$
$$\hat{\beta}_1 = 0.16, \qquad \mathrm{SE}(\hat{\beta}_1) = 0.014$$
 The 95% CI for β1
$$\hat{\beta}_1 \pm 1.96\,\mathrm{SE}(\hat{\beta}_1) = 0.16 \pm 1.96\,(0.014) = (0.13,\ 0.19)$$

Arm Circumference and Height
 P-value for testing the hypotheses:
 HO: β1 = 0
 HA: β1 ≠ 0
 Assume the null is true and calculate the standardized “distance”
of $\hat{\beta}_1$ from 0
$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{0.16}{0.014} = 11.4$$
 The p-value is the probability of being 11.4 or more standard
errors away from a mean of 0 on a normal curve
 P < 0.001
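The interval and the test statistic for the slope can be rechecked with a few lines of Python. This is a sketch using the rounded estimate and standard error from the slides and the large-sample normal approximation; `math.erfc` is used for the two-sided tail area because it stays accurate for very small probabilities.

```python
import math

b1, se_b1 = 0.16, 0.014          # slope estimate and its standard error from the slide

# 95% confidence interval using the normal multiplier 1.96
ci_low = b1 - 1.96 * se_b1
ci_high = b1 + 1.96 * se_b1

# Standardized distance of the estimate from the null value 0
z = (b1 - 0) / se_b1

# Two-sided normal tail area: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z = {z:.1f}, p < 0.001: {p_value < 0.001}")
```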
Summarizing Findings: Circumference/Height
 This research used simple linear regression to estimate the
magnitude of the association between arm circumference and
height in Nepali children less than 12 months old, using data on a
random sample of 150
 A statistically significant positive association was found (p < 0.001)
 The results estimate that two groups of such children who differ by
1 cm in height will differ on average by 0.16 cm in arm
circumference (95% CI 0.13 cm to 0.19 cm)
In Excel
 “SLOPE” returns the estimate of the slope
In Excel
 “INTERCEPT” returns the estimate of the intercept
Arm Circumference and Height
 Estimate and 95% CI for the mean difference in arm circumference
for children 60 cm tall compared to children 50 cm tall
$$(60 - 50)\,\hat{\beta}_1 = 10\,(0.16) = 1.6 \text{ cm}$$
 What about the standard error?
$$\mathrm{SE}(10\,\hat{\beta}_1) = 10\,\mathrm{SE}(\hat{\beta}_1) = 10\,(0.014) = 0.14$$
 95% CI:
$$10\,\hat{\beta}_1 \pm 1.96\,\mathrm{SE}(10\,\hat{\beta}_1) = 1.6 \pm 1.96\,(0.14) = (1.33,\ 1.87) \text{ cm}$$
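The same scaling applies to any difference in height within the observed range: both the estimate and its standard error are multiplied by the number of centimeters. A minimal Python sketch using the rounded values from the slides (variable names are illustrative):

```python
b1, se_b1 = 0.16, 0.014          # slope and standard error from the slide
height_diff = 60 - 50            # difference in height, in cm

estimate = height_diff * b1      # estimated mean difference in arm circumference
se = height_diff * se_b1         # SE scales by the same constant

ci_low = estimate - 1.96 * se
ci_high = estimate + 1.96 * se

print(f"estimate = {estimate:.1f} cm, SE = {se:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}) cm")
```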

Notes
 For smaller samples, a slight change analogous to what we did with
means is required
 The sampling distribution of the regression coefficients is a
Student’s t distribution with n - 2 degrees of freedom, and
approaches the standard normal distribution as the size of the
sample increases
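 95% CI for β1 (the multiplier is the 97.5th percentile of the t distribution with n - 2 degrees of freedom):
$$\hat{\beta}_1 \pm t_{0.975,\,n-2}\,\mathrm{SE}(\hat{\beta}_1)$$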

Arm Circumference Data Modified
 P-value for testing the hypotheses:
 HO: β1 = 0
 HA: β1 ≠ 0
 Suppose instead of 150 children, we have sampled only 21
 Assume the null is true and calculate the standardized “distance”
of $\hat{\beta}_1$ from 0
$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{0.16}{0.014} = 11.4$$
 The p-value is the probability of being 11.4 or more standard
errors away from a mean of 0 on a t(19) distribution
 P < 0.001
Intercept?
 All the previous examples have confidence intervals for the slope
(or multiples of the slope)
 We can also create CI/p-values for the intercept in the same
manner
 However, many times the intercept is just a placeholder and does
not describe a useful quantity
Comparing Proportions Between Two Independent
Populations
In This Section
 CIs for difference in proportions between two independent
populations
 Large sample methods for comparing proportions
 Normal approximation method
 Chi-square test
 Fisher’s Exact Test
 Relative Risk
Comparing Two Proportions
 Pediatric AIDS Clinical Trials Group (ACTG) Protocol 076 Study
Group [1]
 Study Design
 “We conducted a randomized, double-blinded, placebo-controlled trial of the efficacy and safety of zidovudine (AZT)
in reducing the risk of maternal-infant HIV transmission”
 363 HIV infected pregnant women were randomized to AZT or
placebo
[1] Connor, E., et al. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus
type 1 with zidovudine treatment. NEJM 331:18.
Comparing Two Proportions
 Results
 Of the 180 women randomized to AZT, 13 gave birth to children
who tested positive for HIV within 18 months of birth
 Of the 183 women randomized to placebo, 40 gave birth to
children who tested positive for HIV within 18 months of birth
Notes
 Random assignment of treatment
 Helps ensure the two groups are comparable
 Patient and physician could not request a particular treatment
 Double-blind
 Patient and physician did not know treatment assignment
Observed HIV Transmission Rates
 AZT: $\hat{p}_{AZT} = \dfrac{13}{180} = 0.07$ (7%)
 PLA: $\hat{p}_{PLA} = \dfrac{40}{183} = 0.22$ (22%)
Notes
 Is the difference significant or can it be explained by chance?
 CI on the difference in proportions? P-value?
Sampling Distribution of Difference in Sample
Proportions
 Since we have large samples, we can be assured that the sampling
distributions of the sample proportions in both groups are
approximately normal (CLT)
 It turns out that the difference of two independent quantities that are each
approximately normally distributed is also approximately normally distributed
Sampling Distribution of Difference in Sample
Proportions
 The sampling distribution of the difference of two sample
proportions (based on large samples) approximates a normal
distribution
 This distribution is centered at the true, unknown population
difference p1-p2
[Figure: sampling distributions for the AZT group, the placebo group, and the AZT-PLA difference]
95% CIs for Difference in Proportions
 General formula:
Best estimate ± multiplier*SE(best estimate)
 The best estimate of a population difference in proportions
is the difference in sample proportions:
$$\hat{p}_1 - \hat{p}_2$$
 Here, $\hat{p}_1$ may represent the sample proportion of infants HIV
positive for 180 infants in the AZT group and $\hat{p}_2$ may represent the
same for the 183 infants in the placebo group
95% CI: AZT Study
$$(\hat{p}_1 - \hat{p}_2) \pm \text{multiplier} \times \mathrm{SE}(\hat{p}_1 - \hat{p}_2)$$
 Statisticians have developed formulas for the standard error of the
difference
 These formulas depend on both sample sizes and sample
proportions
$$\mathrm{SE}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \approx \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
HIV/AZT Study
 Recall the data:
Group               AZT     PLA
N                   180     183
Sample proportion   0.07    0.22
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{0.07(0.93)}{180} + \frac{0.22(0.78)}{183}} = 0.036$
95% CI: AZT Study
 The 95% CI for the true difference in proportions between the AZT and PLA groups is:
$-0.15 \pm 1.96 \times SE(\hat{p}_1 - \hat{p}_2)$
$-0.15 \pm 1.96(0.036)$
$(-0.22, -0.08)$
Summary
 Results
 The proportion of infants who tested positive for HIV within 18 months of birth was seven percent (95% CI 4-12%) in the AZT group and twenty-two percent (95% CI 16-28%) in the placebo group
 The study results estimate the absolute decrease in the
proportion of HIV positive infants born to HIV positive mothers
associated with AZT to be as low as 8% and as high as 22%
Two-sample z-test: Getting a p-value
 Are the proportions of infants contracting HIV within 18 months of
birth equivalent at the population level for those whose mothers
are treated with AZT versus untreated?
 HO: p1 = p2
 HA: p1 ≠ p2
 In other words, is the expected difference in proportions zero?
 HO: p1 - p2 = 0
 HA: p1 - p2 ≠ 0
Hypothesis Test to Compare Two Independent
Proportions
 Recipe:
 Assume HO is true
 Measure the distance of our sample result from the null value (here, it’s 0)
 Compare the distance (test statistic) to the appropriate
distribution to get the p-value
$z = \frac{\text{observed difference} - \text{null difference}}{SE(\text{observed difference})} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$
HIV/AZT Study
 Recall,
$\hat{p}_1 - \hat{p}_2 = -0.15$
$SE(\hat{p}_1 - \hat{p}_2) = 0.036$
 So in this study,
$z = \frac{-0.15}{0.036} = -4.2$
 This result was 4.2 SE below the null mean of 0
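As a check on the test statistic, the sketch below computes z twice: once from the rounded proportions used on the slides and once from the raw counts (13/180 and 40/183). The function name is illustrative, and the standard error is the unpooled form shown in the formula above.

```python
import math

def two_proportion_z(p1_hat, n1, p2_hat, n2, null_difference=0.0):
    """z = (observed difference - null difference) / SE(observed difference)."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - null_difference) / se

# Rounded proportions, as on the slides
z_rounded = two_proportion_z(0.07, 180, 0.22, 183)

# Unrounded proportions from the raw counts
z_raw = two_proportion_z(13 / 180, 180, 40 / 183, 183)

print(f"z from rounded proportions: {z_rounded:.2f}")
print(f"z from raw counts:          {z_raw:.2f}")
```

Working from the raw counts gives a statistic slightly smaller in magnitude than the one from rounded inputs; either way the result is about four standard errors below 0.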
P-values
 Is a result 4.2 standard errors below 0 unusual?
 It depends on the distribution we’re dealing with
 The p-value is the probability of getting a test statistic as extreme as or more extreme than what we observed (-4.2) by chance alone, if the null hypothesis is true
 The p-value comes from the sampling distribution of the difference
in two sample proportions
 What is the sampling distribution of the difference in sample
proportions?
 If both groups are large, it is approximately normal
 It is centered at the true difference
 Under the null, this true difference is 0
HIV/AZT Study
 To compute a p-value, we need to compute the probability of
being 4.2 or more SEs away from 0 on a standard normal curve
HIV/AZT Study
 If we were to look this up in a normal table, we would find a very
small p (p < 0.001)
 This method is also essentially equivalent to the chi-square (χ2)
method
 Gives about the same answer
 Will discuss more detail later
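Instead of a normal table, the two-sided p-value can be computed directly. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `two_sided_p_value` is illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability that a standard normal value is at least |z| away from 0."""
    return 2 * NormalDist().cdf(-abs(z))

# Test statistic from the HIV/AZT study
p_value = two_sided_p_value(-4.2)
print(f"two-sided p-value: {p_value:.6f}")
```

The area in both tails is counted because the alternative hypothesis is two-sided.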
Displaying 2X2 Data
 Data of this sort can be displayed using a contingency table
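For the HIV/AZT study, the table can be assembled from the counts reported earlier (13 of 180 and 40 of 183 infants HIV positive). The sketch below assumes rows for HIV status and columns for treatment; the variable names are illustrative.

```python
# Observed counts: rows are HIV status, columns are treatment group
observed = {
    "HIV positive": {"AZT": 13, "Placebo": 40},
    "HIV negative": {"AZT": 180 - 13, "Placebo": 183 - 40},
}
columns = ["AZT", "Placebo"]

# Marginal totals
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}
grand_total = sum(row_totals.values())

# Print the table with its margins
print(f"{'':<14}{'AZT':>8}{'Placebo':>9}{'Total':>7}")
for row, cells in observed.items():
    print(f"{row:<14}{cells['AZT']:>8}{cells['Placebo']:>9}{row_totals[row]:>7}")
print(f"{'Total':<14}{col_totals['AZT']:>8}{col_totals['Placebo']:>9}{grand_total:>7}")
```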
The Chi-Square Test
 Testing equality of two population proportions using data from two
samples
 HO: p1 = p2
 HA: p1 ≠ p2
 In other words, is the expected difference in proportions zero?
 HO: p1 - p2 = 0
 HA: p1 - p2 ≠ 0
 In the context of a 2X2 table, this is testing whether there is a
relationship between the row variable (HIV status) and the column
variable (treatment type)
Chi-Square Test
 Pearson’s Chi-Square Test (χ2) can easily be done by hand
 Works well for “big” sample sizes--it is an approximate method
 Gives essentially the same p as the z test for comparing two
proportions
 Unlike the z-test, it can be extended to compare proportions
between more than two independent groups in one test
The Chi-Square Method
 Looks at discrepancies between observed and expected cell counts
(under the null hypothesis) in a 2X2 table
 O = observed
 E = expected = (row total*column total)/grand total
 “Expected” refers to the values for the cell counts that would be
expected if the null hypothesis is true (the expected cell
proportions if the underlying population proportions are equal)
The Chi-Square Method
 Recipe . . .
 Start by assuming the null hypothesis is true
 Measure the distance of the sample result from the null value
 Compare the test statistic (distance) to the appropriate
distribution to get the p-value
$\chi^2 = \sum \frac{(O - E)^2}{E}$
 The sampling distribution of this statistic when the null is true is a
chi-square distribution with one degree of freedom
Chi-Square (1)
Contingency Table
 The observed value for cell one is 13
 We have to calculate the expected count:
$E = \frac{R \times C}{N} = \frac{53 \times 180}{363} = 26.3$
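The same row total times column total over grand total rule gives the expected count for every cell. A minimal sketch, assuming the table is stored as a list of rows (HIV positive, then HIV negative) with columns AZT and placebo; `expected_counts` is an illustrative name.

```python
def expected_counts(table):
    """Expected cell counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: HIV positive, HIV negative; columns: AZT, placebo
observed = [[13, 40], [167, 143]]
expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print([f"O={o}, E={e:.1f}" for o, e in zip(obs_row, exp_row)])
```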
Expected Values
 We could do the same for the other three cells:
 Now we must compute the ‘distance’ (test statistic), χ2
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Expected Values
2  
O - E 
2
E
13 - 26.3 


26.3
 15.6
2
40 - 26.7 


26.7
2
167 - 153.7 


153.7
2
143 - 156.3 


156.3
2
Sampling Distribution
 P = 0.0001
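The statistic and its p-value can be checked without a table. This sketch relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so the upper-tail probability is `erfc(sqrt(x / 2))`; that shortcut holds for one degree of freedom only. The function name is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: HIV positive, HIV negative; columns: AZT, placebo
chi2, p = chi_square_2x2([[13, 40], [167, 143]])
print(f"chi-square = {chi2:.1f}, p = {p:.5f}")
```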
Extending Chi-Square Test
 The chi-square test can be extended to test for differences in
proportions across more than two independent populations
 Analogous to ANOVA with binary outcomes
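The same observed-versus-expected calculation carries over to more than two groups, with (rows - 1) x (columns - 1) degrees of freedom. The sketch below uses made-up counts for three hypothetical groups (not data from any study cited here). The p-value line uses the closed form for exactly two degrees of freedom, `exp(-x / 2)`; other table sizes would need a general chi-square tail function such as `scipy.stats.chi2.sf`.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i in range(len(table))
        for j in range(len(col_totals))
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df

# HYPOTHETICAL counts: rows are outcome yes/no, columns are three groups of 100
hypothetical = [[15, 25, 40], [85, 75, 60]]
chi2, df = chi_square_statistic(hypothetical)

# Closed-form upper tail, valid only when df == 2
assert df == 2
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f} on {df} df, p = {p_value:.5f}")
```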
Extending Chi-Square Test
 Example: Health care indicators by immigrant status1
1 Huang, Z, et al. (2006). Health status and health service access and use among children in US immigrant families. Am J Public Health 96:4.
Next Time
 More on Proportions
 Fisher’s Exact Test
 Measures of Association: risk difference, relative risk, odds
ratio
 Survival Analysis
 Study Design Considerations
References and Citations
Lectures modified from notes provided by John McGready and Johns
Hopkins Bloomberg School of Public Health accessible from the World
Wide Web: http://ocw.jhsph.edu/courses/introbiostats/schedule.cfm
