Applied Survival Analysis - University of Cincinnati

Analysis of Clinical Trials with Multiple
Changchun Xie, PhD
Assistant Professor of Biostatistics
Division of Biostatistics and Bioinformatics
Department of Environmental Health
University of Cincinnati
Phone: (513)558-0229
Appetizer/Quiz: Case 1
• An investigator got a P-value=0.04 from one
test of treatment effect with one outcome. To
support his/her original hypothesis, he/she
tested the treatment effect for another
related outcome and got P-value=0.03.
• Do we need to adjust for multiple testing?
Case 2
• With the same data, the investigator has two
hypotheses for the same two outcomes.
He/she used the same two statistical tests and
got the same P-values.
• Do we need to adjust for multiple testing?
• Introduction: clinical trials
• Multiple outcomes in clinical trials with different objectives
• Multiple testing
• Conclusion and discussion
• Define the target patient population by using
inclusion and exclusion criteria, providing a
homogeneous sample
• The criteria help reduce bias and variability
and increase statistical power
• The more criteria that are imposed, the smaller
and more homogeneous the target patient population
will be, but this may cause difficulties in
patient recruitment and limit the
generalizability of the study findings.
Preclinical: testing drug in non-human subjects
Phase I
Phase II
Phase III
Phase IV
• Phase I:
Phase I trials are the first stage of testing in
human subjects. Normally, a small group of 20–80
healthy volunteers will be recruited. This phase is
designed to assess the safety, tolerability,
pharmacokinetics, and pharmacodynamics of a
drug. Phase I trials also normally include dose-ranging
studies, also called dose-escalation studies, so
that the best and safest dose can be found and to
discover the point at which a compound is too
poisonous to administer.
• Phase II:
Once a dose or range of doses is determined, the
next goal is to evaluate whether the drug has any
biological activity or effect. Phase II trials are
performed on larger groups (100-300) and are
designed to assess how well the drug works, as well
as to continue Phase I safety assessments in a larger
group of volunteers and patients.
• Phase III:
Phase III studies are randomized controlled
multicenter trials on large patient groups (300–3,000
or more depending upon the disease/medical
condition studied) and are aimed at being the
definitive assessment of how effective the drug is, in
comparison with current 'gold standard' treatment.
• Phase IV:
A Phase IV trial is also known as a postmarketing
surveillance trial. It is designed to detect any
rare or long-term adverse effects over a much
larger patient population and longer time period
than was possible during the Phase I-III clinical
trials. Harmful effects discovered by Phase IV
trials may result in a drug being no longer sold,
or restricted to certain uses.
• Although randomization is intended to prevent bias
and support a statistically sound
assessment of the study drug, it does not
guarantee that there will be no bias caused by
subjective judgment in reporting, evaluation, data
processing, and statistical analysis due to
knowledge of the identity of the treatments.
• Blinding: no one involved with the trial knows
what treatment was given to the trial participant.
Data and Safety Monitoring Board (DSMB)
The DSMB is a group (typically 3 to 7 members) who are
independent of the company sponsoring the trial. At least one
DSMB member will be a statistician. The DSMB will meet at
predetermined intervals (three to six months typically) and
review unblinded results.
The DSMB has the power to recommend termination of the
study based on the evaluation of these results. There are
typically three reasons a DSMB might recommend termination
of the study: safety concerns, outstanding benefit, and futility.
Patient compliance
• Medication compliance is the act of taking
medication on schedule or taking medication
as prescribed
• intention-to-treat (ITT) analysis is based on the
initial treatment assignment and not on the
treatment eventually received. (for efficacy not
Multiple outcomes in clinical trials
with different objectives
• To obtain better knowledge of a treatment
effect in a clinical trial, many medically related
outcomes are often collected.
All or None Approach
• The primary objective is defined as the
simultaneous improvement in multiple endpoints
• For several disorders including migraine,
Alzheimer’s disease and osteoarthritis, regulatory
agencies have required a treatment to
demonstrate statistically significant effect on all
multiple endpoints, each at level α.
• No multiplicity adjustment is necessary (high
• This approach is very conservative since it
requires that all hypotheses be rejected
at level α.
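The all-or-none decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `all_or_none` and the default α = 0.05 are assumptions, not part of the slides.

```python
def all_or_none(p_values, alpha=0.05):
    """Return True only if every endpoint is significant at level alpha.

    Each endpoint is tested at the full, unadjusted alpha, so the trial
    succeeds only when the largest p-value is at most alpha.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    return max(p_values) <= alpha


# Hypothetical p-values for two endpoints
print(all_or_none([0.04, 0.03]))  # both endpoints significant
print(all_or_none([0.04, 0.06]))  # second endpoint fails
```

Because a single non-significant endpoint fails the whole trial, the rule needs no multiplicity adjustment but loses power as endpoints are added.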
Global Approach
• The primary objective is to show that treatment has
an overall effect across the endpoints without
necessarily a large significant effect on any one
• O’Brien (1984) simplified the problem by assuming
a common standardized effect size for all endpoints.
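$H_0: \lambda = 0$ versus $H_1: \lambda > 0$
$Y_{ijk} = \mu_k + \lambda I_i + \varepsilon_{ijk}$, where $I_i$ is the treatment-group indicator, $i = 1, 2$, $j = 1, \dots, n_i$, $k = 1, \dots, K$, $\varepsilon_{ijk} \sim N(0,1)$ with $\mathrm{corr}(\varepsilon_{ijk}, \varepsilon_{i'j'k'}) = \rho_{kk'}$ if $i = i'$ and $j = j'$; $\mathrm{corr}(\varepsilon_{ijk}, \varepsilon_{i'j'k'}) = 0$ otherwise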
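As a rough illustration of a global test under a common standardized effect, the sketch below computes one commonly cited form of O'Brien's OLS statistic: the sum of the per-endpoint two-sample t statistics divided by the square root of the sum of all entries of the pooled within-group correlation matrix. The function name `obrien_ols` and the input layout (one list of K endpoint values per subject) are assumptions made for this example, and the reference distribution used to obtain a p-value is deliberately left out.

```python
import math
from statistics import mean


def obrien_ols(group1, group2):
    """OLS-type global statistic for two groups with K endpoints each.

    group1, group2: lists of subjects; each subject is a list of K values.
    A positive result means group2 tends to have larger values overall.
    """
    n1, n2 = len(group1), len(group2)
    k_endpoints = len(group1[0])
    df = n1 + n2 - 2
    if df <= 0:
        raise ValueError("need more than two subjects in total")
    m1 = [mean(s[k] for s in group1) for k in range(k_endpoints)]
    m2 = [mean(s[k] for s in group2) for k in range(k_endpoints)]

    def pooled_cov(a, b):
        # Pooled within-group covariance between endpoints a and b
        total = sum((s[a] - m1[a]) * (s[b] - m1[b]) for s in group1)
        total += sum((s[a] - m2[a]) * (s[b] - m2[b]) for s in group2)
        return total / df

    cov = [[pooled_cov(a, b) for b in range(k_endpoints)]
           for a in range(k_endpoints)]
    sd = [math.sqrt(cov[k][k]) for k in range(k_endpoints)]
    scale = math.sqrt(1 / n1 + 1 / n2)
    # Per-endpoint pooled two-sample t statistics (group2 minus group1)
    t_stats = [(m2[k] - m1[k]) / (sd[k] * scale) for k in range(k_endpoints)]
    # Sum of all entries of the pooled correlation matrix
    corr_sum = sum(cov[a][b] / (sd[a] * sd[b])
                   for a in range(k_endpoints) for b in range(k_endpoints))
    return sum(t_stats) / math.sqrt(corr_sum)


# Made-up data: 3 subjects per group, 2 endpoints
control = [[1, 2], [2, 4], [3, 6]]
treated = [[2, 4], [3, 6], [4, 8]]
print(obrien_ols(control, treated))
```

With a single endpoint the statistic reduces to the ordinary pooled two-sample t statistic, which is a convenient sanity check.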
Composite Outcomes
Composite Outcomes/Endpoints: combine multiple
events into one event, using the first occurrence of any of the
component events as the new event time.
For example,
Death/myocardial infarction(MI)/Stroke,
the new event time= min(death time, MI time,
Stroke time)
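The min-of-times rule in this example can be sketched as below. The function name `composite_event`, the use of `None` for a component that was never observed, and the handling of censoring at the last follow-up time are assumptions added for illustration; the slide itself only states the minimum rule.

```python
def composite_event(times, follow_up):
    """Derive the composite event time for one patient.

    times: dict mapping component name -> event time, or None if the
           component event was not observed.
    follow_up: last follow-up time, used as the censoring time.
    Returns (time, observed, first_component).
    """
    observed = {name: t for name, t in times.items()
                if t is not None and t <= follow_up}
    if not observed:
        # No component occurred during follow-up: censored observation
        return follow_up, False, None
    first = min(observed, key=observed.get)
    return observed[first], True, first


# Hypothetical patient: MI at month 12.5, death at month 30, no stroke
print(composite_event({"death": 30.0, "MI": 12.5, "stroke": None}, 36.0))
# Hypothetical patient with no events during 24 months of follow-up
print(composite_event({"death": None, "MI": None, "stroke": None}, 24.0))
```

Returning the name of the first component makes it easy to tabulate which events drive the composite.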
Why use a composite endpoint
• Decrease in sample size required to show
effects (increase the event rate)
• Assessment of the “net” effect of an
intervention: (net benefit=benefit-harmful
• Avoid bias in the assessment of an effect in the
presence of competing risks: the possibility of
bias due to competing risks arises in situations
in which the occurrence of one event decreases
the probability of another event of interest
• A positive effect may be found for a composite
endpoint when this effect is due mainly to a
component of less clinical significance,
whereas the effect on the component of greater
clinical significance is null or even negative.
• The biggest risk of using the composite
endpoints is that they exaggerate the real
benefit of the intervention
• Stopping a trial early based on
the monitoring of a composite endpoint,
particularly one driven by the least
patient-important component, may lead to
overestimation of the benefit and
underestimation of the risk.
Heterogeneity of components
• Relative clinical significance
• Size of effect
• Frequency of events
At-least-one Approach
• The primary objective is to detect at least one
significant effect (the trial is declared positive
if the treatment effect for at least one
endpoint is significant)
• The multiple endpoint problem becomes a multiple
testing problem
Introduction to Multiple Testing
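                 Not rejected   Rejected   Total
True H0          U              V          m0
True H1          T              S          m − m0
Total            m − R          R          m
V: number of true null hypotheses rejected; R: total number of rejections; m0: number of true null hypotheses; m: total number of hypotheses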
Why do we need to adjust for
multiple testing
• Assume there are 200 independent true null
hypotheses, each tested at significance level α = 0.05.
• The probability of rejecting at least one null
hypothesis is 1 − (1 − α)^200 = 1 − 0.95^200 ≈ 0.99996.
• The expected number of false significant
tests is 200 × 0.05 = 10.
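The two numbers on this slide can be checked with a short calculation. The function names below are illustrative only, and the formula assumes independent tests of true null hypotheses, as stated above.

```python
def fwer_independent(m, alpha=0.05):
    """Probability of at least one false rejection among m independent
    tests of true null hypotheses, each run at level alpha."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of false rejections among m true null hypotheses."""
    return m * alpha


for m in (1, 2, 10, 200):
    print(m, fwer_independent(m), expected_false_positives(m))
```

Even with only two tests, the chance of at least one false rejection is already about 0.0975 rather than 0.05.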
Error Rate control
• Family-wise Error Rate
• FWER = P(V ≥ 1)
• False Discovery Rate
• FDR=E(V/R|R>0)P(R>0)
• When m0=m, FDR is equivalent to FWER
• When m0<m, FDR≤FWER.
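The last two statements follow from a short argument, sketched here in LaTeX, with V the number of false rejections and R the total number of rejections. The indicator notation is added for this sketch.

```latex
\begin{align*}
\mathrm{FDR} &= E\!\left(\frac{V}{R}\,\middle|\,R>0\right)P(R>0)
             = E\!\left[\frac{V}{R}\,\mathbf{1}\{R>0\}\right].\\
\text{If } m_0 = m:\quad & V = R
  \;\Rightarrow\; \mathrm{FDR} = P(R>0) = P(V \ge 1) = \mathrm{FWER}.\\
\text{In general:}\quad & \frac{V}{R}\,\mathbf{1}\{R>0\} \le \mathbf{1}\{V \ge 1\}
  \;\Rightarrow\; \mathrm{FDR} \le P(V \ge 1) = \mathrm{FWER}.
\end{align*}
```

The inequality holds because V/R is 0 when V = 0 and at most 1 when V ≥ 1, so any procedure that controls the FWER at level α also controls the FDR at level α.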
FDR is not suitable for the multiple
endpoint problem in clinical trials
• FDR is suitable for testing a large number of
hypotheses in exploratory studies, in which
less stringent error control is acceptable. But
tests for multiple endpoints in clinical trials are
generally confirmatory for drug approval.
• Multiple endpoints in a clinical trial might have
logical restrictions and decision rules. FDR is
not designed to handle such complex logical
restrictions and decision rules.
Bonferroni Correction
• Adjust each individual test's significance level
to α/m
---- does not require that the tests be independent
---- can be conservative if the tests are correlated
---- weights all tests equally
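A minimal sketch of the Bonferroni rule follows, applied to the two p-values from the opening quiz. The function name `bonferroni` and the default α = 0.05 are assumptions made for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m, where m is the number of tests.

    Returns a list of booleans (True = rejected), in input order.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# P-values from the opening quiz: 0.04 and 0.03, so the threshold is 0.025
print(bonferroni([0.04, 0.03]))
```

With two tests, neither 0.04 nor 0.03 falls below 0.025, so neither hypothesis is rejected after adjustment.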
Fixed Sequence (FS)
• Tests each null hypothesis at the same α without
any adjustment, in a pre-specified testing
sequence; further testing stops as soon as a null
hypothesis in the sequence is not rejected
---- requires a pre-specified testing sequence
---- if the first null hypothesis cannot be
rejected, the second null hypothesis cannot
be rejected even if its p-value is very small.
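The fixed-sequence rule can be sketched as follows. The function name `fixed_sequence` is hypothetical, and the p-values are assumed to be supplied in the pre-specified testing order.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Test hypotheses in the given order, each at the full alpha.

    Testing stops at the first hypothesis that is not rejected; that
    hypothesis and all later ones are reported as not rejected.
    """
    rejected = []
    stopped = False
    for p in p_values:
        if not stopped and p <= alpha:
            rejected.append(True)
        else:
            stopped = True
            rejected.append(False)
    return rejected


# Hypothetical p-values in pre-specified order
print(fixed_sequence([0.04, 0.03, 0.20, 0.001]))
print(fixed_sequence([0.06, 0.0001]))
```

The second call shows the drawback noted above: a very small second p-value cannot be used once the first test fails.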
More Methods
• Weighted Bonferroni
• Bonferroni Fixed Sequence
• Weighted Holm
• WMTCc method is for multiple continuous
correlated endpoints. Does it still keep its
advantages when correlated binary endpoints
are used?
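Two of the methods listed above, weighted Bonferroni and weighted Holm, can be sketched in their standard textbook forms. The function names are hypothetical, the weights are assumed to be positive and to sum to 1, and neither function uses the correlation among endpoints (so this is not the WMTCc method).

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Reject H_i when p_i <= w_i * alpha (weights sum to 1)."""
    return [p <= w * alpha for p, w in zip(p_values, weights)]


def weighted_holm(p_values, weights, alpha=0.05):
    """Step-down weighted Holm procedure.

    At each step, take the remaining hypothesis with the smallest
    p_i / w_i and reject it if p_i <= w_i * alpha / (sum of the
    remaining weights); stop at the first non-rejection.
    """
    remaining = list(range(len(p_values)))
    rejected = [False] * len(p_values)
    while remaining:
        total_weight = sum(weights[i] for i in remaining)
        i = min(remaining, key=lambda j: p_values[j] / weights[j])
        if p_values[i] <= weights[i] * alpha / total_weight:
            rejected[i] = True
            remaining.remove(i)
        else:
            break
    return rejected


# Hypothetical p-values with equal weights
print(weighted_bonferroni([0.02, 0.04], [0.5, 0.5]))
print(weighted_holm([0.02, 0.04], [0.5, 0.5]))
```

Weighted Holm rejects everything weighted Bonferroni rejects and possibly more, because the weight of a rejected hypothesis is redistributed to the remaining ones.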
Survival Data
• For continuous data or binary data, the
correlation matrix can be directly estimated
from the corresponding correlated endpoints
• It is challenging to directly estimate the
correlation matrix from the multiple
endpoints in survival data since censoring is
Accepted as a chapter in the book Innovative Statistical Methods for Public Health Data
Conclusion and discussion
• No multiple testing adjustment is necessary if
a) All or none approach
b) Global approach
c) Composite outcome approach
• Multiple testing adjustment is needed if
At-least-one Approach
• Use FWER control instead of FDR
• Considering correlation among multiple
endpoints might increase study power
