
Analysis of Clinical Trials with Multiple Outcomes

Changchun Xie, PhD
Assistant Professor of Biostatistics
Division of Biostatistics and Bioinformatics
Department of Environmental Health
University of Cincinnati
Phone: (513)558-0229
email: [email protected]

Appetizer/Quiz

Case 1
• An investigator obtained P-value = 0.04 from one test of the treatment effect on one outcome. To support the original hypothesis, the investigator then tested the treatment effect on another, related outcome and obtained P-value = 0.03.
• Do we need to adjust for multiple testing?

Case 2
• With the same data, the investigator has two hypotheses for the same two outcomes, uses the same two statistical tests, and obtains the same P-values.
• Do we need to adjust for multiple testing?

Outline
• Introduction: clinical trials
• Multiple outcomes in clinical trials with different objectives
• Multiple testing
• Conclusion and discussion

Introduction
• Define the target patient population using inclusion and exclusion criteria, which provides a homogeneous sample.
• The criteria help reduce bias and variability and increase statistical power.
• The more criteria imposed, the smaller and more homogeneous the target patient population becomes, but this may make patient recruitment difficult and limit the generalizability of the study findings.

Phases
• Preclinical: testing the drug in non-human subjects
• Phase I
• Phase II
• Phase III
• Phase IV

• Phase I: Phase I trials are the first stage of testing in human subjects. Normally, a small group of 20–80 healthy volunteers is recruited. This phase is designed to assess the safety, tolerability, pharmacokinetics, and pharmacodynamics of a drug. Phase I trials also normally include dose-ranging (also called dose-escalation) studies, to find the best and safest dose and to discover the point at which a compound is too toxic to administer.
• Phase II: Once a dose or range of doses is determined, the next goal is to evaluate whether the drug has any biological activity or effect. Phase II trials are performed on larger groups (100–300) and are designed to assess how well the drug works, as well as to continue the Phase I safety assessments in a larger group of volunteers and patients.
• Phase III: Phase III studies are randomized controlled multicenter trials on large patient groups (300–3,000 or more, depending on the disease or medical condition studied). They are intended to be the definitive assessment of how effective the drug is in comparison with the current 'gold standard' treatment.
• Phase IV: A Phase IV trial is also known as a postmarketing surveillance trial. It is designed to detect rare or long-term adverse effects over a much larger patient population and a longer time period than was possible during the Phase I-III clinical trials. Harmful effects discovered in Phase IV trials may result in a drug being withdrawn from sale or restricted to certain uses.

Blinding
• Although randomization is intended to prevent bias and allow a statistically sound assessment of the study drug, it does not guarantee that there will be no bias caused by subjective judgment in reporting, evaluation, data processing, and statistical analysis due to knowledge of the identity of the treatments.
• Blinding: no one involved with the trial knows what treatment was given to the trial participant.

Data and Safety Monitoring Board (DSMB)
• The DSMB is a group (typically 3 to 7 members) that is independent of the company sponsoring the trial. At least one DSMB member will be a statistician.
• The DSMB meets at predetermined intervals (typically every three to six months) and reviews unblinded results.
• The DSMB has the power to recommend termination of the study based on its evaluation of these results. There are typically three reasons a DSMB might recommend termination: safety concerns, outstanding benefit, and futility.
Patient compliance
• Medication compliance is the act of taking medication on schedule, or taking medication as prescribed.
• Intention-to-treat (ITT) analysis is based on the initial treatment assignment, not on the treatment eventually received (used for efficacy, not safety).

Multiple outcomes in clinical trials with different objectives
• To obtain better knowledge of a treatment effect in a clinical trial, many medically related outcomes are often collected.

All or None Approach
• The primary objective is defined as the simultaneous improvement in multiple endpoints.
• For several disorders, including migraine, Alzheimer’s disease and osteoarthritis, regulatory agencies have required a treatment to demonstrate a statistically significant effect on all of the multiple endpoints, each at level α.
• No multiplicity adjustment is necessary (high power?)
• This approach is very conservative, since it requires that all hypotheses be rejected at level α.

Global Approach
• The primary objective is to show that the treatment has an overall effect across the endpoints, without necessarily a large significant effect on any one endpoint.
• O’Brien (1984) simplified the problem by assuming a common standardized effect size $\lambda$ for all endpoints:
  $H_0: \lambda = 0$ versus $H_1: \lambda > 0$
  $Y_{ijk} = \mu_k + \lambda I_i + \varepsilon_{ijk}$,
  where $i = 1, 2$ indexes the treatment group (with indicator $I_i$), $j = 1, \dots, n_i$ indexes patients, $k = 1, \dots, K$ indexes endpoints, and $\varepsilon_{ijk} \sim N(0, 1)$ with
  $\mathrm{corr}(\varepsilon_{ijk}, \varepsilon_{i'j'k'}) = \rho_{kk'}$ if $i = i'$ and $j = j'$; $\mathrm{corr}(\varepsilon_{ijk}, \varepsilon_{i'j'k'}) = 0$ otherwise.

Composite Outcomes
• Composite outcomes/endpoints combine multiple events into one event, using the first time at which any of the component events occurs as the new event time.
• For example, for Death/myocardial infarction (MI)/Stroke, the new event time = min(death time, MI time, stroke time).

Why use a composite endpoint
• Decrease in the sample size required to show effects (the event rate increases)
• Assessment of the “net” effect of an intervention (net benefit = benefit - harmful effect)
• Avoid bias in the assessment of an effect in the presence of competing risks. The possibility of bias due to competing risks arises when the occurrence of one event decreases the probability of another event of interest occurring.

Problems
• A positive effect may be found for a composite endpoint when the effect is due mainly to a component of less clinical significance, whereas the effect on a component of more clinical significance is null or even negative.
• The biggest risk of using composite endpoints is that they exaggerate the real benefit of the intervention.

Problems
• If the decision to stop a trial early is based on monitoring a composite endpoint, particularly one driven by the least patient-important component, the approach may lead to overestimation of the benefit and underestimation of the risk.

Heterogeneity of components
• Relative clinical significance
• Size of effect
• Frequency of events

Examples

At-least-one Approach
• The primary objective is to detect at least one significant effect (the trial is declared positive if the treatment effect for at least one endpoint is significant).
• The multiple endpoint problem becomes a multiple testing problem.

Introduction to Multiple Testing

           Not rejected   Rejected   Total
True H0    U              V          m0
True H1    T              S          m1
Total      m - R          R          m

Why do we need to adjust for multiple testing
• Assume there are 200 independent true null hypotheses, each tested at significance level α = 0.05.
• The probability of rejecting at least one null hypothesis is $1 - (1 - 0.05)^{200} = 0.99996$.
• The expected number of false significant tests is $200 \times 0.05 = 10$.
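The two numbers on the slide above can be checked with a few lines of Python. This is only a sketch under the slide's own assumptions (independent tests, all null hypotheses true); the function names are made up for illustration.

```python
def prob_at_least_one_false_rejection(m: int, alpha: float) -> float:
    """P(V >= 1) for m independent tests of true nulls, each at level alpha."""
    # Each test avoids a false rejection with probability (1 - alpha);
    # independence lets us multiply across the m tests.
    return 1.0 - (1.0 - alpha) ** m


def expected_false_rejections(m: int, alpha: float) -> float:
    """E(V) for m tests of true nulls, each at level alpha."""
    return m * alpha


if __name__ == "__main__":
    print(prob_at_least_one_false_rejection(200, 0.05))
    print(expected_false_rejections(200, 0.05))
```

With m = 2 and α = 0.05, as in the quiz, the same formula gives 0.0975, so even two unadjusted independent tests nearly double the chance of at least one false rejection.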
Error Rate Control
• Family-wise Error Rate
• FWER = P(V ≥ 1)
• False Discovery Rate
• FDR = E(V/R | R > 0) P(R > 0)
• When m0 = m, FDR is equivalent to FWER
• When m0 < m, FDR ≤ FWER

FDR is not suitable for the multiple endpoint problem in clinical trials
• FDR is suitable for testing a large number of hypotheses in exploratory studies, in which less stringent error control is acceptable. Tests for multiple endpoints in a clinical trial, however, are generally confirmatory, for drug approval.
• Multiple endpoints in a clinical trial might have logical restrictions and decision rules. FDR is not designed to handle such complex logical restrictions and decision rules.

Bonferroni Correction
• Adjust the individual testing significance level to α/m
---- does not require the tests to be independent
---- can be conservative if the tests are correlated
---- equally weighted tests

Fixed Sequence (FS)
• Tests each null hypothesis at the same level α, without any adjustment, in a pre-specified testing sequence; further testing stops as soon as a null hypothesis in the sequence is not rejected
---- requires a pre-specified testing sequence
---- if the first null hypothesis cannot be rejected, the second null hypothesis cannot be rejected even if its p-value is very small

More Methods
• Weighted Bonferroni
• Bonferroni Fixed Sequence
• Weighted Holm
• The WMTCc method is for multiple continuous correlated endpoints. Does it still keep its advantages when correlated binary endpoints are used?
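The Bonferroni and fixed-sequence rules above can be sketched in Python and applied to the two quiz P-values (0.04 and 0.03). This is a minimal illustration, not the speaker's code: the function names are hypothetical, and rejecting when p ≤ threshold is an assumed convention.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0_i when p_i <= alpha / m (equally weighted tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def fixed_sequence(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Test hypotheses in the given pre-specified order, each at level alpha.

    Testing stops at the first hypothesis that is not rejected; every
    later hypothesis is left unrejected, however small its p-value.
    """
    rejected = []
    stopped = False
    for p in p_values:
        if not stopped and p <= alpha:
            rejected.append(True)
        else:
            stopped = True
            rejected.append(False)
    return rejected


if __name__ == "__main__":
    quiz = [0.04, 0.03]
    print(bonferroni(quiz))               # [False, False]
    print(fixed_sequence(quiz))           # [True, True]
    print(fixed_sequence([0.06, 0.001]))  # [False, False]
```

With the quiz values, Bonferroni compares each P-value with 0.025 and rejects neither hypothesis, while the fixed-sequence procedure rejects both, provided the testing order was specified in advance.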
Survival Data
• For continuous or binary data, the correlation matrix can be estimated directly from the corresponding correlated endpoints.
• It is challenging to estimate the correlation matrix directly from multiple endpoints in survival data, since censoring is involved.
• Accepted as a chapter in the book Innovative Statistical Methods for Public Health Data

Conclusion and discussion
• No multiple testing adjustment is necessary with the
a) All-or-none approach
b) Global approach
c) Composite outcome approach
• Multiple testing adjustment is needed with the at-least-one approach
• Use FWER control instead of FDR
• Considering the correlation among multiple endpoints might increase study power

Thanks