### Power and Non-Inferiority - Georgetown

```Power and Non-Inferiority
in Clinical Trials
Richard L. Amdur, Ph.D.
Chief, Biostatistics & Data Management Core, DC VAMC
Assistant Professor, Depts. of Psychiatry & Surgery
Georgetown University Medical Center
If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect.
Why?
Frequency Distribution for One Variable
Scores for subjects 1 through 32, in subject order:
9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14,
14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21
Frequency Table
Score   Subject Count   % of total
9       1               0.50
10      2               1.00
11      3               1.50
12      4               2.00
13      5               2.50
14      4               2.00
15      3               1.50
16      3               1.50
17      2               1.00
18      2               1.00
19      1               0.50
20      1               0.50
21      1               0.50
total   32
[Figure: histogram of Subject Count (0 to 6) by Score (9 to 21), with the mean and sd marked]
Compare the outcomes of treatment vs. control groups
[Figure: overlaid histograms for the Control and Test Tx groups. x-axis: Outcome Score (Level of Functioning), 10 to 200; y-axis: number of subjects, 0 to 12; the CON mean and TX mean are marked on the x-axis]
Effect Size (ES) = (mean_TX - mean_CON) / SD_CON
With SD fixed at 3:
Mean difference   SD     ES
1                 3      1/3 = 0.33
2                 3      2/3 = 0.67
3                 3      3/3 = 1.0
4                 3      4/3 = 1.33
With mean difference fixed at 4:
Mean difference   SD     ES
4                 3      1.33
4                 1.94   2.1
4                 1.1    3.6
Type-I and Type-II Errors
                        In fact, TX & Placebo are:
You observe that
TX & Placebo are:       Different         The Same
Different               1 - β (Power)     α
The Same                β                 1 - α

α the rate of false positives, Type I error rate
β the rate of false negatives, Type II error rate
Power = 1 - β, the rate of true positives
Plot of Score Distribution under the Null and Alternative Hypotheses
Using 2-tailed independent-groups t-test with alpha=.05, and power = .80
[Figure: score distributions under H0 and H1; axes: p by t]
N needed per group is 64
Plot of Score Distribution under the Null and Alternative Hypotheses
Using 2-tailed independent-groups t-test with alpha=.05, and power = .95
[Figure: score distributions under the null and alternative hypotheses; axes: p by t]
N needed per group is 105
Power is reduced by:
• Measurement error
- This tends to ‘muddy’ the outcome scores, making the tx effect harder to distinguish; i.e., it increases the SD of both the CON & TX groups, reducing the ES.
• Intent-to-treat analysis
- If subjects drop out because they see no progress.
- Subjects rarely drop out because they get cured early, but if they did, then completer analysis would reduce power.
• Low disease severity
- Less room for improvement
If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect.
Why?
Because your power to detect an effect might have been low.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009).
Statistical power analyses using G*Power 3.1: Tests for correlation and
regression analyses. Behavior Research Methods, 41, 1149-1160.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3:
A flexible statistical power analysis program for the social, behavioral,
and biomedical sciences. Behavior Research Methods, 39, 175-191.
Equivalence & Non-inferiority trials
How do you show that a new treatment is not inferior to a standard treatment?
Quality of the evidence base
There should be several 2-arm trials of ‘old’ tx vs. placebo, in order to get a range of effect sizes and response rates (% who improve).
Ideally, there is at least one 3-arm double-blind placebo-controlled comparison (‘old’ tx vs. new tx vs. placebo).
New Tx Beats Placebo
• Effect size vs. placebo is clinically significant.
• Mean difference on the primary outcome is statistically significant.
• Response rate (% responders) is higher than placebo [how much higher is determined by prior studies].
New Tx not substantially worse than established tx
• New tx mean on primary outcome is closer to the established tx mean than to the placebo mean.
• New tx is not significantly different from the established tx.
• New tx responder rate is not much lower than that of the established tx [should be just within the range seen in prior studies].
• Lower bound of the 95% confidence interval for primary outcome falls above ∆.
How to select ∆
• It is lower than the range of outcome differences seen in prior studies of established tx vs. placebo.
• The smallest value that could be considered a clinically meaningful effect (vs. placebo).
• The mean difference that corresponds to an x% difference in responder rates [x is determined by prior studies of the established tx vs. placebo].
Other Criteria
• Dosing & duration of each tx are within the
range of known efficacy.
• No confounds (despite randomization)
• Sample size provides adequate power to
detect a clinically significant difference.
• Subjects have moderate disease severity.
• ‘Per protocol’ set of subjects may be best
(most conservative).
Other Criteria
• Tx compliance should be similar in both
groups.
• Low measurement error. If this is an interview
or ratings, there is careful training & inter-rater
reliability testing. If using a survey, the test is
psychometrically sound.

These threats all create bias in favor of finding equivalence, unlike a superiority trial, where they bias the study against finding an effect.
Summary
• Evidence base adequate (for established tx).
• New Tx beats placebo.
• New Tx not substantially worse than established
tx.
• Study design features do not bias the results
toward equivalence.
Hypothetical Example
Previous studies
study   Mean placebo   Mean Est. Tx   mean diff   placebo sd   ES     Placebo       Est tx
        sx score       sx score                                       % responder   % responder
1       10             20             10          6.0          1.7    25%           40%
2       12             22             10          5.8          2.1    20%           48%
3       14             30             16          6.0          2.3    15%           49%
4       16             25             9           6.4          2.5    22%           60%
avg     13             24.25          11.25       6.05         2.14   20.5%         49.3%
Possible values for ∆
[Figure: Effect size x % responders. x-axis: Effect size (1.2 to 2.6); y-axis: % Treatment Responders (0.3 to 0.65); fitted line y = 0.2093x + 0.0442]
Mean Difference   Mean sx   ES vs.    ES vs.    Estimated New tx
v. placebo        score     placebo   Est. tx   % responder
8                 16        1.33      -1.00     32.3%
9                 15        1.50      -0.80     35.8%
10                14        1.67      -0.60     39.3%
If ∆ = 9, the lower bound of the 95% CI for the
new tx primary outcome score must be < 15
in order to claim non-inferiority. This is
equivalent to a 36% responder rate and ES of
1.5 vs. placebo and -0.8 vs. established tx.
Hypothetical Results
A) New tx mean = 16
B) New tx mean = 17
New tx is not inferior to EST. Tx
New tx is inferior to EST. tx
```
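
The worked numbers in the slides can be checked with a few lines of Python. The sketch below is illustrative only and is not the author's code. It makes two assumptions that the slides do not state: the effect size behind the "N needed per group" figures is taken to be d = 0.5 (`ASSUMED_D`), and the placebo SD used for the ∆ table is taken to be 6.0 (`ASSUMED_SD`, rounded from the 6.05 average). The sample-size function uses a normal approximation with a small-sample correction rather than the exact noncentral-t calculation performed by G*Power, so it may differ by a subject or so for other inputs. The function names are hypothetical.

```python
import math
from statistics import NormalDist

ASSUMED_D = 0.5    # assumption: the slides do not state the effect size
ASSUMED_SD = 6.0   # assumption: placebo SD, rounded from the 6.05 average


def effect_size(mean_tx: float, mean_con: float, sd_con: float) -> float:
    """Standardized mean difference, scaled by the control-group SD."""
    return (mean_tx - mean_con) / sd_con


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group N for a 2-tailed independent-groups t-test.

    Normal approximation plus a correction term of z_alpha**2 / 4.
    This approximates, but is not, the exact noncentral-t calculation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


def est_responder_rate(es_vs_placebo: float) -> float:
    """Responder rate predicted by the slides' fitted line y = 0.2093x + 0.0442."""
    return 0.2093 * es_vs_placebo + 0.0442


# Per-group sample sizes at power .80 and .95 under the assumed d.
print(n_per_group(ASSUMED_D, power=0.80))
print(n_per_group(ASSUMED_D, power=0.95))

# Candidate values of delta from the slides, with the implied ES vs. placebo
# and the estimated responder rate.
for delta in (8, 9, 10):
    es = delta / ASSUMED_SD
    print(delta, round(es, 2), f"{est_responder_rate(es):.1%}")
```

Under these assumptions the two sample sizes come out to 64 and 105, matching the slides, and the loop gives the ES vs. placebo and estimated responder columns of the ∆ table. The ES vs. Est. tx column is not recomputed here because the slides do not give enough information to derive it.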