### Exact Logistic Regression

```Exact Logistic Regression
Larry Cook
Outline
• Review the logistic regression model
• Explore an example where model
assumptions fail
– Brief algebraic interlude
• Explore an example with a different issue
where logistic regression fails
• Computational considerations
• Example SAS code
Logistic Regression
• Model a binary outcome, Y, with one or
more predictors
– Success/failure
– Disease/not disease
• Model outcome in terms of the log odds
of a success
• log(odds of Yi = 1) = a + b*xi
Why Log Odds?
• Makes a binary outcome continuous
• Solves this problem
– Probability is constrained to [0,1]
– Odds are constrained to [0, ∞)
• Log odds are in (-∞, ∞)
• Exponentiating coefficients gives us
estimates of odds ratios
Example: Motor Vehicle
Crash Fatalities
• What are the odds of being hospitalized or
killed in a motor vehicle crash for drivers
using safety restraints vs. those who are
not?
– Outcome: Hospitalized/killed or not
– Covariate: safety belt use
Hospital/Killed * Restraint Use
OR = 0.22, p-value < 0.001
Example: Motor Vehicle
Crash Fatalities (continued)
• Same question, now with additional covariates
– Outcome: Hospitalized/killed or not
– Covariates: safety belt use, gender, age,
alcohol, rural area
Logistic Regression Output
Parameter       Estimate   Odds Ratio   P-value
Intercept        -0.261                 < 0.001
Male             -0.576       0.56      < 0.001
Restraint Use    -1.430       0.24      < 0.001
Alcohol           1.065       2.90      < 0.001
Night             0.194       1.21        0.011
Rural             0.135       1.14      < 0.001
Assumptions
• Conditional probabilities follow a logistic
function of the independent variables
• Observations are independent
• Asymptotics
– Sample size is large enough
– Minimum of 50 to 100 observations
– 10 successes/failures per variable
Corneal Graft Rejections
• What if studying a rare disease?
• Data for eight kids in young age group
and eight in the older age group
• Hypothesis is that rejection is more likely
in older children
Graft Rejections
                        Young (< 4 y.o.)   Older (> 4 y.o.)   Total
                        (X = 0)            (X = 1)
No Rejection (Y = 0)           7                  2              9
Rejection (Y = 1)              1                  6              7
Total                          8                  8             16
OR = 21, p-value = 0.012
100% of cells have expected counts < 5!!!
Fisher’s Exact Test p-value (2-sided) = 0.0406; (1-sided) = 0.0203
Let’s Tackle the Graft
Rejection Example as
Logistic Regression
Graft Rejections
                Young (< 4 y.o.)   Older (> 4 y.o.)   Total
No Rejection           7                  2              9
Rejection              1                  6              7
Total                  8                  8             16
Sample Size << 50!
Don’t have 10 successes or 10 failures!
Exact (Conditional)
Logistic Regression
• Rather than using the unconditional
logistic regression, we will condition on
nuisance parameters
• Use conditional maximum likelihood for
estimation and inference
Proceed with Caution
Logistic Model
Likelihood of a Sample
Sufficient Statistics
Conditioning
• If we are only trying to describe the
relationship between rejection and age, do
we care about the value of the intercept?
• Remove the intercept, a, from the
likelihood by conditioning on its sufficient
statistic, t0 = Σyi.
• Let S(t0) = set of all tables with Σyi = t0 and
the observed sample sizes
Conditional Likelihood
Estimation
Inference
End of Algebra
Back to Example
Graft Rejections
                        Young (< 4 y.o.)   Older (> 4 y.o.)   Total
                        (X = 0)            (X = 1)
No Rejection (Y = 0)           7                  2              9
Rejection (Y = 1)              1                  6              7
Total                          8                  8             16
Sufficient Statistics
t0 = Σyi = # of rejections = 7
t1 = Σxiyi = 0*(# of rejections in young) + 1*(# of rejections in old)
   = 0*1 + 1*6 = 6
Conditional Distribution
for Graft Rejection
• Need to calculate all possible tables that
have exactly 7 rejections
• Calculate how often each of the tables
occurs
• Calculate CMLE
• Calculate how rare our table is to obtain
p-value
Reference Set
Yng_NR   Yng_R   Old_NR   Old_R   t0   t1    Count   P[Table]
   1       7       8        0      7    0        8    0.0007
   2       6       7        1      7    1      224    0.0196
   3       5       6        2      7    2    1,568    0.1371
   4       4       5        3      7    3    3,920    0.3427
   5       3       4        4      7    4    3,920    0.3427
   6       2       3        5      7    5    1,568    0.1371
   7       1       2        6      7    6      224    0.0196
   8       0       1        7      7    7        8    0.0007
Total                              7        11,440    1.000
Estimate b and Find a p-value
t1    Count   P[Table]
0         8    0.0007
1       224    0.0196
2     1,568    0.1371
3     3,920    0.3427
4     3,920    0.3427
5     1,568    0.1371
6       224    0.0196
7         8    0.0007
Confidence Interval
• Lower Bound, b-
– If t1 = t1,min: b- = -∞
– Otherwise: b- is the value of b that produces an
upper p-value of α/2
• Upper Bound, b+
– If t1 = t1,max: b+ = ∞
– Otherwise: b+ is the value of b that produces a
lower p-value of α/2
Final Stats for Graft Rejection
Example 2
PECARN C-Spine Study
Case Control Study
              Control    Case    Total
Not Present    1,057      540    1,597
Present            2        0        2
Total          1,059      540    1,599
Any problems estimating the odds ratio?
Could exact logistic regression help?
What sufficient statistics
are needed?
                  Not Present   Present   Total
                  (X = 0)       (X = 1)
Control (Y = 0)      1,057          2      1,059
Case (Y = 1)           540          0        540
Total                1,597          2      1,599
• Σy = 2
• Σxy = 0
Conditional Density
Case P   Case NP   Ctrl P   Ctrl NP   t0   t1       Count   P[Table]
   0       540       2       1,057     2    0     560,211    0.438
   1       539       1       1,058     2    1     571,860    0.448
   2       538       0       1,059     2    2     145,530    0.114
Total                                  2         1,277,601    1.000
One-sided p-value = 0.438
Two-sided p-value = 2*0.438 = 0.876
95% confidence interval (-∞, 2.345)
Point estimate?
Median Unbiased Estimate
One More Example
Dose Response
Toxicology Experiment
• 400 mice randomized to one of four levels of a drug
• Drug administered to each animal
• Outcome is the number of deaths in each dose
level
Dose      0     1     2     3   Total
Lived    99    97    95    90     381
Died      1     3     5    10      19
Total   100   100   100   100     400
Σy = 19
Σxy = 3 + 10 + 30 = 43
Exact vs. Unconditional
            Exact           Unconditional
Estimate    0.710           0.712
SE          0.246           0.246
OR          2.03            2.04
CI          (1.26, 3.52)    (1.26, 3.30)
p-value     0.002           0.004
Computational Issues
Counting All the Tables
• One of the main hurdles for conditional
logistic regression is counting all the tables
in the sample space
– Graft rejections – 11,440 possibilities
– PECARN C-Spine - 1,277,601
– Toxicology – 2.79 x 10^33
• Obviously don’t want to generate tables
one at a time
Network Algorithm
• Graphical representation of the sample
space
• Nodes represent a partial sum of the
sufficient statistic
• Arcs have combinatorial weighting value
• One path through the graph represents a
table in the sample space
Example
        X=1   X=2   X=3   X=4   Total
Y=0      3     2     2     1      8
Y=1      0     1     1     2      4
Total    3     3     3     3     12
Sufficient Statistics
t0 = Σyi = 4
t1 = Σxiyi = 1*0 + 2*1 + 3*1 + 4*2 = 13
Network nodes (column, partial sum of y):
(0,0)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3) (2,4)
(3,1) (3,2) (3,3) (3,4)
(4,4)
[Figure: network diagram connecting the nodes above]
One table in the sample space (one path through the network):
        X=1   X=2   X=3   X=4   Total
Y=0      1     3     1     3      8
Y=1      2     0     2     0      4
The observed table (another path through the network):
        X=1   X=2   X=3   X=4   Total
Y=0      3     2     2     1      8
Y=1      0     1     1     2      4
Network Representation
of the Sample Space
[Figure: full network diagram over the same nodes]
Multiple Covariates?
More Conditioning!
Osteogenic Sarcoma
LogXact Manual
• 46 patients surgically treated for
osteogenic sarcoma and then observed
for disease recurrence within 3 years
• Covariates
– Sex: Male = 1, Female = 0
– Any Osteoid Pathology (AOP)
• Present = 1, not = 0
• Interested in the effect of AOP
Osteogenic Sarcoma
Covariate   No Recurrence   Recurrence   Group Size   Sex (x1)   AOP (x2)
Group       (y = 0)         (y = 1)      (ni)
1                 8              0            8           0          0
2                 5              2            7           0          1
3                 9              4           13           1          0
4                 7             11           18           1          1
Total            29             17           46
Estimating the Effect of AOP
• New statistics to condition on
– Group sizes
– Sufficient statistic for intercept, Σy = 17
– Sufficient statistic for coefficient for sex, Σx1y = 15
• Calculate the conditional distribution of Σx2y
– Sufficient statistic for coefficient for AOP
– Number of recurrence cases with AOP (= 13)
– Given exactly 17 with recurrence,
15 of which are males
Network Algorithm
• The Network Algorithm uses two passes
– First pass conditions on the intercept
• All tables with exactly 17 cases in recurrence
– Second pass removes arcs that don’t
produce sufficient statistic for sex
• All tables that don’t have 15 males in recurrence
• Proceed with estimation & inference as
before
P[Σx2y = t2 | 17 in recurrence
and 15 males]
Results
LR Test for Both Variables
• To test that the coefficients for sex and AOP
are both zero simultaneously, need the joint
conditional density
– All possible combinations of males and
patients with AOP in recurrence given
exactly 17 patients in recurrence
– Determine how rare it is to have 15 recurrent
males AND 13 recurrent AOP patients
SAS Examples
Conclusion
• Exact (conditional) logistic regression
– Useful method when asymptotic assumptions
are not met or with separation
– Utilizes conditioning to remove nuisance
parameters from the likelihood
– Very computationally intensive method
– Network algorithm speeds up calculations
Questions?
```
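
The algebra slides ("Logistic Model" through "Inference") list only their titles here. As a hedged reconstruction in the deck's notation (intercept a, slope b, sufficient statistics t0 and t1), the standard argument can be written as below. The symbols π_i and c(t0, u) are introduced for illustration and do not appear in the slides; c(t0, u) denotes the number of outcome vectors in S(t0) with Σxiyi = u.

```latex
\begin{aligned}
\pi_i &= P(Y_i = 1 \mid x_i) = \frac{e^{a + b x_i}}{1 + e^{a + b x_i}} \\[4pt]
L(a, b) &= \prod_{i} \pi_i^{\,y_i} (1 - \pi_i)^{1 - y_i}
         = \frac{\exp(a\, t_0 + b\, t_1)}{\prod_{i} \bigl(1 + e^{a + b x_i}\bigr)},
         \qquad t_0 = \sum_i y_i,\quad t_1 = \sum_i x_i y_i \\[4pt]
P(T_1 = t_1 \mid T_0 = t_0) &= \frac{c(t_0, t_1)\, e^{b\, t_1}}{\sum_{u} c(t_0, u)\, e^{b\, u}}
\end{aligned}
```

The intercept a cancels in the last line because every table in S(t0) shares the factor exp(a t0) and the same denominator. The conditional maximum likelihood estimate maximizes this expression in b, and exact p-values sum it over the relevant tail with b = 0.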
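
The graft rejection reference set can be reproduced with a few lines of Python. This is a minimal sketch, assuming a single binary covariate with group sizes 8 and 8 and t0 = 7; the function names (`reference_set`, `cond_probs`, `cmle`) are hypothetical helpers, not part of any package, and the bisection search is just one simple way to solve the conditional score equation.

```python
from math import comb, exp

def reference_set(n0, n1, t0):
    """Count outcome vectors with t0 events, indexed by t1 (events in the x = 1 group)."""
    lo, hi = max(0, t0 - n0), min(n1, t0)
    return {t1: comb(n1, t1) * comb(n0, t0 - t1) for t1 in range(lo, hi + 1)}

def cond_probs(counts, b=0.0):
    """Conditional distribution of t1 given t0 at slope b (b = 0 is the null)."""
    weights = {t: c * exp(b * t) for t, c in counts.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def cmle(counts, t_obs, lo=-20.0, hi=20.0):
    """Bisection for the b whose conditional mean of t1 equals the observed t1."""
    def mean(b):
        return sum(t * p for t, p in cond_probs(counts, b).items())
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) < t_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

counts = reference_set(n0=8, n1=8, t0=7)        # graft rejection margins
probs = cond_probs(counts)                      # P[Table] under b = 0
p_one_sided = sum(p for t, p in probs.items() if t >= 6)
p_two_sided = sum(p for p in probs.values() if p <= probs[6] * (1 + 1e-9))
b_hat = cmle(counts, t_obs=6)

print(sum(counts.values()), round(p_one_sided, 4), round(p_two_sided, 4))
print("conditional estimate of b:", round(b_hat, 3))
```

The one-sided and two-sided values agree with the Fisher's exact test p-values quoted in the slides, which is expected because both use the same conditional distribution. The estimate is finite here only because the observed t1 = 6 is not at the edge of its range.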
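
For the PECARN table, where the observed statistic sits at its minimum (t1 = 0), the upper confidence bound and a median unbiased estimate can be sketched the same way. This assumes the three-table conditional density shown in the slides, and it assumes one common convention for the median unbiased estimate at the boundary (the value of b where the lower tail probability equals 0.5); the helper names are hypothetical.

```python
from math import comb, exp

# Counts for t1 = 0, 1, 2 from the slides' conditional density
counts = {0: comb(1059, 2), 1: 540 * 1059, 2: comb(540, 2)}

def lower_tail(b, t_obs):
    """P(T1 <= t_obs | t0) at slope b; decreasing in b."""
    weights = {t: c * exp(b * t) for t, c in counts.items()}
    return sum(w for t, w in weights.items() if t <= t_obs) / sum(weights.values())

def solve_decreasing(f, target, lo=-30.0, hi=30.0):
    """Bisection for f(b) = target when f is decreasing in b."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_one_sided = lower_tail(0.0, 0)                                  # null probability of the observed table
upper_95 = solve_decreasing(lambda b: lower_tail(b, 0), 0.025)    # lower p-value of alpha/2
median_unbiased = solve_decreasing(lambda b: lower_tail(b, 0), 0.5)

print(round(p_one_sided, 3), round(upper_95, 3), round(median_unbiased, 3))
```

The lower bound is -∞ because t1 is at its minimum, matching the rule on the confidence interval slide. Small differences from software output in the third decimal place are possible, since packages may use different root-finding tolerances.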
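
The idea behind the network algorithm, accumulating partial sums stage by stage instead of generating tables one at a time, can be illustrated with a small dynamic program. This is a simplified sketch and not the published network algorithm: it tracks nodes as (partial Σy, partial Σxy), weights each arc by a binomial coefficient, and only filters on t0 at the end rather than pruning infeasible nodes early. The function name `conditional_counts` is hypothetical.

```python
from collections import defaultdict
from math import comb

def conditional_counts(groups, t0):
    """groups: list of (x, n) covariate value and group size.
    Returns {t1: number of outcome vectors with sum(y) = t0 and sum(x*y) = t1}."""
    layer = {(0, 0): 1}  # node (partial sum of y, partial sum of x*y) -> path weight
    for x, n in groups:
        nxt = defaultdict(int)
        for (s, t), weight in layer.items():
            # y events in this group; never exceed the conditioned total t0
            for y in range(0, min(n, t0 - s) + 1):
                nxt[(s + y, t + x * y)] += weight * comb(n, y)
        layer = nxt
    return {t: c for (s, t), c in sorted(layer.items()) if s == t0}

# Four-group example from the slides: X = 1..4, three subjects each, t0 = 4
small = conditional_counts([(1, 3), (2, 3), (3, 3), (4, 3)], t0=4)
print(sum(small.values()), small[13])

# Toxicology margins: doses 0..3, 100 mice each, 19 deaths
tox = conditional_counts([(0, 100), (1, 100), (2, 100), (3, 100)], t0=19)
print(len(tox), "distinct values of t1")
```

Because each stage merges all paths that reach the same node, the work grows with the number of distinct partial sums rather than with the number of tables, which is what makes the larger examples tractable.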