Basic Concepts of Logistic Regression

LOGISTIC REGRESSION
OUTLINE
• Basic Concepts of Logistic Regression
• Finding Logistic Regression Coefficients using Excel’s Solver
• Significance Testing of the Logistic Regression Coefficients
• Testing the Fit of the Logistic Regression Model
• Finding Logistic Regression Coefficients using Newton’s Method
• Comparing Logistic Regression Models
• Hosmer-Lemeshow Test
BASIC CONCEPTS OF LOGISTIC REGRESSION
•
The basic approach is to use the following regression model, employing the notation from
Definition 3 of Method of Least Squares for Multiple Regression:
$$\ln \mathrm{Odds}(E) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + \varepsilon$$
where the odds function is as given in the following definition.
Definition 1 :
•
Odds(E) is the odds that event E occurs, namely
$$\mathrm{Odds}(E) = \frac{P(E)}{P(E')} = \frac{P(E)}{1 - P(E)}$$
Where p has a value 0 ≤ p ≤ 1 (i.e. p is a probability value), we can define the odds
function as
$$\mathrm{Odds}(p) = \frac{p}{1 - p}$$
•
For our purposes, the odds function has the advantage of transforming the probability
function, which has values from 0 to 1, into an equivalent function with values between 0
and ∞. When we take the natural log of the odds function, we get a range of values from
- ∞ to ∞.
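As a small illustration of the ranges described above, the following sketch evaluates the odds and the natural log of the odds for a few probabilities. The function names `odds` and `log_odds` are chosen here for illustration and do not come from the original material.

```python
import math

def odds(p):
    """Odds(p) = p / (1 - p); maps probabilities in [0, 1) to [0, infinity)."""
    return p / (1.0 - p)

def log_odds(p):
    """ln Odds(p) = ln p - ln(1 - p); maps (0, 1) to (-infinity, infinity)."""
    return math.log(p) - math.log(1.0 - p)

# Probabilities below 0.5 give odds below 1 and negative log odds;
# probabilities above 0.5 give odds above 1 and positive log odds.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(p, odds(p), log_odds(p))
```

Note that the log odds are antisymmetric about p = 0.5, which is why the transformed scale is centered at zero.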
Definition 2 :
•
The logit function is the log of the odds function, namely logit(p) = ln Odds(p), or
$$\mathrm{logit}(p) = \ln \mathrm{Odds}(p) = \ln \frac{p}{1 - p} = \ln p - \ln(1 - p)$$
Definition 3 :
•
Based on the logistic model as described above, we have
$$\mathrm{logit}(p) = \ln \mathrm{Odds}(E) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + \varepsilon$$
where p = P(E).
•
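It now follows that
$$\frac{P(E)}{1 - P(E)} = \mathrm{Odds}(E) = e^{b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + \varepsilon}$$
and so
$$p = P(E) = \frac{e^{b_0 + b_1 x_1 + \cdots + b_k x_k + \varepsilon}}{1 + e^{b_0 + b_1 x_1 + \cdots + b_k x_k + \varepsilon}} = \frac{1}{1 + e^{-b_0 - \sum_{j=1}^{k} b_j x_j - \varepsilon}}$$
•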
For our purposes we take the event E to be that the dependent variable y has value 1. If y
takes only the values 0 or 1, we can think of E as success and the complement E′ of E as
failure. This is as for the trials in a binomial distribution.
•
Just as for the regression model studied in Regression and Multiple Regression, a sample
consists of n data elements of the form (yi, xi1, xi2, …, xik), but for logistic regression each
yi only takes the value 0 or 1. Now let Ei = the event that yi = 1 and pi = P(Ei). Just as the
regression line studied previously provides a way to predict the value of the dependent
variable y from the values of the independent variables x1, …, xk, for logistic regression
we have
$$p = P(y = 1) = \frac{1}{1 + e^{-b_0 - \sum_{j=1}^{k} b_j x_j}}$$
$$\mathrm{logit}(p) = \ln \frac{P(y = 1)}{1 - P(y = 1)} = b_0 + \sum_{j=1}^{k} b_j x_j$$
•
Note that since the yi have a proportion distribution, by Property 2 of Proportion
Distribution, var(yi) = pi (1 – pi).
•
In the case where k = 1, we have
$$p = \frac{1}{1 + e^{-b_0 - b_1 x}}$$
Such a curve has a sigmoid shape. The values of b0 and b1 determine the location, direction
and spread of the curve. The curve is symmetric about the point where x = −b0/b1.
In fact, the value of p is 0.5 for this value of x.
Figure: Sigmoid curve for p
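The symmetry of the sigmoid curve can be checked numerically. The sketch below uses hypothetical coefficients (b0 = 2, b1 = −0.5, not taken from any example in this report) and confirms that p = 0.5 at x = −b0/b1 and that the curve is symmetric about that point.

```python
import math

def logistic(x, b0, b1):
    """p = 1 / (1 + exp(-b0 - b1*x)) for a single independent variable."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))

# Hypothetical coefficients chosen only for illustration.
b0, b1 = 2.0, -0.5

x_mid = -b0 / b1                  # point of symmetry of the curve
p_mid = logistic(x_mid, b0, b1)   # value of p at the point of symmetry

# Symmetry: p(x_mid + d) + p(x_mid - d) = 1 for any offset d.
d = 1.7
p_right = logistic(x_mid + d, b0, b1)
p_left = logistic(x_mid - d, b0, b1)
print(x_mid, p_mid, p_left + p_right)
```

A negative b1 gives a decreasing curve, a positive b1 an increasing one, and a larger |b1| gives a steeper transition.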
•
Logistic regression is used instead of ordinary multiple regression because the
assumptions required for ordinary regression are not met. In particular
1. The assumption of the linear regression model that the values of y are normally
distributed cannot be met since y only takes the values 0 and 1.
2. The assumption of the linear regression model that the variance of y is constant
across values of x (homogeneity of variances) also cannot be met with a binary
variable. Since the variance is p(1–p), when 50 percent of the sample consists
of 1s the variance is .25, its maximum value. As we move to more extreme
values, the variance decreases. When p = .10 or .90, the variance is (.1)(.9)
= .09, and as p approaches 1 or 0, the variance approaches 0.
3. Using the linear regression model, the predicted values will become greater than
one and less than zero if you move far enough on the x-axis. Such values are
theoretically inadmissible for probabilities.
•
For the logistic model, the least squares approach to calculating the values of the
coefficients bi cannot be used; instead maximum likelihood techniques, as described
below, are employed to find these values.
Definition 4:
•
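The odds ratio between two data elements in the sample is defined as follows:
$$\mathrm{OR}(x_i, x_j) = \frac{\mathrm{Odds}(x_{i1}, \ldots, x_{ik})}{\mathrm{Odds}(x_{j1}, \ldots, x_{jk})} = \frac{e^{b_0 + \sum_{m=1}^{k} b_m x_{im}}}{e^{b_0 + \sum_{m=1}^{k} b_m x_{jm}}} = e^{\sum_{m=1}^{k} b_m (x_{im} - x_{jm})}$$
Using the notation px = P(x), the log odds ratio of the estimates is defined as
$$\ln \mathrm{OR}(x_i, x_j) = \mathrm{logit}(p_{x_i}) - \mathrm{logit}(p_{x_j})$$
•
In the case where k = 1,
$$\mathrm{logit}(p_{x+1}) - \mathrm{logit}(p_x) = \ln \frac{p_{x+1}}{1 - p_{x+1}} - \ln \frac{p_x}{1 - p_x} = b_0 + b_1 (x + 1) - b_0 - b_1 x = b_1$$
Thus,
$$\frac{\mathrm{Odds}(x + 1)}{\mathrm{Odds}(x)} = \frac{p_{x+1} / (1 - p_{x+1})}{p_x / (1 - p_x)} = e^{b_1}$$
Furthermore, for any value of d
$$\frac{\mathrm{Odds}(x + d)}{\mathrm{Odds}(x)} = e^{d b_1}$$
Note that when x is a dichotomous variable,
$$e^{b_1} = \frac{\mathrm{Odds}(1)}{\mathrm{Odds}(0)}$$
•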
E.g. when x = 0 for male and x = 1 for female, then e^{b1} represents the odds ratio
of females to males. If, for example, b1 = 2 and we are measuring the
probability of getting cancer under certain conditions, then e^{b1} = e^2 ≈ 7.4, which would
mean that the odds of females getting cancer would be 7.4 times the odds of
males under the same conditions.
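The relationship between a coefficient and the odds ratio can be sketched as follows. The slope b1 = 2 is the value used in the text; the intercept b0 = −1 is a hypothetical value added only so that the odds can be evaluated, and it cancels out of the ratio.

```python
import math

def odds_at(x, b0, b1):
    """Odds(x) = exp(b0 + b1*x) under the logistic model with k = 1."""
    return math.exp(b0 + b1 * x)

b0 = -1.0   # hypothetical intercept (cancels in the ratio)
b1 = 2.0    # slope used in the text

# Dichotomous x: odds ratio of x = 1 (female) to x = 0 (male) equals e^{b1}.
ratio_1_vs_0 = odds_at(1, b0, b1) / odds_at(0, b0, b1)

# For any shift d, Odds(x + d) / Odds(x) = e^{d * b1}, regardless of x.
d, x = 0.5, 3.0
ratio_shift = odds_at(x + d, b0, b1) / odds_at(x, b0, b1)

print(round(ratio_1_vs_0, 1), ratio_shift, math.exp(d * b1))
```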
•
The model we will use is based on the binomial distribution, namely the probability that
the sample data occurs as it does is given by
$$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
Taking the natural log of both sides and simplifying we get the following definition.
Definition 5:
•
The log-likelihood statistic is defined as follows:
$$LL = \ln L = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$
where the yi are the observed values while the pi are the corresponding theoretical
values.
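A minimal sketch of the log-likelihood statistic, using a tiny made-up sample (the outcomes and probabilities below are hypothetical and serve only to show that LL is the natural log of the likelihood L):

```python
import math

def log_likelihood(y, p):
    """LL = sum of y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i) over the sample."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Hypothetical observed outcomes (0 or 1) and model probabilities.
y = [1, 0, 1, 1]
p = [0.8, 0.3, 0.6, 0.9]

LL = log_likelihood(y, p)

# The likelihood itself: p_i where y_i = 1, and (1 - p_i) where y_i = 0.
L = 0.8 * (1 - 0.3) * 0.6 * 0.9
print(LL, math.log(L))
```

Working with LL rather than L avoids the numerical underflow that a product of many probabilities would cause in larger samples.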
•
Example 1:
A sample of 760 people who received doses of radiation between 0 and 1000 rems was made
following a recent nuclear accident. Of these, 302 died, as shown in the table in Figure 2.
Each row in the table represents the midpoint of an interval of 100 rems (i.e. 0-100,
100-200, etc.).
Figure 2.
•
<Solution>
Let Ei = the event that a person in the ith interval survived. The table also shows the
probability P(Ei) and odds Odds(Ei) of survival for a person in each interval. Note that P(Ei) =
the percentage of people in interval i who survived and
$$\mathrm{Odds}(E_i) = \frac{P(E_i)}{1 - P(E_i)}$$
In Figure 3 we plot the values of P(Ei) vs. i and ln Odds(Ei) vs. i. We see that the second of
these plots is reasonably linear.
Given that there is only one independent variable (namely x = # of rems), we can use the
following model
$$\ln \mathrm{Odds}(E) = a + bx$$
Here we use coefficients a and b instead of b0 and b1 just to keep the notation simple.
•
We show two different methods for finding the values of the coefficients a and b. The first
uses Excel’s Solver tool and the second uses Newton’s method. Before proceeding it
might be worthwhile to review Goal Seeking and Solver for how to use Excel’s
Solver tool, and Newton’s Method for how to apply Newton’s method. We will use
both methods to maximize the value of the log-likelihood statistic as defined in Definition
5.
FINDING LOGISTIC REGRESSION COEFFICIENTS USING
EXCEL’S SOLVER
•
We now show how to find the coefficients for the logistic regression model using Excel’s
Solver capability (see also Goal Seeking and Solver). We start with Example 1 from Basic
Concepts of Logistic Regression.
Example 1 (continued) :
From Definition 1 of Basic Concepts of Logistic Regression, the predicted value pi for the
probability of survival for each interval i is given by the following formula, where xi represents
the number of rems for interval i.
$$p_i = \frac{1}{1 + e^{-a - b x_i}}$$
The log-likelihood statistic as defined in Definition 5 of Basic Concepts of Logistic Regression
is given by
$$LL = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$
where yi is the observed outcome (1 = survived, 0 = died) for the ith sample element. Since we are
aggregating the sample elements into intervals, we use the modified version of the formula, namely
$$LL = \sum_{i=1}^{r} n_i \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$
where yi is the observed probability of survival in the ith of r intervals and ni is the number
of sample elements in the ith interval.
We capture this information in the worksheet in Figure 1 (based on the data in Figure 2 of
Basic Concepts of Logistic Regression).
In figure 1,
Column I contains the rem values for each interval (copy of column A and E). Column J
contains the observed probability of survival for each interval (copy of column F). Column K
contains the values of each pi. E.g. cell K4 contains the formula =1/(1+EXP(-O5-O6*I4)) and
initially has value 0.5 based on the initial guess of the coefficients a and b given in cells O5
and O6 (which we arbitrarily set to zero). Cell L14 contains the value of LL using the formula
=SUM(L4:L13); where L4 contains the formula =(B4+C4)*(J4*LN(K4)+(1-J4)*LN(1-K4)), and
similarly for the other cells in column L.
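The worksheet calculation of LL for summary data can be mirrored in a few lines of code. The sketch below follows the same column logic (predicted p per interval, then a weighted log-likelihood term per interval). The three intervals of counts are hypothetical, since the table from the example is not reproduced here; the final line checks the starting value reported for the real sample of 760 people, which depends only on the sample size when a = b = 0.

```python
import math

def predicted_p(x, a, b):
    """Column K: p_i = 1 / (1 + exp(-a - b*x_i))."""
    return 1.0 / (1.0 + math.exp(-a - b * x))

def grouped_log_likelihood(xs, survived, died, a, b):
    """Cell L14: sum over intervals of n_i*(y_i*ln p_i + (1 - y_i)*ln(1 - p_i))."""
    total = 0.0
    for x, s, f in zip(xs, survived, died):
        n = s + f          # number of people in the interval
        y = s / n          # observed proportion surviving (column J)
        p = predicted_p(x, a, b)
        total += n * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# Hypothetical summary data: interval midpoints and survived/died counts.
xs = [50, 150, 250]
survived = [30, 20, 5]
died = [2, 10, 25]

# With the initial guess a = b = 0 every p_i is 0.5, so LL = N * ln(0.5).
ll_start = grouped_log_likelihood(xs, survived, died, 0.0, 0.0)
print(ll_start, 92 * math.log(0.5))

# For the sample of 760 people in the example this starting value is:
print(round(760 * math.log(0.5), 3))
```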
We now use Excel’s Solver tool by selecting Data > Analysis|Solver and filling in the dialog
box that appears as described in Figure 2 (see Goal Seeking and Solver for more details).
Our objective is to maximize the value of LL (in cell L14) by changing the coefficients (in
cells O5 and O6). It is important, however, to make sure that the Make Unconstrained
Variables Non-Negative checkbox is not checked. When we click on the Solve button we get a
message that Solver has successfully found a solution, i.e. it has found values for a and b
which maximize LL.
We elect to keep the solution found and Solver automatically updates the worksheet from
Figure 1 based on the values it found for a and b. The resulting worksheet is shown in Figure
3.
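We see that a = 4.476711 and b = -0.00721. Thus the logistic regression model is given by
the formula
$$p = \frac{1}{1 + e^{-4.476711 + 0.00721 x}}$$
For example, the predicted probability of survival when exposed to 380 rems of radiation is
given by
$$p = \frac{1}{1 + e^{-4.476711 + 0.00721 \cdot 380}} \approx 0.85$$
Note that
$$\frac{\mathrm{Odds}(180)}{\mathrm{Odds}(200)} = e^{b (180 - 200)} = e^{-20b} = e^{0.1442} \approx 1.155$$
Thus, the odds that a person exposed to 180 rems survives are 15.5% greater than those of a person
exposed to 200 rems.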
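These two calculations can be reproduced directly from the fitted coefficients reported above. This is only a sketch; because b is quoted to three significant figures, the results are approximate.

```python
import math

# Coefficients reported by Solver in the text.
a, b = 4.476711, -0.00721

def p_survive(rems):
    """Predicted probability of survival at a given dose."""
    return 1.0 / (1.0 + math.exp(-a - b * rems))

# Predicted probability of survival at 380 rems.
p380 = p_survive(380)

# Odds ratio of 180 rems to 200 rems: exp(b * (180 - 200)).
odds_ratio = math.exp(b * (180 - 200))

print(round(p380, 2), round(odds_ratio, 3))
```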
•
Real Statistics Data Analysis Tool:
The Real Statistics Resource Pack provides the Logistic Regression supplemental data
analysis tool. This tool takes as input a range which lists the sample data followed by the number
of occurrences of success and failure (this is considered to be the summary form). E.g. for
Example 1 this is the data in range A3:C13 of Figure 1 (repeated in Figure 5 in the same cells).
For this problem there was only one independent variable (number of rems). If
additional independent variables are used then the input will contain additional columns, one
for each independent variable.
We show how to use this tool to create a spreadsheet similar to the one in Figure 3. First
press Ctrl-m to bring up the menu of Real Statistics supplemental data analysis tools and
choose the Logistic Regression option. This brings up the dialog box shown in Figure 4.
Now select A3:C13 as the Input Range (see Figure 5) and, since this data is in summary form
with column headings, select the Summary data option for the Input Format and check Headings
included with data. Next select Solver as the Analysis Type and keep the default Alpha and
Classification Cutoff values of .05 and .5 respectively.
Finally press the OK button to obtain the output displayed in Figure 5.
Note that the coefficients (range Q7:Q8) are set initially to zero and LL (cell M16) is calculated to
be -526.792 (exactly as in Figure 1). The output from the Logistic Regression data analysis
tool also contains many fields which will be explained later. As described in Figure 2, we can
now use Excel’s Solver tool to find the logistic regression coefficients. The result is shown in
Figure 6. We obtain the same values for the regression coefficients as we obtained previously
in Figure 3, and all the other cells are updated with the correct values as well.
SIGNIFICANCE TESTING OF THE LOGISTIC REGRESSION
COEFFICIENTS
•
Definition 1: For any coefficient b the Wald statistic is given by the formula
$$\mathrm{Wald} = \frac{b}{\mathrm{s.e.}(b)}$$
where s.e.(b) is the standard error of b.
•
For ordinary regression we can calculate a statistic t ~ T(dfRes) which can be used
to test the hypothesis that a coefficient b = 0. The Wald statistic is approximately normal
and so it can be used to test whether the coefficient b = 0 in logistic regression.
•
Since the Wald statistic is approximately normal, by Theorem 1 of Chi-Square
Distribution, Wald² is approximately chi-square, and, in fact, Wald² ~ χ²(df) where df = k –
k0 and k = the number of parameters (i.e. the number of coefficients) in the model (the full
model) and k0 = the number of parameters in a reduced model (esp. the baseline model
which doesn’t use any of the variables, only the intercept).
•
Property 1:
The covariance matrix S for the coefficient matrix B is given by the matrix formula
$$S = (X^T V X)^{-1}$$
where X is the r × (k+1) design matrix (as described in Definition 3 of Least Squares Method
for Multiple Regression)
and V = [vij] is the r × r diagonal matrix whose diagonal elements are vii = ni pi (1–pi),
where ni = the number of observations in group i and pi = the probability of success predicted
by the model for elements in group i. Groups correspond to the rows of matrix X and consist of
the various combinations of values of the independent variables.
Note that S = (X^T W)^{-1} where W is X with each element in the ith row of X multiplied by vii.
Observation : The standard errors of the logistic regression coefficients consist of the square
roots of the entries on the diagonal of the covariance matrix in Property 1.
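For a single independent variable the covariance matrix is 2 × 2 and can be written out without a matrix library. The sketch below does this for hypothetical grouped data (the x values, group sizes and predicted probabilities are made up), then derives the standard errors and a Wald statistic for a hypothetical slope estimate.

```python
import math

def covariance_matrix(xs, ns, ps):
    """S = (X^T V X)^{-1} for design rows [1, x_i] and v_i = n_i*p_i*(1 - p_i)."""
    v = [n * p * (1 - p) for n, p in zip(ns, ps)]
    s0 = sum(v)                                       # entry (0, 0) of X^T V X
    s1 = sum(vi * x for vi, x in zip(v, xs))          # off-diagonal entries
    s2 = sum(vi * x * x for vi, x in zip(v, xs))      # entry (1, 1)
    det = s0 * s2 - s1 * s1
    # Inverse of the symmetric 2 x 2 matrix [[s0, s1], [s1, s2]].
    return [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]

# Hypothetical groups: x values, group sizes and predicted probabilities.
xs = [0.0, 1.0, 2.0]
ns = [10, 10, 10]
ps = [0.2, 0.5, 0.8]

S = covariance_matrix(xs, ns, ps)
se_intercept = math.sqrt(S[0][0])   # standard error of b0
se_slope = math.sqrt(S[1][1])       # standard error of b1

b1 = 1.4                            # hypothetical slope estimate
wald = b1 / se_slope                # Wald statistic for the slope
print(S, se_intercept, se_slope, wald)
```

For more than one independent variable the same formula applies, but a general matrix inverse is needed in place of the closed-form 2 × 2 inverse used here.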
•
Example 1 (Coefficients):
We now turn our attention to the coefficient table given in range E18:L20 of Figure 6 of
Finding Logistic Regression Coefficients using Solver (repeated in Figure 1 below).
Figure 1 – Output from Logistic Regression tool
Using Property 1 we calculate the covariance matrix S (range V6:W7) for the coefficient matrix
B via the formula
Actually, for computational reasons it is better to use the following equivalent array formula:
The formulas used to calculate the values for the Rems coefficient (row 20) are given in
Figure 2.
Note that Wald represents the Wald² statistic and that lower and upper represent the bounds of
the 100(1 − α)% confidence interval of exp(b). Since 1 = exp(0) is not in the confidence interval
(.991743, .993871), the Rems coefficient b is significantly different from 0 and should therefore
be retained in the model.
Observation:
The % Correct statistic (cell N16 of Figure 1) is another way to gauge the fit of the model
to the observed data. The statistic says that 76.8% of the observed cases are predicted
accurately by the model. This statistic is calculated as follows:
For any observed values of the independent variables, when the predicted value
of p is greater than or equal to .5 (viewed as predicting success) then the %
correct is equal to the value of the observed number of successes divided by the
total number of observations (for those values of the independent variables).
When p < .5 (viewed as predicting failure) then the % correct is equal to the value
of the observed number of failures divided by the total number of observations.
These values are weighted by the number of observations of that type and then
summed to provide the % correct statistic for all the data.
For example, for the case where Rem = 450, p-Pred = .774 (cell J10), which predicts success
(i.e. survived). Thus the % Correct for Rem = 450 is 85/108 = 78.7% (cell N10). The weighted
sum (found in cell N16) of all these cells is then calculated by the
formula =SUMPRODUCT(N6:N15,H6:H15)/H16.
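The % correct calculation can be sketched as below. The first group uses the figures quoted for Rem = 450 (85 survivors out of 108, so 23 deaths, with predicted p = .774); the second group is hypothetical and is included only to show the case where failure is predicted. The function name and the `cutoff` parameter are illustrative choices.

```python
def percent_correct(successes, failures, p_pred, cutoff=0.5):
    """Weighted % correct: a group counts its successes as correct when the
    model predicts success (p >= cutoff), and its failures otherwise."""
    correct = 0
    total = 0
    for s, f, p in zip(successes, failures, p_pred):
        correct += s if p >= cutoff else f
        total += s + f
    return correct / total

# Rem = 450 group from the text: predicted success, 85 of 108 survived.
pc_450 = percent_correct([85], [23], [0.774])

# Adding a hypothetical group where failure is predicted (p < .5).
pc_two_groups = percent_correct([85, 3], [23, 17], [0.774, 0.2])

print(round(pc_450, 3), round(pc_two_groups, 3))
```

Summing the correct counts and dividing by the total is equivalent to weighting each group's % correct by its number of observations, as the SUMPRODUCT formula does.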
TESTING THE FIT OF THE LOGISTIC REGRESSION
MODEL
•
For larger values of b, the standard error becomes inflated, which shrinks the Wald statistic and
increases the probability that b is viewed as not making a significant contribution to the
model even when it does (i.e. a Type II error).
To overcome this problem it is better to test on the basis of the log-likelihood statistic since
$$-2 \ln \frac{L_0}{L_1} \sim \chi^2(df)$$
where df = k – k0 and where LL1 = ln L1 refers to the full log-likelihood model and LL0 = ln L0
refers to a model with fewer coefficients (especially the model with only the intercept b0 and
no other coefficients). This is equivalent to
$$2 (LL_1 - LL_0) \sim \chi^2(df)$$
Observation:
For ordinary regression the coefficient of determination
$$R^2 = \frac{SS_{Reg}}{SS_T}$$
Thus R2 measures the percentage of variance explained by the regression model. We need a
similar statistic for logistic regression. We define the following three pseudo-R2 statistics for
logistic regression.
Definition 1 :
The log-linear ratio R2 is defined as follows :
$$R_L^2 = 1 - \frac{LL_1}{LL_0}$$
where LL1 refers to the full log-likelihood model and LL0 refers to a model with fewer
coefficients (especially the model with only the intercept b0 and no other coefficients).
Cox and Snell’s R2 is defined as
$$R_{CS}^2 = 1 - \left( \frac{L_0}{L_1} \right)^{2/n}$$
where n = the sample size.
Nagelkerke’s R2 is defined as
$$R_N^2 = \frac{1 - (L_0 / L_1)^{2/n}}{1 - L_0^{2/n}}$$
Observation I :
Since Cox and Snell’s R2 cannot achieve a value of 1, Nagelkerke’s R2 was developed to have
properties more similar to the R2 statistic used in ordinary regression.
Observation II :
The initial value L0 of L, i.e. where we only include the intercept value b0, is given by
$$L_0 = \left( \frac{n_0}{n} \right)^{n_0} \left( \frac{n_1}{n} \right)^{n_1}, \quad \text{i.e.} \quad LL_0 = n_0 \ln \frac{n_0}{n} + n_1 \ln \frac{n_1}{n}$$
where n0 = number of observations with value 0, n1 = number of observations with value 1
and n = n0 + n1.
As described above, the likelihood-ratio test statistic equals:
$$\chi^2 = -2 \ln \frac{L_0}{L_1} = 2 (LL_1 - LL_0)$$
where L1 is the maximized value of the likelihood function for the full model, while L0 is the
maximized value of the likelihood function for the reduced model. The test statistic has a
chi-square distribution with df = k1 – k0, i.e. the number of parameters in the full model minus the
number of parameters in the reduced model.
•
Example 1 :
Determine whether there is a significant difference in survival rate between the different
values of rem in Example 1 of Basic Concepts of Logistic Regression. Also calculate the
various pseudo-R2 statistics.
We are essentially comparing the logistic regression model with coefficient b to that of the
model without coefficient b. We begin by calculating the L1 (the full model with b) and L0 (the
reduced model without b).
Here L1 is found in cell M16 or T6 of Figure 6 of Finding Logistic Coefficients using Solver.
We now use the following test :
$$\chi^2 = 2 (LL_1 - LL_0) = 280.246$$
where df = 1. Since p-value = CHIDIST(280.246,1) = 6.7E-63 < .05 = α, we conclude that
differences in rems yield a significant difference in survival.
The pseudo-R2 statistics are as follows:
All these values are reported by the Logistic Regression data analysis tool (see range S5:T16
of Figure 6 of Finding Logistic Coefficients using Solver).
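The likelihood-ratio test and the pseudo-R2 statistics can be sketched as follows. LL0 is computed from the counts stated in the example (302 died, 458 survived). LL1 is not quoted in the text, so it is back-derived here from the reported test statistic 280.246; that step is an assumption made for illustration. The p-value uses the identity that, for 1 degree of freedom only, the chi-square upper tail equals erfc(sqrt(x/2)).

```python
import math

def ll_intercept_only(n0, n1):
    """LL0 = n0*ln(n0/n) + n1*ln(n1/n) for the intercept-only model."""
    n = n0 + n1
    return n0 * math.log(n0 / n) + n1 * math.log(n1 / n)

def lr_statistic(ll0, ll1):
    """Likelihood-ratio statistic 2*(LL1 - LL0)."""
    return 2.0 * (ll1 - ll0)

def chi2_upper_tail_df1(x):
    """Upper-tail chi-square probability, valid for df = 1 only."""
    return math.erfc(math.sqrt(x / 2.0))

def pseudo_r2(ll0, ll1, n):
    """Return (log-linear ratio, Cox and Snell, Nagelkerke) pseudo-R2 values."""
    r2_l = 1.0 - ll1 / ll0
    r2_cs = 1.0 - math.exp(2.0 * (ll0 - ll1) / n)    # 1 - (L0/L1)^(2/n)
    r2_n = r2_cs / (1.0 - math.exp(2.0 * ll0 / n))   # divide by 1 - L0^(2/n)
    return r2_l, r2_cs, r2_n

n0, n1 = 302, 458                  # died, survived (from the example)
ll0 = ll_intercept_only(n0, n1)
ll1 = ll0 + 280.246 / 2.0          # assumption: back-derived from the statistic

stat = lr_statistic(ll0, ll1)
p_value = chi2_upper_tail_df1(stat)
r2_l, r2_cs, r2_n = pseudo_r2(ll0, ll1, n0 + n1)
print(ll0, stat, p_value, r2_l, r2_cs, r2_n)
```

Computing Cox and Snell's and Nagelkerke's statistics from the log-likelihoods, rather than from L0 and L1 directly, avoids underflow because the likelihoods themselves are extremely small for a sample of this size.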
FINDING LOGISTIC REGRESSION COEFFICIENTS
USING NEWTON’S METHOD
•
Property 1:
The maximum of the log-likelihood statistic (from Definition 5 of Basic Concepts of Logistic
Regression) occurs when
$$\sum_{i=1}^{n} (y_i - p_i) x_{ij} = 0 \quad \text{for } j = 0, 1, \ldots, k$$
where xi0 = 1 for all i.
Observation:
Thus, to find the values of the coefficients bj we need to solve the equations of Property 1.
We can do this iteratively using Newton’s method (see Definition 2 of Newton’s Method and
Property 2 of Newton’s Method) as described in Property 2.
•
Property 2:
Let B = [bj] be the (k+1) × 1 column vector of logistic regression coefficients, let Y = [yi] be
the n × 1 column vector of observed outcomes of the dependent variable, let X be
the n × (k+1) design matrix (see Definition 3 of Least Squares Method for Multiple
Regression), let P = [pi] be the n × 1 column vector of predicted values of success and V = [vi]
be the n × n diagonal matrix where vi = pi (1 – pi). Then if B0 is an initial guess of B and for all m we
define the following iteration
$$B_{m+1} = B_m + (X^T V X)^{-1} X^T (Y - P)$$
(with P and V evaluated at Bm), then for m sufficiently large B ≈ Bm, and so Bm is a reasonable
estimate of the coefficient vector.
Observation:
If we group the data as we did in Example 1 of Basic Concepts of Logistic Regression (i.e.
summary data), then Property 2 holds where Y = [yi] is the r × 1 column vector of
summarized observed outcomes of the dependent variable, X is the corresponding r × (k+1)
design matrix, P = [pi] is the r × 1 column vector of predicted values of success and V = [vi] is
the r × r diagonal matrix where vi = ni pi (1 – pi).
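The iteration can be sketched for summary data with one independent variable, where the 2 × 2 system can be solved in closed form. This sketch assumes that the summarized outcome for group i is the number of successes and that the corresponding predicted value is ni·pi; the data and the function name are hypothetical.

```python
import math

def fit_logistic_newton(xs, successes, totals, max_iter=25, tol=1e-10):
    """Newton's method for logit(p) = a + b*x on grouped (summary) data."""
    a = b = 0.0                                   # initial guess B0 = 0
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-a - b * x)) for x in xs]
        # Score vector X^T (Y - P), with expected successes n_i * p_i.
        resid = [s - n * p for s, n, p in zip(successes, totals, ps)]
        g0 = sum(resid)
        g1 = sum(r * x for r, x in zip(resid, xs))
        # Matrix X^T V X with v_i = n_i * p_i * (1 - p_i).
        v = [n * p * (1 - p) for n, p in zip(totals, ps)]
        h00 = sum(v)
        h01 = sum(vi * x for vi, x in zip(v, xs))
        h11 = sum(vi * x * x for vi, x in zip(v, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: (X^T V X)^{-1} X^T (Y - P), via the 2 x 2 inverse.
        da = (h11 * g0 - h01 * g1) / det
        db = (-h01 * g0 + h00 * g1) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b

# Hypothetical summary data: successes out of 10 at each x value.
xs = [0.0, 1.0, 2.0, 3.0]
successes = [2, 4, 6, 8]
totals = [10, 10, 10, 10]

a, b = fit_logistic_newton(xs, successes, totals)
print(a, b)
```

Because the hypothetical proportions are symmetric about x = 1.5, the fitted curve should cross p = 0.5 there, i.e. a + 1.5·b should be zero, which gives a simple sanity check on the result.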
Example 1 (using Newton’s Method) :
We now return to the problem of finding the coefficients a and b for Example 1 of Basic
Concepts of Logistic Regression using Newton’s Method.
We apply Newton’s method to find the coefficients as described in Figure 1. The method
converges in only 4 iterations with the values a = 4.47665 and b = -0.0072.
The regression equation is therefore logit(p) = 4.47665 – 0.0072x.
Example 2:
A study was made as to whether environmental temperature or immersion in water of the
hatching egg had an effect on the gender of a particular type of small reptile. The table in
Figure 2 shows the temperature (in degrees Celsius) and immersion in water (0 = no and 1 =
yes) of the 49 eggs which resulted in a live birth as well as the sex of the reptile that hatched.
Determine the odds that a female will be born if the temperature is 23 degrees with the egg
immersed in water vs. not immersed in water.
We use the Logistic Regression supplemental data analysis tool, selecting the Raw
data and Newton Method options as shown in Figure 3.
After pressing the OK button we obtain the output displayed in Figure 4.
Here we only show the first 19 elements in the sample, although the full sample is contained
in range A4:C52. Note that in the raw data option the Input Range (range A4:C52) consists of
one column for each independent variable (Temp and Water for this example) and a final
column only containing the values 0 or 1, where 1 indicates “success” (Male in this case) and
0 indicates “failure” (Female in this case). Please don’t read any gender discrimination into
these choices: we would get the same result if we chose Female to be success and Male to
be failure.
The model indicates that to predict the probability that a reptile will be male you can use the
following formula:
We can now obtain the desired results as shown in Figure 5 by copying any formula for p-Pred
from Figure 4 and making a minor modification.
Here we copied the formula from cell K6 into cells G29 and G30.
The formula that now appears in cell G29 will be
=1/(1+EXP(-$R$7-MMULT(A29:B29,$R$8:$R$9))). You just need to change the part A29:B29 to
E29:F29 (where the values of Temp and Water actually appear). The resulting formula
=1/(1+EXP(-$R$7-MMULT(E29:F29,$R$8:$R$9)))
will give the result shown in Figure 5.
COMPARING LOGISTIC REGRESSION MODELS
•
Example 1:
Repeat the study from Example 2 of Finding Logistic Regression Coefficients using Newton’s
Method based on the summary data shown in Figure 1.
•
Using the Logistic Regression supplemental data analysis tool, selecting the Newton
Method option, we obtain the output displayed in Figure 2.
•
Example 2:
Do the Temp and Water variables make a significant difference in the model of Example 1?
We first create summary tables for the Temp-only and Water-only models and then use the
Logistic Regression data analysis tool (with Newton option) to build the two models. Also see
below for a simpler approach for creating the Temp-only summary table.
The summary table for the Temp model is shown in range B28:D34 of Figure 3. The values of
the C and D columns can be calculated from the summary table of the base model (as shown
in Figure 2) using SUMIF. For example, the number of samples where Temp = 20 and the
reptile was born Male (cell C29) is given by the formula
=SUMIF($A$4:$A$15,$B29,C$4:C$15)
By filling right (Ctrl-R) and down (Ctrl-D), you can copy this formula into the other cells in the
range C29:D34. You now use the Logistic Regression tool to obtain the output shown in Figure
3.
We observe that the Temp variable makes a significant contribution (cell U35) over the
constant-only model. Here we are comparing LL1 (Temp model) with LL0 (constant-only model).
We can also compare the Temp model with the base model (Temp + Water), by copying the
range T28:U35 to another location in the worksheet and using the LL1 value from the base model
and substituting the LL1 value from the Temp model for LL0. Also we need to change df to 1 since the
difference between the df of the two models is 2 – 1 = 1. This is shown in Figure 4.
We see that there is not a significant difference between the models (cell X44). This
confirms the conclusion that we reached previously that the Water variable is not making a
significant contribution, and in fact it can be dropped.
We create the Water-only model in a similar way to obtain the output shown in Figure 5.
This time we see that there is no significant difference between the Water model and the
constant model. If we repeat the analysis of Figure 4, we would see that there is a significant
difference between the Water model and the base model.
Finally, we can look at further refinements of the model, such as the full interaction model,
where we include the interaction between Temp and Water. We show this analysis in Figure 6.
If we compare this model with the base model using the approach described above (as in
Figure 4), we get the output shown in Figure 7.
This shows that there is a significant difference between the full interaction model and the
base model, with the interaction model providing a better fit.
Observation :
As mentioned above, there is a simpler way to create the Temp-only and Water-only summary
data tables. To create the Temp-only table, press Ctrl-m, select the Logistic
Regression data analysis tool and then enter the following information into the dialog box that
appears.
Here we have entered the Water independent variable into the List of variables to
exclude field. This produces the output in Figure 3.
Observation :
The List of variables to exclude field can be used whenever the Input Format is set
to Summary data and the Headings included with data field is checked in order to create a
reduced model. The variables to exclude are entered into this field separated by
commas.
E.g. if we have a summary data table with Nationality, Age, Education, Gender and
Occupation as independent variables and want to create a reduced model with only
Nationality, Education and Occupation, we would simply enter Age, Gender into the List of
variables to exclude field.
HOSMER-LEMESHOW TEST
•
The Hosmer-Lemeshow test is used to determine the goodness of fit of the logistic
regression model. Essentially it is a chi-square goodness of fit test (as described in
Goodness of Fit) for grouped data, usually where the data is divided into 10 equal
subgroups. The version of the test we present here uses the groupings that we have used
elsewhere and not subgroups of size ten.
•
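Since this is a chi-square goodness of fit test, we need to calculate the HL statistic
$$HL = \sum_{i=1}^{g} \left[ \frac{(obs_{i1} - exp_{i1})^2}{exp_{i1}} + \frac{(obs_{i0} - exp_{i0})^2}{exp_{i0}} \right]$$
where g = the number of groups, and obs and exp are the observed and expected numbers of
successes (subscript 1) and failures (subscript 0) in group i. The test used is chi-square
with g – 2 degrees of freedom. A significant test indicates that the model is not a good fit and
a non-significant test indicates a good fit.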
•
Example 1:
Use the Hosmer-Lemeshow test to determine whether the logistic regression model is a good
fit for the data in Example 1 in Comparing Logistic Regression Models.
In our example the sum is taken over the 12 Male groups and the 12 Female groups. The
observed values are given in columns H and I (duplicates of the input data columns C and D),
while the expected values are given in columns L and M. E.g. cell L4 contains the formula
=K4*J4 and cell M4 contains the formula =J4-L4 or equivalently =(1-K4)*J4.
The HL statistic is calculated in cell N16 via the formula =SUM(N4:N15). E.g. cell N4 contains
the formula =(H4-L4)^2/L4+(I4-M4)^2/M4.
The Hosmer-Lemeshow test results are shown in range Q12:Q16. The HL stat is 24.40567 (as
calculated in cell N16), df = g – 2 = 12 – 2 = 10 and p-value = CHIDIST(24.40567, 10)
= .006593 < .05 = α, and so the test is significant, which indicates that the model is not a good
fit.
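The calculation in this example can be sketched in code. The HL statistic function follows the worksheet formulas quoted above (expected successes = n·p, expected failures = n − n·p); the single group used to exercise it is hypothetical. The p-value function uses a closed form of the chi-square upper tail that holds only for an even number of degrees of freedom, which is sufficient for df = 10 here; it is a stand-in for Excel's CHIDIST, not a general replacement.

```python
import math

def hl_statistic(obs_success, obs_failure, p_pred):
    """Sum over groups of (obs - exp)^2 / exp for successes and failures."""
    hl = 0.0
    for s, f, p in zip(obs_success, obs_failure, p_pred):
        n = s + f
        exp_s = n * p          # expected successes (column L)
        exp_f = n - exp_s      # expected failures (column M)
        hl += (s - exp_s) ** 2 / exp_s + (f - exp_f) ** 2 / exp_f
    return hl

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j       # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# Hypothetical single group: 8 successes, 2 failures, predicted p = 0.5.
hl_toy = hl_statistic([8], [2], [0.5])

# p-value for the HL statistic and df reported in the example.
p_value = chi2_upper_tail_even_df(24.40567, 10)
print(hl_toy, round(p_value, 6))
```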
Observation :
The Hosmer-Lemeshow test needs to be used with caution. It tends to be highly dependent on
the groupings chosen, i.e. one selection of groups can give a negative result while another will
give a positive result. Also, when there are too few groups (5 or fewer), the test will usually
show a model fit.
As a chi-square goodness of fit test, the expected values used should generally be at least 5.
In Example 1 the cells L9, L15, M4 and M10 all have values less than 5, with cells M4 and
M10 especially troubling with values less than 1. We now address the problems of cells M4
and M10.
We can eliminate the first of these by combining the first two rows, as shown in Figure 2. Here
p-Pred for the first row (cell K23) is calculated as a weighted average of the first two values
from Figure 1 using the formula =(J4*K4+J5*K5)/(J4+J5). In a similar manner we combine the
7th and 8th rows from Figure 1.
The revised version shows a non-significant result, indicating that the model is a good fit.
Observation :
The Real Statistics Logistic Regression data analysis tool automatically performs the
Hosmer-Lemeshow test. For Example 1 of Finding Logistic Regression Coefficients using Solver, we
can see from Figure 5 of Finding Logistic Regression Coefficients using Solver that the logistic
regression model is a good fit. For Example 1, Figure 2 of Comparing Logistic Regression
Models shows that the model is not a good fit, at least until we combine rows as we did
above.
END
