### ch16LimitedDepVARS5NLOGIT

```
ECON 6002 Econometrics
Memorial University of Newfoundland
Qualitative and Limited Dependent Variable Models

Nested Logit

Mixed Logit AKA Random Parameters Logit

Generalized Multinomial Logit
Principles of Econometrics, 3rd Edition
IIA assumption

Logit models carry the implicit assumption that the odds between any pair of
alternatives are independent of irrelevant alternatives (IIA).
One way to state the assumption:
- If choice A is preferred to choice B out of the choice set {A,B}, then
introducing a third alternative X, thus expanding that choice set to
{A,B,X}, must not make B preferable to A.

which kind of makes sense.
In the multinomial logit model, IIA implies that adding
another alternative, or changing the characteristics of a third
alternative, must not affect the relative odds between the two
alternatives considered.
This is not realistic for many real-life applications involving similar
(substitute) alternatives.
Examples:
- Beethoven/Debussy versus another of Beethoven’s symphonies
(Debreu 1960; Tversky 1972)
- Bicycle/Pony (Luce and Suppes 1965)
- Red Bus/Blue Bus (McFadden 1974)
- Black slacks, jeans, shorts versus blue slacks (Hoffman, 2004)
- Etc.

Imagine commuters first face a decision between two modes of transportation: car and red
bus

Suppose that a consumer chooses between these two options with equal probability, 0.5, so
that the odds ratio equals 1.

Now add a third mode, blue bus. Assuming bus commuters do not care about the color of the
bus (they are perfect substitutes), consumers are expected to choose between bus and car still
with equal probability, so the probability of car is still 0.5, while the probabilities of each of
the two bus types should go down to 0.25

However, this intuitive outcome violates IIA: for the odds ratio between car and red bus to stay
at 1, the logit model forces the new probabilities to be: car 0.33; red bus 0.33; blue bus 0.33

The IIA axiom does not mix well with perfect substitutes
We can test this assumption with a Hausman-McFadden test, which
compares a multinomial logit model estimated with all the choices with one
estimated on a restricted set of choices (mlogtest, hausman base in Stata;
check the detail option too: mlogtest, hausman detail)
However, see Cheng and Long (2007)
Another test is Small and Hsiao’s (1985)
Stata’s command is mlogtest, smhsiao (careful: the sample is
randomly split every time, so you must set the seed if you want to
reproduce your results)
See Long and Freese’s book for details and worked examples

Extensions have arisen to deal with this issue

The multinomial probit and the mixed logit are alternative models for nominal outcomes that
relax IIA by allowing correlation among the errors (to reflect similarity among options), but
these models come with their own difficulties and assumptions

IIA can also be relaxed by specifying a hierarchical model, ranking the choice alternatives.
The most popular of these is McFadden’s nested logit model, which allows
correlation among some errors, but not all (e.g. Heiss 2002)

Generalized extreme value and multinomial probit models possess another property, the
Invariant Proportion of Substitution (Steenburgh 2008), which also implies
counterintuitive individual choice behavior in real-life settings

The multinomial probit has serious computational disadvantages too, since its choice
probabilities are multiple integrals (of dimension one less than the number of categories).
Integration by simulation is now easing this problem…

The nested logit is a partial relaxation of the IID and
IIA assumptions of the MNL model

It is relatively straightforward to estimate

Its choice probabilities also have a closed form, so no simulation is needed

Most NL models have only two hierarchical levels;
very few NL models are estimated with three levels,
and even fewer with four

Note that the “tree structure” does not have an
actual “sequential” interpretation of any sort

It is only there to allow for differentials in the
degree of correlation within and between “nests”

By default, Stata’s nlogit now uses a parameterization
that is consistent with random utility maximization (RUM)

Before Stata 10, a nonnormalized version of the nested logit
model was used by Stata (and other packages) and you will
see some papers pointing that out

This can still be requested by specifying the nonnormalized
option

nonnormalized requests a nonnormalized parameterization of the model that does not scale the
inclusive values by the degree of dissimilarity of the alternatives within each nest. Use this option to
replicate results from older versions of Stata (Stata help)

Both versions are valid, but only the RUM-consistent version
is based on a sound model of consumer behavior

(the normalization is about scaling the coefficients in the second-level choice, dividing them by the
dissimilarity parameters, so that the utilities can be meaningfully compared; see Heiss (2002) for details)

Adapted from Stata’s help file, let us consider a
model of restaurant choice

use http://www.stata-press.com/data/r13/restaurant

Or look it up (it is one of Stata’s example datasets)

Run describe:
“Fake” data on 300 families and their choice of
seven local restaurants:
- Freebirds and Mama’s Pizza sell fast food
- Cafe Eccell, Los Nortenos, and Wings ’N More
are family restaurants
- Christopher’s and Mad Cows are fancy
restaurants
Model the decision of where to eat as a function
of:
- household income
- number of kids
- rating of the restaurant (coded 0–5)
- average meal cost per person
- distance between the household and the
restaurant
Note that:
- income and kids are attributes of the family
- rating is an attribute of the alternative (the
restaurant)
- cost and distance are attributes of the
alternative as perceived by the families; that
is, each family has its own cost and distance
for each restaurant.
Thus:
- income and kids are case-specific
- rating is alternative-specific
- cost and distance are both
Why not only 300 obs.?

You could fit a conditional logit model to these data as
arranged

Since income and kids are case-specific, you would need to interact them
with the alternatives (asclogit* does this for you)

*asclogit is great… in the “old days” you would need to work a bit harder with dummies and
interactions to be able to run a mixed model with the old clogit command. This is still a good
exercise, though.

However, the conditional logit may be inappropriate,
since it assumes that the random errors are
independent, and as a result it forces the odds ratio of
any two alternatives to be independent of the other
alternatives: the IIA!
. clogit chosen kids income cost rating distance , group(family_id)
note: kids omitted because of no within-group variance.
note: income omitted because of no within-group variance.

Iteration 0:   log likelihood = -538.52688
Iteration 1:   log likelihood = -537.06762
Iteration 2:   log likelihood = -537.06643
Iteration 3:   log likelihood = -537.06643

Conditional (fixed-effects) logistic regression   Number of obs   =      2100
                                                  LR chi2(3)      =     93.41
                                                  Prob > chi2     =    0.0000
Log likelihood = -537.06643                       Pseudo R2       =    0.0800

------------------------------------------------------------------------------
      chosen |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |          0  (omitted)
      income |          0  (omitted)
        cost |  -.1543089   .0173858    -8.88   0.000    -.1883845   -.1202333
      rating |   .8669793   .0981221     8.84   0.000     .6746635    1.059295
    distance |   -.085346   .0437514    -1.95   0.051    -.1710973    .0004052
------------------------------------------------------------------------------
This is the pure conditional logit!
. asclogit chosen cost dist, case(family_id) alternatives(restaurant) casevars(income kids)

Iteration 0:   log likelihood = -487.34059
Iteration 1:   log likelihood = -483.15644
Iteration 2:   log likelihood = -482.21859
Iteration 3:   log likelihood = -482.21731
Iteration 4:   log likelihood = -482.21731

Alternative-specific conditional logit         Number of obs      =      2100
Case variable: family_id                       Number of cases    =       300

Alternative variable: restaurant               Alts per case: min =         7
                                                              avg =       7.0
                                                              max =         7

                                                  Wald chi2(14)   =     60.28
Log likelihood = -482.21731                       Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      chosen |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost |  -.1330786   .0675342    -1.97   0.049    -.2654432   -.0007141
    distance |  -.2127489   .0482651    -4.41   0.000    -.3073468   -.1181511
-------------+----------------------------------------------------------------
Freebirds    |  (base alternative)
-------------+----------------------------------------------------------------
MamasPizza   |
      income |   .0239198   .0232255     1.03   0.303    -.0216013     .069441
        kids |   .3248002    .274624     1.18   0.237    -.2134529    .8630533
       _cons |  -1.228673   1.108607    -1.11   0.268    -3.401503     .944157
-------------+----------------------------------------------------------------
CafeEccell   |
      income |   .0426504   .0193985     2.20   0.028     .0046301    .0806708
        kids |   .3486398    .226436     1.54   0.124    -.0951667    .7924463
       _cons |   .3677203   .9075505     0.41   0.685    -1.411046    2.146486
-------------+----------------------------------------------------------------
LosNortenos  |
      income |   .0325719   .0193533     1.68   0.092    -.0053599    .0705037
        kids |   .1761988   .2257644     0.78   0.435    -.2662912    .6186889
       _cons |   1.483996   .9318191     1.59   0.111    -.3423353    3.310328
-------------+----------------------------------------------------------------
WingsNmore   |
      income |   .0428583   .0194924     2.20   0.028     .0046538    .0810627
        kids |   .2442661   .2268803     1.08   0.282    -.2004111    .6889434
       _cons |   .6935955   .9268626     0.75   0.454    -1.123022    2.510213
-------------+----------------------------------------------------------------
Christophers |
      income |   .0714976    .021214     3.37   0.001      .029919    .1130763
        kids |  -.0374476   .2535199    -0.15   0.883    -.5343376    .4594424
       _cons |   1.626207   1.672155     0.97   0.331    -1.651157    4.903571
-------------+----------------------------------------------------------------
MadCows      |
      income |   .0963617   .0221248     4.36   0.000      .052998    .1397254
        kids |  -.2062462   .2713623    -0.76   0.447    -.7381066    .3256142
       _cons |   .8012166   1.821835     0.44   0.660    -2.769514    4.371948
------------------------------------------------------------------------------

A second set of estimates shown on the same slide (alternative-specific rows only):
        cost |  -.1046302   .0656595    -1.59   0.111    -.2333204      .02406
    distance |  -.1922521   .0465579    -4.13   0.000    -.2835039   -.1010003

I could not estimate rating at the same time: it did not converge

Could you run a plain MNL with this dataset?

Here we suspect that restaurants should be grouped
by type (fast, family, or fancy)

Why?

Assuming that “unobserved stuff” affecting a
decision about one alternative has no effect on the
choice of other alternatives may seem innocuous, but
often this assumption is too restrictive

Example: when a family was deciding which
restaurant to visit, they were pressed for time because
of plans to attend a movie later

The unobserved shock (being in a hurry) would raise
the likelihood of going to either fast food
restaurant (Freebirds or Mama’s Pizza)

Another family might be choosing a restaurant to
celebrate a birthday and therefore be inclined to
attend a fancy restaurant (Christopher’s or Mad
Cows)

With the nested logit, we are not assuming that
families first choose whether to attend a fast, family,
or fancy restaurant and then choose the particular
restaurant

We assume merely that they choose one of the seven
restaurants

We must first create a variable that defines the
structure of our “decision tree”:

. nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

Our new type variable defines the three types of
restaurants
- We can now see how the alternative-specific
attributes (cost, rating, and distance) apply to the
bottom alternative set (the seven restaurants)
- and how family-specific attributes (income and kids)
apply to the alternative set at the first decision level
(the three types of restaurants)

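. nlogit chosen cost rating distance || type: income kids, base(family) || restaurant:, noconstant case(family_id)

RUM-consistent nested logit regression         Number of obs      =      2100
Case variable: family_id                       Number of cases    =       300

Alternative variable: restaurant               Alts per case: min =         7
                                                              avg =       7.0
                                                              max =         7

                                                  Wald chi2(7)    =     46.71
Log likelihood = -485.47331                       Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      chosen |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost |  -.1843847   .0933975    -1.97   0.048    -.3674404   -.0013289
      rating |    .463694   .3264935     1.42   0.156    -.1762215     1.10361
    distance |  -.3797474   .1003828    -3.78   0.000    -.5764941   -.1830007
------------------------------------------------------------------------------
type equations
-------------+----------------------------------------------------------------
fast         |
      income |  -.0266038   .0117306    -2.27   0.023    -.0495952   -.0036123
        kids |  -.0872584   .1385026    -0.63   0.529    -.3587184    .1842016
-------------+----------------------------------------------------------------
family       |
      income |          0  (base)
        kids |          0  (base)
-------------+----------------------------------------------------------------
fancy        |
      income |   .0461827   .0090936     5.08   0.000     .0283595    .0640059
        kids |  -.3959413   .1220356    -3.24   0.001    -.6351267   -.1567559
------------------------------------------------------------------------------
dissimilarity parameters
-------------+----------------------------------------------------------------
type         |
   /fast_tau |   1.712878    1.48685                     -1.201295    4.627051
 /family_tau |   2.505113   .9646351                      .614463    4.395763
  /fancy_tau |   4.099844   2.810123                     -1.407896    9.607583
------------------------------------------------------------------------------
LR test for IIA (tau = 1):             chi2(3) =     6.87   Prob > chi2 = 0.0762
------------------------------------------------------------------------------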

Option noconstant (in the nlogit command above) suppresses the constant terms for the
bottom-level alternatives
- It is needed for convergence in this example, unless you simplify things a little:

. nlogit chosen distance || type: income, base(family) || restaurant:, case(family_id)


The error correlation parameters are re-expressed
as dissimilarity parameters

In Stata notation, nlogit estimates a tau (the dissimilarity parameter, which is the
coefficient of the inclusive value/logsum) for each “type” (upper-level branch) that has
subcategories (lower-level branches, twigs,…)

In Cameron and Trivedi’s (MMA) notation, the
dissimilarity parameters are rhos and are called scale
parameters

In the normalized version of the nlogit model, the dissimilarity
parameters are used to scale the logsums or inclusive values.
In Cameron and Trivedi’s (MMA) notation, the inclusive value (logsum) of nest m is

    I_m = \ln \sum_{k \in B_m} \exp( x_{mk}'\beta / \rho_m )

where B_m is the set of alternatives in nest m. The inclusive value for the mth nest is
(up to scale and an additive constant) the expected value of the maximum utility that
individual i can obtain from choosing an alternative within nest m

Dissimilarity parameters (inversely) measure the degree of
correlation (1 - \tau_m^2 in Stata notation, or 1 - \rho_m^2 in MMA notation, for the mth nest)
of random shocks within each of the three types of restaurants

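If a dissimilarity parameter is greater than one, the model is inconsistent with RUM
(note that all three estimated taus above exceed one)

For RUM consistency, dissimilarity parameters must fall between 0 and 1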

If one of them (say the one for fast food) were less
than zero, something that increased the likelihood of
choosing Freebirds would decrease the likelihood of
choosing a fast food restaurant, which simply does
not make any sense

If the dissimilarity parameter is zero, the changes in
restaurant probabilities will not affect the choice of
type of restaurant and the correct model is recursive
(separated)

The conditional logit model is a special case of
nested logit where all the dissimilarity parameters
equal one

Our likelihood-ratio test of this hypothesis gives
mixed evidence against the null that all the dissimilarity
parameters are equal to one: chi2(3) = 6.87, Prob > chi2 = 0.0762,
so we reject at the 10% level but not at the 5% level

IIA holds if and only if all dissimilarity parameters
are equal to one

In LIML (two-step or sequential estimation) it was
often assumed for convenience that all of the
dissimilarity parameters were equal

This is a restriction you can impose in our Stata code
too

You could estimate the NLOGIT in two steps using
LIML but you would need some complex corrections
of the standard errors in the second step

Nowadays we have toys powerful enough to run
NLOGIT all in one step using FIML

The latter is preferable (nlogit in Stata uses FIML),
since it is more efficient

The LIML sequential estimation might still help to provide starting values, as the
FIML log-likelihood is not globally concave

Try:

. nlogit chosen rating distance || type: income kids, base(family) || restaurant: cost, noconstant case(family_id)

. nlogit chosen rating distance cost || type: kids, base(family) || restaurant: income, noconstant case(family_id)

Issues: you can build your tree in different ways,
and some will work better than others

Different trees will in general yield different results

There is no test to choose among trees
Mixed logit (random parameters logit, RPL): multinomial logit models with unobserved
heterogeneity
- They allow the parameters to vary randomly across
individuals
- See the mixlogit command
(Hole, A. R. 2007. Fitting mixed logit models by using maximum simulated likelihood.
Stata Journal 7: 388-401)
(find C:/… traindata.dta)

The RPL allows for correlation across alternatives
through an individual-specific random effect

Keywords
- binary choice models
- censored data
- conditional logit
- count data models
- feasible generalized least squares
- Heckit
- identification problem
- independence of irrelevant alternatives (IIA)
- index models
- individual and alternative specific variables
- individual specific variables
- latent variables
- likelihood function
- limited dependent variables
- linear probability model
- logistic random variable
- logit
- log-likelihood function
- marginal effect
- maximum likelihood estimation
- multinomial choice models
- multinomial logit
- odds ratio
- ordered choice models
- ordered probit
- ordinal variables
- Poisson random variable
- Poisson regression model
- probit
- selection bias
- tobit model
- truncated data


Cameron and Trivedi’s MMA and MUS
Hensher, Rose, and Greene’s (2005) Applied
Choice Analysis: A Primer, available
(electronically too) at the QEII


Ordered Choice
Count data
```
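
The red bus/blue bus arithmetic from the IIA slides, and the way a nested logit avoids it, can be checked numerically. The Python sketch below is illustrative only: it assumes all three modes have the same systematic utility (zero), puts the two buses in one nest, and uses the RUM-consistent two-level nested logit formulas (lower level scaled by the dissimilarity parameter tau, upper level driven by tau times the inclusive value). The helpers `mnl_probs` and `nested_logit_probs` are hypothetical names, not part of Stata or any package.

```python
import math


def mnl_probs(v):
    """Multinomial (conditional) logit probabilities from a dict of systematic utilities."""
    m = max(v.values())  # subtract the max before exponentiating, for numerical stability
    expv = {k: math.exp(x - m) for k, x in v.items()}
    total = sum(expv.values())
    return {k: e / total for k, e in expv.items()}


def nested_logit_probs(v, nests, tau):
    """Two-level RUM-consistent nested logit probabilities.

    v     -- dict: alternative -> systematic utility
    nests -- dict: nest name -> list of alternatives in that nest
    tau   -- dict: nest name -> dissimilarity parameter (0 < tau <= 1)
    """
    # Inclusive value (logsum) of nest m: IV_m = ln sum_{j in m} exp(v_j / tau_m)
    iv = {m: math.log(sum(math.exp(v[j] / tau[m]) for j in alts))
          for m, alts in nests.items()}
    # Upper level: P(nest m) is proportional to exp(tau_m * IV_m)
    p_nest = mnl_probs({m: tau[m] * iv[m] for m in nests})
    probs = {}
    for m, alts in nests.items():
        # Lower level: P(j | m) is proportional to exp(v_j / tau_m)
        p_within = mnl_probs({j: v[j] / tau[m] for j in alts})
        for j in alts:
            probs[j] = p_nest[m] * p_within[j]
    return probs


v = {"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0}
print(mnl_probs(v))  # IIA: each mode gets 1/3

nests = {"car": ["car"], "bus": ["red_bus", "blue_bus"]}
print(nested_logit_probs(v, nests, {"car": 1.0, "bus": 1.0}))   # tau = 1: same as MNL
print(nested_logit_probs(v, nests, {"car": 1.0, "bus": 0.01}))  # tau near 0: car about 0.498, each bus about 0.251
```

With tau = 1 for the bus nest the nested logit collapses to the MNL split (1/3 each), which is the conditional-logit special case; as tau approaches 0 the two buses behave as perfect substitutes and the car probability returns towards the intuitive 0.5.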
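
The statement that the inclusive value (logsum) is, up to scale and an additive constant, the expected maximum utility within a nest can be checked by simulation in the simplest case: dissimilarity parameter equal to one and iid standard Gumbel errors, where E[max_j (V_j + e_j)] = ln sum_j exp(V_j) + Euler's constant. This is a minimal Python sketch; the utilities are arbitrary illustrative numbers, not estimates from the restaurant data, and `logsum` and `simulated_expected_max` are hypothetical helper names.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (mean of a standard Gumbel)


def logsum(v):
    """Inclusive value ln(sum_j exp(v_j)), computed stably by factoring out the max."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def simulated_expected_max(v, draws=100_000, seed=2024):
    """Monte Carlo estimate of E[max_j (v_j + e_j)] with e_j iid standard Gumbel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Inverse-CDF Gumbel draw: e = -ln(-ln(u)), with u kept strictly inside (0, 1)
        total += max(x - math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12))) for x in v)
    return total / draws


v = [1.0, 0.5, -0.2]  # illustrative systematic utilities of the alternatives in one nest
print(logsum(v) + EULER_GAMMA)    # exact expected maximum utility
print(simulated_expected_max(v))  # simulated value, close to the line above
```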
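
The rules of thumb for dissimilarity parameters can be applied mechanically to the three taus reported in the nlogit output for the restaurant example. This Python sketch uses the slides' relation (correlation of the random shocks within nest m equals 1 - tau_m^2) and the 0 < tau <= 1 range required for RUM consistency; `within_nest_correlation` and `rum_consistent` are hypothetical helper names.

```python
# Dissimilarity parameters reported by nlogit for the restaurant example
taus = {"fast": 1.712878, "family": 2.505113, "fancy": 4.099844}


def within_nest_correlation(tau):
    """Correlation of the random shocks of two alternatives in the same nest: 1 - tau^2."""
    return 1.0 - tau ** 2


def rum_consistent(tau):
    """RUM consistency requires the dissimilarity parameter to lie in (0, 1]."""
    return 0.0 < tau <= 1.0


for nest, tau in taus.items():
    corr = within_nest_correlation(tau)
    print(f"{nest:>6}: tau = {tau:.3f}, 1 - tau^2 = {corr:.3f}, RUM-consistent: {rum_consistent(tau)}")
```

Every implied 1 - tau^2 falls below -1, which cannot be a correlation; this is another way of seeing why taus above one signal a model that is inconsistent with RUM. Note, though, that each of the three 95% confidence intervals in the output includes one.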
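
The p-value of the LR test for IIA (tau = 1) in the nlogit output can be reproduced from the reported statistic. For 3 degrees of freedom the chi-square upper tail has the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2), so this Python sketch needs only the standard library. Because Stata prints the statistic rounded to 6.87, expect agreement with the reported 0.0762 to roughly three decimals. The function name `chi2_sf_3df` is hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


lr_stat = 6.87  # "LR test for IIA (tau = 1): chi2(3) = 6.87" in the nlogit output
p_value = chi2_sf_3df(lr_stat)
print(f"p-value = {p_value:.4f}")
print("reject IIA at 10%:", p_value < 0.10, "| reject IIA at 5%:", p_value < 0.05)
```

The p-value lies between 0.05 and 0.10, which is why the evidence against IIA in this example is described as mixed.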