### T - SREE

```Z
using instrumental variables in
education research
SREE workshop
march 2010
sean f reardon
outline
- a little background on the potential outcomes framework
- what is an instrumental variable? and what’s it good for?
- assumptions needed to use instrumental variables
- practical methods of estimating IV models
- sources of bias in IV models
potential outcomes framework
a stylized example


what is the effect of receiving tutoring in math
on student math achievement?
Observed Student Treatment and Achievement Data

ID   Treatment Condition   Test Score
1    no tutoring           55
2    no tutoring           60
3    no tutoring           65
4    tutoring              60
5    tutoring              72
6    tutoring              63
Observed and Unobserved Potential Achievement Data (? = not observed)

Student ID          Treatment     Score if not   Score if   Observed   Tutoring
                    Condition     Tutored        Tutored    Score      Effect
1                   no tutoring   55             ?          55         ?
2                   no tutoring   60             ?          60         ?
3                   no tutoring   65             ?          65         ?
Untutored Average                 60             ?          60         ?
4                   tutoring      ?              60         60         ?
5                   tutoring      ?              72         72         ?
6                   tutoring      ?              63         63         ?
Tutored Average                   ?              65         65         ?
Overall Average                   ?              ?          62.5       ?
Definition of an “effect”

The effect, $\delta_i$, [on some outcome Y] [for some unit i] [of
some treatment condition t relative to some other
condition c] is defined as the difference between the
value of Y that would be observed if unit i were
exposed to treatment t and the value of Y that would
be observed if unit i were exposed to treatment c.

More formally, we define the effect of t relative to c on
Y for unit i as:
    $\delta_i = Y_i^t - Y_i^c$

We define the average effect of t relative to c in a
population P as:
    $\delta = E[\delta_i \mid i \in P] = E[Y_i^t - Y_i^c \mid i \in P]$
The “Fundamental Problem of Causal Inference” (Holland, 1986)
- Although both $Y_i^t$ and $Y_i^c$ are defined in principle, it is
  impossible to observe both of them for the same unit
  (because any given unit can be exposed to only one of t or c).
- Thus, the causal effect cannot be observed.
- The problem of causal inference is thus a problem of
  missing data. The outcome $Y_i$ under its
  “counterfactual” condition is never observed.
- How can we construct unbiased estimates of the
  average potential outcomes $E[Y_i^t]$ and $E[Y_i^c]$ under the
  counterfactual conditions?
Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment     Score if not   Score if   Observed   Tutoring
                    Condition     Tutored        Tutored    Score      Effect
1                   no tutoring   55             60         55         +5
2                   no tutoring   60             72         60         +12
3                   no tutoring   65             63         65         -2
Untutored Average                 60             65         60         +5
4                   tutoring      55             60         60         +5
5                   tutoring      60             72         72         +12
6                   tutoring      65             63         63         -2
Tutored Average                   60             65         65         +5
Overall Average                   60             65         62.5       +5
Observed and Possible Unobserved Potential Achievement Data

Student ID          Treatment     Score if not   Score if   Observed   Tutoring
                    Condition     Tutored        Tutored    Score      Effect
1                   no tutoring   55             60         55         +5
2                   no tutoring   60             55         60         -5
3                   no tutoring   65             65         65         0
Untutored Average                 60             60         60         0
4                   tutoring      55             60         60         +5
5                   tutoring      70             72         72         +2
6                   tutoring      70             63         63         -7
Tutored Average                   65             65         65         0
Overall Average                   62.5           62.5       62.5       0
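worked check (a sketch using only the numbers in the tables above): the
observed data give the same naive comparison under both sets of possible
unobserved scores, even though the true average effect differs.
    naive comparison (observed averages):   65 - 60      = +5
    first set of possible scores:           65 - 60      = +5  (naive estimate is right)
    second set of possible scores:          62.5 - 62.5  =  0  (naive estimate is off by 5)
so the observed data alone cannot tell these two cases apart.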
What if we can’t conduct an RCT?
- If we can randomize students to receive either tutoring or
  no tutoring, and ensure that every student complies with his
  or her assigned treatment status, the randomization will
  allow us to estimate the effect of tutoring very easily.
- but what if students don’t comply with their treatment
  assignment?
  - some assigned to tutoring don’t go to tutoring
  - some assigned to no tutoring get tutored anyway
  - this means tutoring is no longer randomly assigned: at least some
    of the variation in treatment status is potentially endogenous
  - so a comparison of those assigned to tutoring and no tutoring won’t
    give us an estimate of the effect of tutoring (but only the effect of
    being assigned to tutoring)
  - this is one case where instrumental variables are useful
instrumental variables models
What is an instrumental variable?
- an instrumental variable is an exogenous factor
  that causes some of the variation in treatment
  status (though it need not cause all of it)
- we use it to identify the portion of variation in
  treatment that is exogenous and then rely only
  on that exogenous variation to estimate the
  effect of treatment
A general structural model
[path diagram relating Z, X, U, W, T, and Y]
T: treatment status
Y: outcome measure
X: observed confounders
U: unobserved confounders
W: observed ignorable causes of Y
$\varepsilon_Y$: unobserved ignorable causes of Y
$\varepsilon_T$: unobserved ignorable causes of T
Z: instrument (observed ignorable cause of T)
Relating treatments and outcomes
[path diagram with T and Y]
- we would like to estimate the effect of T on Y
- this involves seeing how T and Y are related
- but to infer a causal relationship from the covariance of T and Y,
  we need to understand the source of variation in T
  - why do some people get different types/degrees of the treatment?
Relating treatments and outcomes
[path diagram with Z, T, Y, and $\varepsilon_T$]
- variation in T may be caused by factors unrelated to the outcome Y
  - these may be observed (Z)
  - or unobserved ($\varepsilon_T$)
- if the only variation in T comes from factors unrelated to Y,
  then T is as good as randomly assigned, so getting a causal
  estimate is easy
Relating treatments and outcomes
[path diagram with X, T, Y, and $\varepsilon_T$]
- variation in T may be caused, in part, by observed factors that
  are related to the outcome Y
  - observed confounders (X)
- as long as there is some variation in T that is caused by
  some (not necessarily observable) ignorable cause (Z
  or $\varepsilon_T$), we can still easily get an estimate of the effect of T
  - statistically control for X (compute relationship between T and Y,
    conditional on X)
Relating treatments and outcomes
[path diagram with X, U, T, Y, and $\varepsilon_T$]
- variation in T may be caused, in part, by observed and unobserved
  factors that are related to the outcome Y
  - observed confounders (X)
  - unobserved confounders (U)
  - reverse causality (Y affects T)
- here, we cannot get an unbiased estimate of the effect of T
  - statistical control can’t adjust for U
  - the ignorable cause ($\varepsilon_T$) is not observed
Relating treatments and outcomes
[path diagram with Z, X, U, T, Y, and $\varepsilon_T$]
- if we cannot observe all the confounders (or if Y affects T),
  then we need some observed factor that affects T but does
  not otherwise affect Y
- this (Z) is called an instrument (or instrumental variable).
- because the part of the variation in T that is induced by Z is
  ignorable (as good as random), we can use this part of the
  variation in T to identify the effect of T on Y
Tutoring example, revisited
- the observed data are not sufficient to estimate
  the average effect of tutoring
- what if we can’t do an experiment, or if we do an
  experiment and not everyone complies?
tutoring voucher as an instrument
- randomly assign eligible students to receive either a
  voucher allowing them to receive free tutoring (Z=1)
  or no voucher (Z=0).
- observe whether students attend tutoring (T=1) or not (T=0).
  - note: this choice is not random; students may choose
    tutoring or not, regardless of voucher status (Ti≠Zi).
- observe later achievement (Y)
- we want to estimate the effect of T (tutoring vs no
  tutoring) on Y (achievement).
Four subpopulations (angrist, imbens, & rubin, 1996)
- compliers
  - those who would comply with treatment assignment (those
    for whom Ti=Zi)
- non-compliers
  - always-takers: those who would always receive the treatment,
    regardless of assignment (those for whom Ti=1)
  - never-takers: those who would never receive the treatment,
    regardless of assignment (those for whom Ti=0)
  - defiers: those who would always do the opposite of treatment
    assignment (those for whom Ti=1-Zi)
Observed Outcomes

N=100, 50% receive vouchers, but not all comply
with assignment (only 60% comply):

Offered     Tutored          Proportion
Voucher     No      Yes      Tutored
No          45       5       .10
Yes         15      35       .70
Total       60      40       .40
each cell of the table could contain two of the four subpopulations:
- no voucher, not tutored (45): might be compliers or never-takers
- voucher, not tutored (15): might be defiers or never-takers
- no voucher, tutored (5): might be defiers or always-takers
- voucher, tutored (35): might be compliers or always-takers
estimating the proportion of compliers
- assume there are no defiers
- then everyone with Z=1, T=0 is a never-taker (15
  of 50 (30%) with Z=1 in our example)
- there should be the same proportion (30%) of
  never-takers among those with Z=0, because Z is
  random
- the same logic implies that 10% of the
  population are always-takers
- thus, 60% (100% - 30% - 10%) are compliers
Estimating the proportion of compliers
- we can also estimate this by regressing the
  treatment variable on the instrument:
    tutor = G0 + G1*voucher + e
    tutor = .10 + 0.60*voucher
- Thus, the average effect of being assigned a
  voucher on tutoring status is +0.60, meaning
  that the average student’s probability of
  receiving tutoring increases by 0.60 if assigned a
  voucher (which means that 60% of the students
  comply with the voucher assignment).
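worked check (a sketch from the 2x2 counts above; it relies on the
no-defiers assumption already made):
    P(tutored | no voucher) =  5/50 = .10   so G0 = .10 (always-takers)
    P(tutored | voucher)    = 35/50 = .70
    G1 = .70 - .10 = .60                    (compliers)
    never-takers = 1 - .70 = .30, and .10 + .60 + .30 = 1.00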
Observed Outcomes
Estimated effect of the voucher offer on test
scores = 56.6 - 50.5 = +6.1

Offered     Tutored
Voucher     No       Yes      Total
No          48.3     70.0     50.5
Yes         44.9     61.6     56.6
Total       47.5     62.6     53.5

notes on the table of average test scores:
- here we’re assuming no defiers (later we will see why this is
  necessary)
- the untutored average (47.5) is the average outcome among untutored
  compliers and never-takers
- the tutored average (62.6) is the average outcome among tutored
  compliers and always-takers
OLS estimates
- OLS yields:
    test = 47.5 + 15.1*(tutored)
- the estimated effect of tutoring is +15.1 points
- but we should worry about whether this is
  biased, because some students chose whether
  to get tutoring or not.
- the tutored group includes compliers and always-takers;
  the control group includes compliers and
  never-takers; so they are not equivalent groups
The Wald IV estimator
- if we are willing to assume that the voucher offer
  had no effect on the outcome of the non-compliers
  (because it did not alter their treatment status and
  does not affect their outcome in any other way), then
  we can estimate the effect of tutoring like this:
  - The average effect of the voucher in the population
    is estimated to be +6.1
  - but only 60% of students’ decisions about whether
    to get tutoring were affected by the voucher offer
    (only 60% of the sample are compliers)
Wald estimator

average effect in population
  = average effect on compliers × proportion who are compliers
  + average effect on non-compliers × proportion who are non-compliers

- because the average effect on non-compliers is assumed to be zero,
  this says that the average effect of the treatment
  among the compliers equals the average effect
  in the population divided by the proportion of
  the population who are compliers
- thus, the average effect among the compliers is
  = +6.1/.60
  = +10.1
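the same calculation in symbols (a sketch; ITT, $\pi_C$, and $\delta_C$ are
labels introduced here for illustration, not the slide's own notation):
    $ITT = \pi_C \, \delta_C + (1 - \pi_C) \cdot 0 \;\Rightarrow\; \delta_C = ITT / \pi_C$
    $ITT = 56.6 - 50.5 = 6.1$, $\pi_C = .70 - .10 = .60$
    $\delta_C = 6.1 / .60 \approx 10.2$
with these rounded table values the ratio comes to about 10.2; the +10.1
reported above presumably reflects unrounded data. either way it is well
below the OLS estimate of +15.1.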
What have we learned?
- An instrumental variable allows us to estimate the average
  effect of the treatment among those whose treatment
  status is affected by the instrument (“compliers”)
  - called the “local average treatment effect” (LATE)
  - note that we can’t identify who the compliers are
- We can’t estimate the average treatment effect in the
  population, because we can’t estimate the effect among
  non-compliers
  - because the instrument doesn’t affect their treatment status, there
    is no exogenous variation in their treatment status that we can use.

assumptions this relies on:
- the instrument only affects the outcome through
  its impact on the treatment (this is called the
  exclusion restriction)
- the instrument is ignorably (randomly) assigned
  - this allows us to estimate the effect of the
    instrument on the outcome and on the treatment
- the instrument affects the treatment for at least
  some people
  - otherwise there are no compliers
- there are no defiers
more general IV models
what if treatment is not binary?
- above we assumed the treatment (tutoring) was
  binary
- but not all treatments are binary
  - we could offer vouchers of different amounts
  - students could receive different amounts of tutoring
- as a result, compliance may take on many values
  - for some students, the amount of tutoring received
    may be strongly affected by the instrument; for
    others, it may be weakly affected or not at all
    affected.
a more general model of the IV estimator
[path diagram: Z_i to Y_i, labeled $\beta_i$]
- for a given individual i, $\beta_i$ is the effect of Z on Y
- this effect may vary across individuals
- we would like to estimate the average effect, $\beta = E[\beta_i]$
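1. exclusion restriction
- if the only way that Z affects Y is through its effect on T, then
  we have $\beta_i = \gamma_i \delta_i$, where $\beta_i$ is the effect of Z
  on Y, $\gamma_i$ is the effect of Z on T, and $\delta_i$ is the effect of
  T on Y for individual i.
- or, put differently:
  [path diagram: Z_i to T_i labeled $\gamma_i$; T_i to Y_i labeled $\delta_i$]
- the assumption that the only way that Z affects Y
  is through its effect on T is called the exclusion
  restriction.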
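2. zero compliance-effect covariance
- we can write the average effect of Z on Y as
    $E[\beta_i] = E[\gamma_i \delta_i] = E[\gamma_i] E[\delta_i] + \mathrm{Cov}(\gamma_i, \delta_i)$
- if we assume $\mathrm{Cov}(\gamma_i, \delta_i) = 0$, then we have
    $\beta = \gamma \delta$
  writing $\beta = E[\beta_i]$, $\gamma = E[\gamma_i]$, and $\delta = E[\delta_i]$
- the assumption that $\mathrm{Cov}(\gamma_i, \delta_i) = 0$ is called the
  zero compliance-effect covariance assumption.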
3. instrument relevance
- as long as $\gamma \neq 0$ (where $\gamma$ is the average effect of Z on T),
  we can rewrite the above as
    $\delta = \beta / \gamma$
- the assumption that $\gamma \neq 0$ is sometimes called
  the instrument relevance assumption; or
  sometimes just referred to as the assumption
  that the instrument affects the treatment.
- if $\gamma$ is small (close to zero), we say that the
  instrument is a weak instrument.
4. the instrument is ignorably assigned
- if the above three assumptions are met, we have
    $\delta = \beta / \gamma$
- if Z is ignorably assigned, then we can easily
  estimate both $\beta$ (the average effect of Z on Y)
  and $\gamma$ (the average effect of Z on T).
- the assumption of ignorable assignment thus
  makes estimation of the effect of T on Y possible.
what do these assumptions mean?
- exclusion restriction: the offer of a tutoring
  voucher does not affect students’ achievement
  except by affecting the amount of tutoring they receive
- zero compliance-effect covariance: there is no
  correlation between how strongly a voucher
  offer affects the amount of tutoring a student
  gets and how effective tutoring is for that
  student
- instrument relevance: the offer of a voucher has
  some effect, on average, on the amount of
  tutoring students receive (at least one student is
  affected by the offer).
- ignorable assignment of the instrument: the
  voucher offer is randomly assigned (this would
  be violated, for example, if the principal gave
  vouchers to students she deemed most in need
  of tutoring).
some examples
- NYC voucher experiment (howell et al, 2002; krueger
  & zhu, 2004)
- Effect of schooling on wages, using quarter of
  birth as instrument (angrist & krueger, 1991).
- Effect of teacher absence on student
  achievement, using snowfall as instrument (miller,
  murnane & willett, 2007)
- Effects of segregation on educational attainment
  and wages, using railroads as an instrument
  (ananat 2007)
estimating IV models
estimating IV models in practice
- in practice, we don’t usually compute the effect
  of Z on Y and Z on T and divide them
  - because we may need more complex models (if we
    want to include other covariates in the model, for
    example)
  - because we need to compute standard errors
- the most common method of estimating IV models
  is two-stage least squares (TSLS or 2SLS).
Three relevant equations
1. the “reduced form” equation:
     $Y_i = \beta_0 + \beta_i Z_i + e_i$
   $\beta_i$ is the person-specific effect of Z on Y.
2. the “first stage” equation:
     $T_i = \gamma_0 + \gamma_i Z_i + u_i$
   $\gamma_i$ is the person-specific effect of Z on T.
but the equation we really are interested in is
3. the “second stage” equation:
     $Y_i = \delta_0 + \delta_i T_i + v_i$
   $\delta_i$ is the person-specific effect of T on Y.
two-stage least squares
- fit the first-stage equation (estimate the effect of
  Z on T); compute fitted values:
    $\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma} Z_i$
- fit the second-stage equation, using predicted
  values of T in place of observed values of T:
    $Y_i = \delta_0 + \delta \hat{T}_i + e_i$
- because the predicted values of T from the first-stage
  equation include only the variation in T
  that is caused by the instrument, the estimated
  coefficient from the second-stage equation will
  be consistent, i.e., unbiased in large samples
  (as long as the 4 IV assumptions are met).
- if you do this by hand, you’ll get the wrong
  standard errors; statistical software usually has
  built-in routines (e.g., -ivregress- command in
  Stata) to compute correct standard errors.
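a sketch of why 2SLS reproduces the Wald ratio when there is one binary
instrument and no covariates (the hat and bar symbols are notation
introduced here for illustration):
    first stage:   $\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma} Z_i$, with $\hat{\gamma} = \bar{T}_{Z=1} - \bar{T}_{Z=0}$
    second stage:  slope $= \frac{Cov(Y, \hat{T})}{Var(\hat{T})} = \frac{\hat{\gamma}\, Cov(Y, Z)}{\hat{\gamma}^2\, Var(Z)} = \frac{Cov(Y,Z)/Var(Z)}{\hat{\gamma}} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{T}_{Z=1} - \bar{T}_{Z=0}}$
in the tutoring example the fitted values are .10 + .60*voucher, and the
second-stage slope is 6.1/.60 (about +10.2 with the rounded table values).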
Effects of attending charter school
- we can’t randomize students to charter or non-charter schools
- Abdulkadiroglu et al. (2009) examine students
  who apply to oversubscribed charter schools,
  whose admission is determined by lottery
  (randomization)
- instrument is winning the lottery
- treatment is # of years in a charter school
example: effect of charter schooling
[results table not recovered; its columns were: first stage (compliance);
reduced form (effect of winning lottery on ach.); 2sls (effect of a
year in charter)]
are the IV assumptions valid in this study?
- exclusion restriction?
- zero compliance-effect covariance?
- instrument relevance?
- ignorable assignment?
sources of bias in IV models
sources of bias in IV
- failure of exclusion restriction assumption
- failure of ignorability assumption
- failure of zero compliance-effect covariance assumption
- finite sample bias
- weak instruments cause 3 problems:
  - exacerbate bias due to failure of assumptions (exclusion restriction,
    ignorability, zero covariance)
  - exacerbate finite sample bias
  - lead to incorrect estimation of standard errors when using
    two-stage least squares
failure of the exclusion restriction
- recall that the exclusion restriction says that the
  only way that Z affects Y is through its effect on T.
- as a result, we can write
    $\beta_i = \gamma_i \delta_i$
  [path diagram: Z_i to T_i labeled $\gamma_i$; T_i to Y_i labeled $\delta_i$]
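failure of the exclusion restriction
- if the exclusion restriction is violated,
  then there is some other path through which Z
  affects Y
- as a result, we can write
    $\beta_i = \gamma_i \delta_i + \theta_i$
  where $\theta_i$ is the effect of Z on Y that does not operate through T
  [path diagram: Z_i to T_i labeled $\gamma_i$; T_i to Y_i labeled $\delta_i$;
  direct path Z_i to Y_i labeled $\theta_i$]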
failure of the zero covariance assumption
- averaging the above in the population:
    $\beta = \gamma \delta + \mathrm{Cov}(\gamma_i, \delta_i) + \theta$
  where $\theta$ is the average effect of Z on Y that does not operate
  through T
- now, dividing through by $\gamma$, we get
    $\beta / \gamma = \delta + \theta / \gamma + \mathrm{Cov}(\gamma_i, \delta_i) / \gamma$
  - $\theta / \gamma$: bias due to failure of the exclusion restriction
  - $\mathrm{Cov}(\gamma_i, \delta_i) / \gamma$: bias due to failure of the zero
    compliance-effect covariance assumption
- so the IV estimator (the ratio of the
  average effect of Z on Y to the
  average effect of Z on T) will be biased
  - if $\gamma$ is small, the biases will be larger
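a numerical illustration (hypothetical numbers, chosen only to show the
scaling): suppose the voucher offer raises scores by an average of 1 point
through some path other than tutoring. dividing that 1 point by the
average effect of Z on T gives the resulting bias in the IV estimate:
    if the average effect of Z on T is .60:   1/.60 ≈ 1.7 points of bias
    if the average effect of Z on T is .10:   1/.10 = 10 points of bias
the same violation produces about six times as much bias with the weaker
instrument.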
failure of the zero covariance assumption
- if all the assumptions except the zero
  compliance-effect covariance assumption are
  met, we have
    $\beta / \gamma = E[\gamma_i \delta_i] / E[\gamma_i]$
- so the IV model will estimate the compliance-weighted
  average treatment effect (CWATE).
  - if T is binary and there are no defiers, this will be the
    same as the average effect among the compliers
    (LATE), because non-compliers will get 0 weight.
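a numerical illustration (hypothetical numbers; "compliance" is the
person-specific effect of Z on T and "effect" is the person-specific
effect of T on Y): suppose half of students have compliance 1 and effect
10, and the other half have compliance .2 and effect 0.
    average of compliance × effect  = .5(1)(10) + .5(.2)(0) = 5
    average compliance              = .5(1) + .5(.2)        = .6
    CWATE                           = 5/.6 ≈ 8.3
    average effect                  = .5(10) + .5(0)        = 5
the IV estimand (8.3) exceeds the average effect (5) because the students
whose tutoring responds most to the instrument also benefit most: the
covariance is 5 - (.6)(5) = 2, and 2/.6 ≈ 3.3 is exactly the gap.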
failure of the ignorability assumption
- if the instrument is not ignorably assigned, then
  we cannot obtain unbiased estimates of the
  effect of Z on Y or of the effect of Z on T.
- Thus, the ratio of the two may be biased.
weak instruments
- weak instruments do not, strictly speaking,
  violate any of the IV assumptions, but they do
  exacerbate the bias from failures of the other assumptions
- rule of thumb: an instrument is weak if the F-statistic
  on the instrument(s) from the first-stage
  equation is less than 10.
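worked check for the tutoring voucher example (a sketch; it uses the
standard result that with one instrument and no covariates the
first-stage F equals r²(N-2)/(1-r²), where r is the correlation between
Z and T):
    Cov(Z,T) = P(Z=1,T=1) - P(Z=1)P(T=1) = .35 - (.50)(.40) = .15
    Var(Z)   = (.50)(.50) = .25;   Var(T) = (.40)(.60) = .24
    r²       = .15² / (.25 × .24) = .0225/.06 = .375
    F        = .375 × 98 / .625 = 58.8
this is well above 10, so the voucher would not count as a weak
instrument by this rule of thumb.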
weak instruments and bias in the IV estimator
- weak instruments cause 3 problems with the IV estimator:
  - exacerbate bias due to failure of the exclusion
    restriction, ignorability, and monotonicity
  - exacerbate finite sample bias
  - lead to incorrect estimation of standard errors when
    using two-stage least squares
- finite sample bias
  - even if the 4 IV assumptions are met, IV estimation is
    biased in any finite sample
  - most pronounced with weak instruments and small samples
mediation models
- suppose we randomly assign a treatment (e.g.,
  teacher professional development) that we think
  will affect student learning by affecting
  instructional practice
- we can treat the PD as an instrument, and the
  mediator (instructional practice) as the
  ‘treatment’ and use IV to estimate the effect of
  instructional practice (which can’t be
  randomized) on learning
  - but worry about exclusion restriction (are there
    other ways that the PD could affect learning?)
multiple mediator models
- suppose we randomize students to 3
  treatment conditions.
- two first stage equations:
- second stage equation:
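one way these equations could be written (a sketch only; every symbol
below is an assumption introduced for illustration: $Z_{1i}$ and $Z_{2i}$
are indicators for two of the three conditions, and $M_{1i}$ and $M_{2i}$
are two mediators):
    first stage:   $M_{1i} = a_1 + b_{11} Z_{1i} + b_{12} Z_{2i} + e_{1i}$
                   $M_{2i} = a_2 + b_{21} Z_{1i} + b_{22} Z_{2i} + e_{2i}$
    second stage:  $Y_i = c + d_1 \hat{M}_{1i} + d_2 \hat{M}_{2i} + u_i$
three conditions supply two instruments, which is the minimum needed to
identify the effects of two mediators.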

IV to correct for measurement error
- suppose we want to estimate the effect of
  cognitive skill on wages.
- if cognitive skill is measured with error by ACH,
  OLS will give a biased estimate of that effect.
- if we have a second test of skills, we can use one
  test as an instrument for the second test, and
  then use the predicted value of the second test
  in the wage equation.
- this is called an “errors-in-variables” (EIV) model.
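a sketch of why this works under classical measurement error (all
symbols are assumptions introduced here): let $ACH_1 = S + u_1$ and
$ACH_2 = S + u_2$ be two tests of true skill S, with errors $u_1$, $u_2$
independent of S, of each other, and of the wage error e, and let
$wage = a + bS + e$.
    OLS on one test:  $\mathrm{plim}\; \hat{b}_{OLS} = b \cdot \frac{Var(S)}{Var(S) + Var(u_1)}$   (attenuated toward zero)
    IV:               $\frac{Cov(wage, ACH_2)}{Cov(ACH_1, ACH_2)} = \frac{b \, Var(S)}{Var(S)} = b$
the second test removes the attenuation because its error is unrelated
to the error in the first test.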

