```Review #1 …
Is there a systematic way to assess
regression studies? (SW Chapter 9)
Multiple regression has some key virtues:
· It provides an estimate of the marginal effect of X on Y
· It resolves the problem of omitted variable bias, if an omitted
variable can be measured and included
· It can handle nonlinear effects
Still, OLS might yield a biased estimator of the true causal effect
– it might not yield valid inferences…
A Framework for Assessing
Statistical Studies: Internal and
External Validity
Internal validity: the statistical inferences about causal effects
are valid for the population being studied.
External validity: the statistical inferences can be generalized
from the population and setting studied to other populations
and settings
· “setting” refers to the legal, institutional, policy, and physical
environment, and even the time frame
Threats to External Validity
How far can we generalize class size results from California
school districts?
· Differences in populations
· California in 2005?
· Massachusetts in 2005?
· Mexico in 2005?
· Differences in settings
· different legal requirements concerning special education
· different treatment of bilingual education
· differences in teacher characteristics
Threats to Internal Validity
Internal validity: the statistical inferences about causal effects
are valid for the population being studied.
· Five threats to the internal validity of regression studies:
1. Wrong functional form
2. Omitted variable bias
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias
· All of these imply what?
· And the consequence is what?
1. Wrong functional form
Arises if the functional form is incorrect (for example, an
interaction term is incorrectly omitted, or a quadratic term is
incorrectly used instead of a log).
Inferences on causal effects will then be biased!
Potential solutions to functional form misspecification
1. If Y is continuous, use the “appropriate” nonlinear
specifications in X
2. If Y is discrete, move to “probit” or “logit” analysis
2. Omitted variable bias
Omitted variable bias arises if an omitted variable is both:
(i) a determinant of Y and
(ii) correlated with at least one included regressor.
Potential Solutions:
1. Measure and include the omitted variable
2. Use panel data
3. Use instrumental variables regression
4. Conduct a randomized controlled trial (RCT)
Why does this work?
3. Errors-in-variables bias
So far we’ve assumed X is perfectly measured …
· Data entry errors
· Recollection errors in surveys (when did you start your job?)
· Ambiguous survey questions (income last year?)
· Intentionally false survey responses
o What is the current value of your financial assets?
o How often do you drink and drive?
o Have you ever cheated at Davidson?
Errors-in-variables bias
Suppose LSA #1 - #3 hold,
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Let $X_i$ = unmeasured true value of X
$\tilde{X}_i$ = imprecisely measured version of X
Then
$Y_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{u}_i$, where $\tilde{u}_i = \beta_1 (X_i - \tilde{X}_i) + u_i$
· If $X_i$ is measured with error, $\tilde{X}_i$ is in general correlated with $\tilde{u}_i$,
so $\hat{\beta}_1$ is biased and inconsistent!
· Possible to derive formulas for the bias based on specific
mathematical assumptions about the measurement error
process … let’s look at one such case
o Classical measurement error
Potential solutions to
errors-in-variables bias
1. Obtain better data
2. Develop a model of the measurement error process
· Need to know a lot about the nature of the measurement
error – for example by cross-checking a subsample of
the data against separate administrative records and
analyzing the discrepancies, and creating a model of
them
· Very specialized; we won’t pursue this in this course
3. Instrumental variables regression
4. Sample selection bias
So far we’ve assumed i.i.d. random sampling. Sometimes this is
thwarted because the sample “selects itself”.
Sample selection bias arises when a selection process:
(i) influences the availability of data and
(ii) is related to the outcome
Example #1: Mutual funds
· Do actively managed mutual funds outperform “hold-the-market” funds?
· Empirical strategy:
· Sampling scheme: i.i.d. random sampling of mutual funds
available to the public on a given date
· Data: returns for the preceding 10 years.
· Estimator: average ten-year return of the sample mutual
funds, minus ten-year return on S&P500
· Is there sample selection bias?
· $return_i = \beta_0 + \beta_1 \, managed\_fund_i + u_i$
Example #2: Returns to education
Potential solutions to sample
selection bias
· Collect the sample in a way that avoids sample selection.
· Mutual funds example: change the sample population from
those available at the end of the ten-year period, to those
available at the beginning of the period (include failed
funds)
· Returns to education example: sample college graduates,
not workers (include the unemployed)
· RCT
· Construct a model of the sample selection process and estimate
that model (we won’t do this)
5. Simultaneous causality bias
Simultaneous causality bias
Potential solutions to simultaneous
causality bias
1. RCT. Because Xi is chosen at random by the experimenter,
there is no feedback from the outcome variable Yi to Xi
(assuming perfect compliance).
2. Develop and estimate a complete model of both directions of
causality. This is the idea behind many large macro models
(e.g. the Federal Reserve's FRB/US model). This is extremely difficult in
practice.
3. Use instrumental variables regression to estimate the causal
effect of interest (effect of X on Y, ignoring effect of Y on X).
Internal and External Validity When the
Regression is Used for Forecasting
· Forecasting and estimation of causal effects are quite
different objectives.
· For forecasting,
· $R^2$ matters (a lot!)
· Omitted variable bias isn’t a problem!
· Interpreting coefficients in forecasting models is not
important – the important thing is a good fit and a model
you can “trust” to work in your application
· External validity is paramount: the model estimated using
historical data must hold into the (near) future
· More on forecasting when we take up time series data
Applying External and Internal Validity:
Test Scores and Class Size
Objective: Assess the threats to the internal and external validity
of the empirical analysis of the California test score data.
· External validity
· Compare results for California and Massachusetts
· Think carefully …
· Internal validity
· Go through the list of five potential threats to internal
validity and think carefully …
The Massachusetts data: summary
statistics
How do the Mass and California results compare?
· How best to control for Income?
· Evidence of nonlinearity in TestScore-STR relation?
· Is there a significant HiEL*STR interaction?
                        California                       Massachusetts
Income                  ln(Inc) fits best                cubic fits best
STR                     cubic fits best                  linear fits best
HiEL*STR interaction?   Different levels, but effects    no
                        of STR similar for HiEL = 1, 0
Predicted effects for a class size
reduction of 2
Summary of Findings for
Massachusetts
Comparison of estimated class size
effects: CA vs. MA
Summary: Comparison of California and
Massachusetts Regression Analyses
· Class size effect falls in both CA, MA data when student and
district control variables are added.
· Class size effect is statistically significant in both CA, MA
data.
· Estimated effect of a 2-student reduction in STR is
quantitatively similar for CA, MA.
· Neither data set shows evidence of STR – PctEL interaction.
· Some evidence of STR nonlinearities in CA data, but not in
MA data.
Remaining threats to internal validity in the
test score/class size example?
1. Wrong functional form?
2. Omitted variable bias?
3. Errors-in-variables bias?
4. Selection bias?
5. Simultaneous causality bias?
Spot the Endogeneity?
• Estimating the effect of police on crime …
• Effect of police on crime …
• Source: Levitt (American Economic Review, 1997)
Spot the Endogeneity?
• Estimating the effect of guns on crime …
• Source: Ayres and Donohue (Stanford Law Review, 2003)
Spot the Endogeneity?
• Estimating the effect of studying on GPA …
• Source: Stinebrickner and Stinebrickner (B.E. Journal of Economic Analysis & Policy, 2008)
Spot the Endogeneity?
• Estimating the effect of school start time on academic
performance …
• Source: Carrell, Maghakian, and West (American Economic Journal: Economic
Policy, 2011)
Spot the Endogeneity?
• Estimating the effect of national income on democracy
…
• Source: Acemoglu, Johnson, Robinson, and Yared (American Economic
Review, 2008)
Spot the Endogeneity?
• Estimating the effect of competition on student
achievement …
• Source: Hoxby (American Economic Review, 2000)
Spot the Endogeneity?
• Estimating the effect of teen drinking on HS
completion/attending college …
• Source: Dee & Evans (Journal of Labor Economics, 2003)
```
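
The errors-in-variables slides name classical measurement error as one case where the bias can be worked out. The sketch below is a small simulation of that case, not part of the original slides. It assumes the measured regressor equals the true regressor plus independent, mean-zero noise, so the OLS slope should converge to the true slope times var(X) / (var(X) + var(w)). The names `w`, `sd_x`, `sd_w`, `ols_slope`, and `simulate_attenuation`, and all parameter values, are illustrative assumptions.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def simulate_attenuation(n=200_000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_w=1.0, seed=0):
    """Compare OLS slopes using the true X and a noisy measurement of X.

    Assumption (classical measurement error): x_measured = x_true + w,
    where w has mean zero and is independent of x_true and of u.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, sd_x, n)   # unmeasured true regressor
    u = rng.normal(0.0, 1.0, n)         # regression error
    w = rng.normal(0.0, sd_w, n)        # measurement error (hypothetical)
    y = beta0 + beta1 * x_true + u
    x_measured = x_true + w             # what the researcher observes

    slope_true = ols_slope(x_true, y)
    slope_measured = ols_slope(x_measured, y)
    # Large-sample value of the slope under classical measurement error.
    predicted = beta1 * sd_x**2 / (sd_x**2 + sd_w**2)
    return slope_true, slope_measured, predicted


slope_true, slope_measured, predicted = simulate_attenuation()
print(f"slope using true X:      {slope_true:.3f}")
print(f"slope using measured X:  {slope_measured:.3f}")
print(f"attenuation formula:     {predicted:.3f}")
```

With equal variances for the true regressor and the noise, the formula predicts a slope of half the true value, and the slope estimated from the noisy regressor should land close to that. Raising `sd_w` pushes the estimate further toward zero, and setting it to 0 recovers the true slope.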