Regression Notes

```
Problems with Incorrect Functional Form
• You cannot directly compare R2 between two functional
forms whose dependent variables differ (for example, Y vs. ln Y).
▫ Why? TSS will be different.
• One should also remember that an incorrect
functional form may work within sample but have
large forecast errors outside of sample.
Linear Functional Form
Y = β0 + β1 X1 + β2 X2 + ε
• Slope = β1
• Impact of X1 on Y is independent of the
quantity of X2.
• Elasticity = β1 * [X1 / Y]
Double-Log Functional Form
• What if you wished to estimate the
following model?
• Y = β0 * X1^β1 * X2^β2 * e^ε
• To make this linear in the parameters, take natural logs:
• ln Y = ln β0 + β1 ln X1 + β2 ln X2 + ε
• Slope = β1 = ΔlnY / ΔlnX1 = [ΔY / Y] / [ΔX1 / X1]
• What is this? The elasticity, which is
constant across the sample.
What is the slope in a double-log functional
form?
• Slope = β1 * (Y / X1) = [ΔY / Y] / [ΔX1 / X1] * (Y / X1) = ΔY / ΔX1
• Impact of X1 on Y depends upon the
quantity of X2.
• In other words, the slope of X1 varies across
the sample.
• Why would this be a realistic property?
Other Functional Forms
• Semi-log functional form
• Polynomial form
• Inverse form
• Know the equation and meaning of β1 for
each of these forms.
• More specifically, know the calculation
of slope and elasticity for each functional
form.
Problems with Incorrect Functional Form
• You cannot compare R2 between two
different functional forms.
▫ Why? TSS will be different.
• An incorrect functional form may work
within sample but have large forecast errors
outside of sample.
• Violation of Classical Assumption I: The
regression model is linear in the
coefficients, is correctly specified, and has
an additive error term.
Testing for Functional Form
• The Quasi-R2
• Box-Cox Test
• The MacKinnon, White, Davidson Test (MWD)
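Quasi-R2
1. Estimate a logged model and create a set
of LnY^ (predicted logged dependent
variable).
2. Transform LnY^ by taking the anti-log. In
Excel, EXP is the function needed.
3. Calculate a new RSS with the results of
step 2: the sum of squared differences between
actual Y and the anti-logged predictions.
4. Calculate the quasi-R2 with the results of
step 3: quasi-R2 = 1 - RSS / TSS, where TSS is
computed from Y in levels.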
The Box-Cox Test
• Calculate the geometric mean of the dependent
variable in the model.
▫ This can easily be calculated in Excel.
• Create a new dependent variable equal to Yi /
Geometric Mean of Y.
• Re-estimate both forms of the model with your
new dependent variable. Compare the Residual
Sum of Squares. The form with the lowest value
is the preferred functional form.
MWD Test
1. Estimate the linear model and obtain the predicted
Y values (call this Yf^).
2. Estimate the double-logged model and obtain the
predicted lnY values (call this lnf^).
3. Create Z1 = ln(Yf^) - lnf^
4. Regress Y on X’s and Z1. Reject H0 (Y is a linear
function of independent variables) if Z1 is
statistically significant by the usual t-test.
5. Create Z2 = antilog(lnf^) - Yf^
6. Regress log of Y on log of X’s and Z2. Reject HA
(double-logged model is best) if Z2 is statistically
significant by the usual t-test.
Intercept Dummies
• What if you thought season of the year affected demand?
• Your demand function would include three
dummies (why three?) to test the impact of
seasons.
• This type of dummy variable is called an
intercept dummy, since it changes the
constant term but not the slopes of the other
independent variables.
Criteria for choosing a specification
1. Occam’s razor or the principle of
parsimony - model should be kept as
simple as possible.
2. Goodness of fit
3. Theoretical consistency
4. Predictive power: Within sample vs. Out of
sample
If you leave out an important
variable a bias exists unless…
• The true coefficient of the omitted variables
is zero.
• Or, there is zero correlation between the
omitted variable(s) and the independent
variables in the model.
• If these conditions don’t hold, omitted
variables will bias the coefficients in our
model.
What to do?
• What if you do not know which variable is
missing? In other words, what if you suspect
something is left out – thus producing
“strange” results – but you do not know
what?
Irrelevant Variables
• Including an irrelevant variable will
– Increase the standard errors of the estimated
coefficients, thus reducing t-stats (think back to how
standard errors are calculated).
– It does not introduce bias in the estimated
coefficients, but does affect our interpretation
of what we found.
Four Important
Specification Criteria
• Theory: Is the variable’s place in the equation
unambiguous and theoretically sound?
• t-Test: Is the variable’s estimated coefficient
significant in the expected direction?
• Adjusted R2: Does the overall fit of the equation
improve when the variable is added to the equation?
• Bias: Do other variables’ coefficients change
significantly when the variable is added to the
equation?
Specification Searches:
Other issues
• Good idea to rely on theory rather than statistical
fit.
• Good idea to minimize the number of equations
estimated.
• Bad idea to do sequential searches or estimate an
undisclosed number of regressions before settling
on a final choice.
• Sensitivity Analysis: Are your results robust to
alternative specifications? If not, maybe you are not
finding what you think you are finding.
```
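
The three functional-form tests described in the notes (quasi-R2, Box-Cox, MWD) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`with_const`, `ols`, `quasi_r2`, `box_cox_rss`, `mwd_t_stats`), the synthetic double-log data, and the random seed are all invented here for demonstration. It also assumes Y and the linear model's predictions are strictly positive, so the logs are defined.

```python
import numpy as np

def with_const(*cols):
    """Stack regressors into a design matrix with a leading column of ones."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

def ols(y, X):
    """Least-squares fit of y on X; returns (coefficients, fitted values, t-stats)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    resid = y - fitted
    s2 = resid @ resid / (X.shape[0] - X.shape[1])      # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # coefficient std. errors
    return b, fitted, b / se

def quasi_r2(y, X_log):
    """Fit ln Y, anti-log the predictions, then compute 1 - RSS/TSS in levels."""
    _, ln_fit, _ = ols(np.log(y), X_log)
    rss = np.sum((y - np.exp(ln_fit)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

def box_cox_rss(y, X_lin, X_log):
    """Divide Y by its geometric mean, refit both forms, return (RSS linear, RSS log)."""
    y_star = y / np.exp(np.mean(np.log(y)))
    _, fit_lin, _ = ols(y_star, X_lin)
    _, fit_log, _ = ols(np.log(y_star), X_log)
    return (np.sum((y_star - fit_lin) ** 2),
            np.sum((np.log(y_star) - fit_log) ** 2))

def mwd_t_stats(y, X_lin, X_log):
    """Return the t-stat of Z1 in the linear model and of Z2 in the log model."""
    _, yf, _ = ols(y, X_lin)             # predicted Y from the linear model
    _, lnf, _ = ols(np.log(y), X_log)    # predicted ln Y from the double-log model
    z1 = np.log(yf) - lnf                # assumes every yf > 0
    z2 = np.exp(lnf) - yf
    _, _, t_lin = ols(y, np.column_stack([X_lin, z1]))
    _, _, t_log = ols(np.log(y), np.column_stack([X_log, z2]))
    return t_lin[-1], t_log[-1]

# Synthetic data generated from a double-log model (an assumption for the demo).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(2, 10, n)
x2 = rng.uniform(2, 10, n)
y = 2.0 * x1**0.7 * x2**0.3 * np.exp(rng.normal(0, 0.02, n))

X_lin = with_const(x1, x2)
X_log = with_const(np.log(x1), np.log(x2))

q = quasi_r2(y, X_log)
rss_lin, rss_log = box_cox_rss(y, X_lin, X_log)
t_z1, t_z2 = mwd_t_stats(y, X_lin, X_log)

print("quasi-R2 of the log model:", q)
print("Box-Cox RSS (linear, log):", rss_lin, rss_log)
print("MWD t-stats (Z1 in linear, Z2 in log):", t_z1, t_z2)
```

Because the data here come from a double-log process, the log form should show the lower Box-Cox RSS and Z1 should be significant in the linear regression. On real data, read the two MWD t-statistics together, since both hypotheses (or neither) can be rejected.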