### The F-Test

```
Linear Functional Form
Y = β0 + β1 X1 + β2 X2 + ε
Slope = β1
Impact of X1 on Y is independent of the quantity
of X2.
Elasticity = β1 * [X1 / Y]
Double-Log Functional Form
What if you wished to estimate the following
model?
Y = β0 * X1^β1 * X2^β2
To make this linear in the parameters, take the natural log of both sides:
lnY = β0 + β1 lnX1 + β2 lnX2 + ε   (β0 here is the log of the original constant)
Slope (in logs) = β1 = ΔlnY / ΔlnX1 = [ΔY / Y] / [ΔX1 / X1]
What is this? The elasticity, which is constant
across the sample.
What is the slope in a double-log functional form?
Slope = β1 * (Y / X1) =
[ΔY / Y] / [ΔX1 / X1] * (Y / X1) =
ΔY / ΔX1
Impact of X1 on Y depends upon the quantity
of X2
In other words, the slope of X1 varies across the
sample.
Why would this be a realistic property?
Other Functional Forms
Semi-log functional form
Polynomial Form
Inverse Form
Know the equation and meaning of β1 for each
of these forms.
More specifically, know the calculation of slope
and elasticity for each functional form.
Problems with Incorrect Functional Form
You cannot compare R² between two functional forms
whose dependent variables differ (e.g., Y vs. lnY).
◦ Why? TSS will be different.
An incorrect functional form may work within
sample but have large forecast errors outside of
sample.
Violation of Classical Assumption I: The
regression model is linear in the coefficients, is
correctly specified, and has an additive error
term.
Testing for Functional Form
The Quasi-R²
Box-Cox Test
The MacKinnon, White, Davidson Test
(MWD)
Quasi-R²
1. Estimate a logged model and create a set of
lnY^ (predicted logged dependent variable).
2. Transform lnY^ by taking the anti-log. In
Excel, EXP is the function needed.
3. Calculate a new RSS from the actual Y and the results of step
2.
4. Calculate the quasi-R² with the results of step
3: quasi-R² = 1 - RSS / TSS, where TSS is the total sum of squares of Y.
The Box Cox Test
Calculate the geometric mean of the dependent
variable in the model.
◦ This can easily be calculated in Excel
Create a new dependent variable equal to Yi /
Geometric Mean of Y
Re-estimate both forms of the model with your
new dependent variable (in levels for the linear form, in logs for
the logged form). Compare the residual sums of squares. The lowest
value indicates the preferred functional form.
MWD Test
1. Estimate the linear model and obtain the predicted Y values (call
this Yf^).
2. Estimate the double-logged model and obtain the predicted lnY
values (call this lnf^).
3. Create Z1 = ln(Yf^) - lnf^
4. Regress Y on the X's and Z1. Reject H0 (Y is a linear function of the
independent variables) if Z1 is statistically significant by the
usual t-test.
5. Create Z2 = antilog(lnf^) - Yf^
6. Regress the log of Y on the logs of the X's and Z2. Reject HA (double-logged
model is best) if Z2 is statistically significant by the usual t-test.
INTERCEPT DUMMIES
• What if you thought season of the year affected demand?
• Your demand function would include three
dummies (why three?) to test the impact of
seasons.
• This type of dummy variable is called an
intercept dummy, since it changes the constant
term but not the slopes of the other
independent variables.
SLOPE DUMMIES
Interaction Term – an independent variable in a
regression that is the product of two or more
independent variables.
This can be used to see if a qualitative
condition, which we would analyze with a
dummy, impacts the slope of another
independent variable.
CRITERIA FOR CHOOSING A
SPECIFICATION
1. Occam’s razor or the principle of
parsimony - model should be kept as
simple as possible.
2. Goodness of fit
3. Theoretical consistency
4. Predictive power: Within sample vs.
Out of sample
IF YOU LEAVE OUT AN
IMPORTANT VARIABLE A
BIAS EXISTS UNLESS…
The true coefficient of the omitted
variable is zero.
Or, there is zero correlation between the
omitted variable(s) and the independent
variables in the model.
If these conditions don’t hold, omitted
variables will bias the coefficients in our
model.
WHAT TO DO?
Add the missing variable.
What if you do not know which
variable is missing? In other words,
what if you suspect something is left
out – thus producing “strange”
results – but you do not know what?
IRRELEVANT
VARIABLES
Including an irrelevant variable will
increase the standard errors of the variables,
thus reducing t-stats. (Think back to how
standard errors are calculated.)
It does not introduce bias in the estimated
coefficients, but does impact our
interpretation of what we found.
FOUR IMPORTANT
SPECIFICATION CRITERIA
Theory: Is the variable’s place in the equation
unambiguous and theoretically sound?
t-Test: Is the variable’s estimated coefficient
significant in the expected direction?
Adjusted R²: Does the overall fit of the equation
improve when the variable is added to the
equation?
Bias: Do other variables’ coefficients change
significantly when the variable is added to the
equation?
SPECIFICATION SEARCHES:
OTHER ISSUES
Good idea to rely on theory rather than statistical fit.
Good idea to minimize the number of equations
estimated.
Bad idea to do sequential searches or estimate an
undisclosed number of regressions before settling on a
final choice.
Sensitivity Analysis: Are your results robust to
alternative specifications? If not, maybe you're not
finding what you think you are finding.
```
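The quasi-R² and Box-Cox steps listed in the notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own implementation: it uses a single regressor, the helper names (`ols`, `rss`, `quasi_r2`, `linear_r2`, `box_cox_rss`) are hypothetical, and the data are made up so that the double-log form is the true one.

```python
import math
from statistics import mean

def ols(x, y):
    """One-regressor OLS; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def rss(actual, fitted):
    """Residual sum of squares between actual and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))

def linear_r2(x, y):
    """Ordinary R-squared of the linear model Y = b0 + b1*X."""
    b0, b1 = ols(x, y)
    fitted = [b0 + b1 * v for v in x]
    return 1 - rss(y, fitted) / rss(y, [mean(y)] * len(y))

def quasi_r2(x, y):
    """Quasi-R-squared of the double-log model, measured in units of Y."""
    ln_x = [math.log(v) for v in x]
    ln_y = [math.log(v) for v in y]
    b0, b1 = ols(ln_x, ln_y)                          # step 1: fit the logged model
    y_hat = [math.exp(b0 + b1 * v) for v in ln_x]     # step 2: anti-log the predictions
    new_rss = rss(y, y_hat)                           # step 3: RSS against actual Y
    return 1 - new_rss / rss(y, [mean(y)] * len(y))   # step 4: 1 - RSS / TSS

def box_cox_rss(x, y):
    """RSS of the linear and double-log forms after dividing Y by its geometric mean."""
    g = math.exp(mean([math.log(v) for v in y]))      # geometric mean of Y
    y_star = [v / g for v in y]
    b0, b1 = ols(x, y_star)
    rss_linear = rss(y_star, [b0 + b1 * v for v in x])
    ln_x = [math.log(v) for v in x]
    ln_y_star = [math.log(v) for v in y_star]
    b0, b1 = ols(ln_x, ln_y_star)
    rss_log = rss(ln_y_star, [b0 + b1 * v for v in ln_x])
    return rss_linear, rss_log

# Made-up data generated exactly by Y = 2 * X^1.5 (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]

print("linear R2:", linear_r2(x, y))
print("quasi R2 (double-log):", quasi_r2(x, y))
print("Box-Cox RSS (linear, log):", box_cox_rss(x, y))
```

Because the quasi-R² is computed against the same TSS as the linear model's R², the two numbers are comparable. In the Box-Cox comparison, the form with the smaller RSS is the preferred one; with real data and several regressors a regression package would replace the hand-written `ols`.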