Econometric Forecasting

```A Brief Introduction
A. Data (variables). Can be in three forms:
1. Interval – There is a common scale to measure the variable, so equal
   steps on the scale mean equal differences. When the scale also has a
   true zero (a ratio scale), a value of two is actually twice a value of
   one. Examples: % of vote, degrees Fahrenheit, number killed, duration
   of regime, number of soldiers, GDP
2. Ordinal – There is a rank-ordering to the variable, so 2 > 1, but the
   scale varies so that 2 is not exactly twice one. Examples: Yes/No
   variables, how close a bill is to passage (no houses, one house, both
   houses, signature), war outcomes (win, lose, or draw)
3. Nominal – There are numbers, but they are completely arbitrary.
   Examples: country codes, leader names, strategy choices, apples and
   oranges.

Dependent Variable
1. Examples include % of the two-party Presidential vote, % seats held by
   Dems, war/non-war, political (in)stability, etc.
2. Easiest to have a continuous (interval) DV, but techniques exist for
   all three types

Independent Variables
1. Can be either interval or ordinal. So…
2. Transform nominal into ordinal. Example: Is this country the US? A
   nominal variable (USA) becomes an ordinal one (Yes or No).
3. Again, examples in syllabus

Statistical Relationships
1. Positive (or direct) correlation: the values of the IV and DV move up
   and down together (poverty and crime, CO2 and prostitution, geographic
   proximity and conflict)
2. Negative (or inverse) correlation: the values of the IV and DV move in
   opposite directions (alcohol and coordination, democracy and
   interstate conflict, war and development)
3. Conditional: Direction depends on the value of some other variable

A. Simplest tool: the scatterplot or scatter diagram. Example from medicine:

A researcher believes that there is a linear relationship between the BMI
(kg/m²) of pregnant mothers and the birth weight (BW, in kg) of their
newborns.
The following data set provides information on 15 pregnant mothers who
were contacted for this study.
BMI (kg/m²)   Birth weight (kg)
20            2.7
30            2.9
50            3.4
45            3.0
10            2.2
30            3.1
40            3.3
25            2.3
50            3.5
20            2.5
10            1.5
55            3.8
60            3.7
50            3.1
35            2.8

Scatter diagram plots bivariate observations (X, Y): BMI (the IV) is X and
birthweight (the DV) is Y
◦ Y is the dependent variable (Dependent goes Down the side)
◦ X is the independent variable (goes across the graph)
[Figure: Scatter diagram of BMI and Birthweight. X axis: BMI, 0 to 70; Y axis: birthweight, 0 to 4]


People tend to mentally fit a line or curve to
describe the shape of the scatterplot
Examples:
[Figure: example scatterplots of Y against X in five groups: strong
relationships, weak relationships, no relationship, linear relationships,
curvilinear relationships]
1. The fitted line is intended to simplify the relationship. The line is
   ultimately an estimate, usually known to be wrong (but close enough to
   be useful)
2. The line is probabilistic, not deterministic – otherwise it would pass
   perfectly through every point on the scatterplot
3. This is the key difference between predicting politics and predicting
   planetary orbits. Kepler’s equations are deterministic, but
   econometric models are probabilistic
Sample scatterplot:
[Figure: sample scatterplot of Y against X, both axes running from 0 to 60]
How would you draw a line through the points? How do you determine which
line ‘fits best’?

Regression = using an equation to find the line (or curve) that most
closely fits the data
a. Relationship between variables is a linear function:
   Y = β0 + β1X + ε
   • Y = dependent variable
   • β0 = constant, or Y-intercept
   • β1 = coefficient of X, or slope
   • X = independent (explanatory or control) variable
   • ε = random error
It should….
[Figure: High School Teacher. The line Y = mX + b on X and Y axes, where
m = slope = (change in Y) / (change in X) and b = Y-intercept]
High School Teacher



• As you remember from high school math, the basic equation of a line is
  given by y = mx + b, where m is the slope and b is the y-intercept
• One definition of m is that for every one unit increase in x, there is
  an m unit increase in y
• One definition of b is the value of y when x is equal to zero
[Figure: Line. The line y = 1.5x + 4, with x from 0 to 12 and y from 0 to 20]




• Look at the data in this picture
• Does there seem to be a correlation (linear relationship) in the data?
• Is the data perfectly linear?
• Could we fit a line to this data?
[Figure: scatterplot of the data, with x from 0 to 12 and y from 0 to 25]



• Linear regression tries to find the best line (curve) to fit the data
• The equation of the line fitted here is y = 1.5x + 4
• The method of finding the best line (curve) is least squares, which
  minimizes the sum of the squared vertical distances from the line to
  each of the points
[Figure: the same scatterplot with the fitted line y = 1.5x + 4; x from 0 to 12, y from 0 to 25]
a. Find the values of β0 and β1 that minimize the sum of the squared
   vertical distances from the line to each of the points. This is the
   same as minimizing the sum of the εi².
b. Why minimize squared errors? ‘Best fit’ means the differences between
   actual Y values and predicted Y values are at a minimum. But positive
   differences offset negative ones (errors of 10 and -10 add to zero).
   Squaring the errors solves the problem: 10 * 10 = 100 and -10 * -10
   also = 100.
For each observation i, the equation is merely an estimate, not the
actual value. There are errors (εi), and the line minimizes the sum of
ε1², ε2², ε3², ε4², ε5², and so on.

   Ŷi = β0 + β1Xi   (Ŷi = the predicted value of Y for observation i)

[Figure: fitted line on X and Y axes, with the errors ε1 to ε4 marked as
vertical distances between the points and the line]

Regression formula: Y = a + bX, Y = α + βX, Y = α + β1X1, Y = β0 + β1X1,
etc. All are the same formula!
• Y = the predicted value of the dependent variable (its estimated mean
  given X)
• a (or alpha: α, or beta-zero: β0) = the Y intercept, or the value of Y
  when X = 0 (constant)
• b (or beta: β) = the regression coefficient, the slope of the
  regression line, or the amount of change produced in Y by a unit change
  in X
  ◦ Positive sign of regression coefficient: positive direction of
    association
  ◦ Negative sign of regression coefficient: negative direction of
    association
• X = the value of the independent variable

What is:
◦ Y?
◦ X?
◦ β1?
◦ β0?

Typical formula: Y = β0 + β1X1 + β2X2 + β3X3, etc.
• DV, constant haven’t changed
• But now there are several independent variables
• Each IV has its own coefficient. So the first X may be
positively related to Y, while the others might be negatively
related to Y.
• Could plot the effect of any one independent variable on Y as a
line, but can no longer plot the whole equation since there are
now as many dimensions as there are independent variables
(plus one, for Y).
• Multivariate regression is best interpreted by consulting tables
of coefficients, evaluating the effect of each X separately (i.e.
all else being equal)
1. R²: Proportion of the variation in the dependent variable (Y) that is
   explained by the independent variable (X)
   ◦ R² = Explained variation / Total variation
   ◦ Ranges between 0 (no reduction in error) and 1 (no errors remain –
     the model perfectly predicts the dependent variable)
   ◦ R² is a comparative measure – it compares the amount of error made
     by the linear regression to the amount of error made by guessing the
     mean (average) value of Y for every case (e.g. Y = 12 for every case)
R² compares how much variation there is when you know X (i.e. how well
your line fits the data) with how much variation there is when you don’t
know X (which means you just assume Y is constant at its mean). First the
regression… and now the variance without regression.
[Figure: Y (Internet use, hours per week) against X (Education level, 0 to 4),
showing Y and Predicted Y]
[Figure: Good Fit. y against x1 with fitted line y = 1.9599x + 0.2823, R² = 0.9369]
[Figure: Poorer Fit. y against x1 with fitted line y = 1.9696x + 0.5683, R² = 0.811]
a. Statistical significance of the regression model: uses one of a number
   of indicators (χ², for example). No need to understand the indicator
   to interpret it. Look for a “p value” associated with the indicator.
b. Statistical significance of each regression coefficient (β1, for
   example). Also measured by a p value.
c. Key is to find p and see if p < .05 (in the social sciences). If yes,
   the result is statistically significant. If no, it is not
   statistically significant.
• The p value is the probability that random data (i.e. no real
  relationship with Y) would have coincidentally given you an association
  at least this strong. Hence, lower values of p are “better.”
• Authors sometimes say “significant at the .001 level.” This means
  p < .001. There may or may not be a table of p values for coefficients –
  authors frequently use asterisks to highlight coefficients at a given
  level of significance.
• If the model is not significant, the author has failed to discover a
  significant correlation between the model’s predicted values of Y and
  the actual values of Y.
• If a coefficient is not significant, then the author has failed to
  discover a significant correlation between that particular X and Y.
• “p < .6 so the relationship is statistically insignificant, and
  therefore I conclude that X doesn’t affect Y” – Not true, because p
  could be .001. All we know is that it is less than .6. In other words,
  absence of evidence is not evidence of absence. Indeed, when the number
  of cases is very small, all of the p values – even for real
  relationships – are likely to be too large to make the coefficients
  statistically significant
• “p < .000001 so the relationship between X and Y is very strong” – Not
  true, because the p value for any nonzero coefficient (no matter how
  tiny) becomes smaller as the number of cases increases. With millions
  of cases, just about every relationship is “statistically significant,”
  but many are substantively trivial
This depends on what you are looking for!
• What units are X and Y measured in?
• Does the coefficient mean that small increases in X lead to large
  increases in Y? If statistically significant, this is also
  substantively significant
• Does the coefficient mean that large increases in X only produce
  trivial changes in Y? Then regardless of statistical significance, the
  relationship is substantively uninteresting
• This is a qualitative judgment based on your needs, but it takes into
  account the numbers


Research hypothesis: The level of economic development has a positive
effect on civil liberties in countries of the world
Dependent variable: civil liberties
◦ Interval-ratio
Independent variable: GDP per capita ($1000)
◦ Measure of the level of economic development
◦ Interval-ratio

• Regression coefficient (beta) = .257
• Substantive significance:
  ◦ An increase of $1000 in the level of GDP per capita increases the
    civil liberties score by .257.
  ◦ On a 5-point scale, this is interesting. On a 1000-point scale it
    would not be interesting.
• Statistical significance:
  ◦ p < .001
  ◦ Statistically significant at the .001 or .1% level
• R square = .525
  ◦ GDP per capita explains 52.5% of variation in civil liberties
• Research hypothesis: was not falsified by bivariate regression analysis
  (i.e. was consistent with the regression)
  ◦ The level of economic development has a positive and statistically
    significant effect on civil liberties




Linear regression predicts best near the mean values of X. Extreme values
of X (low or high) are associated with greater error when predicting Y.
Solution: Confidence intervals. A 95% interval around a prediction is the
range in which 95% of observations of Y at a given value of X are expected
to fall, given the uncertainty in the estimated coefficients (strictly
speaking, this is a prediction interval).
Example: Polls with “margins of error” (typically 95% confidence
intervals)
Another example:
Also known as “time series analysis.”
A. Simplest form: Y(t) = Y(t-1) + α
◦ Y is the DV, t is time, and α is a constant
◦ If Y(t-1) is 100 and α is 1, then Y will be 101, 102, 103, etc. as time
  passes
◦ Note that this is simply a rearranged linear regression equation. The
  DV is predicted by previous values of the DV (which fill in as the IVs
  in the model)

Form: Y(t) = βY(t-1) + α
β is the multiplicative relationship between Y(t-1) and Y(t)
So, taking α = 0 and a positive Y for simplicity:
◦ If β = 1 then Y never changes over time
◦ If β > 1 then Y increases over time
◦ If 0 < β < 1 then Y diminishes over time
1. Time’s arrow: Since cause must precede effect, time
series analysis can be used to rule out the possibility
that Y causes X
2. Autocorrelation: Sometimes we need to address the
correlation of a variable with itself over time.
Example: to predict defense budget, first thing to
know is that it’s usually similar to last year’s budget.
Then one can add IVs that might cause it to increase
or decrease.
3. Omitted variable bias: Failing to “control” for a
relevant IV (one that may correlate with both X and Y)
can generate “false positives” – statistically significant
relationships between variables that are causally
unrelated (example: high correlation between
Vietnam vets and supermarkets)
A. Is the relationship causal? Difficult to know for sure…
1. Possibility of coincidence: Addressed by requiring models to be
   statistically significant. Chance remains, but is low.
2. Sources of bias:
   a. Y causes X. That is, perhaps the researcher has reversed the DV and
      IV. Use time-series analysis to rule this out.
   b. Faulty data – But only if the data is biased in some manner that
      makes X and Y correlate. Random noise is already accounted for.
      Example of bias = serial autocorrelation, or correlation across
      time. Many things (kids and dogs) grow larger over time. But height
      of your kid does not cause your dog to get bigger!
   c. Omitted variables – suppose Z causes X and Z causes Y. Then X and Y
      will appear to be causally related when in fact they are merely
      correlated. Adding Z to the model would reveal that X has no
      independent effect on Y.
1. Requires either
   a. The ability to forecast the IVs themselves, or
   b. A model that forecasts Y(t) from IVs in t-1, t-10, etc.
2. Long-term forecasting models are rare. Why?
1. Find a linear regression (OLS) that forecasts something
2. Find the future values of X
3. Plug these into the equation
4. Multiply each X with its corresponding β (order of operations)
5. Add it all together. Don’t forget the intercept.
6. Presto! You have a forecast for Y!
```
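
The least-squares fit, the R² comparison, and the six-step forecasting recipe from the slides can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own code. It assumes the BMI and birth-weight columns pair up in the order they are listed, and the helper names `fit_ols`, `r_squared`, and `forecast`, along with the BMI value of 35 used for the forecast, are hypothetical choices made for this example.

```python
# BMI (X) and birth weight (Y), paired in the order listed in the slides.
# The pairing is an assumption: the two columns were extracted separately.
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
birth_weight = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5,
                3.8, 3.7, 3.1, 2.8]


def fit_ols(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Compare squared error around the line with squared error around the mean of Y."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


def forecast(intercept, coefficients, future_xs):
    """Multiply each future X by its coefficient, add the products, add the intercept."""
    return intercept + sum(b * x for b, x in zip(coefficients, future_xs))


b0, b1 = fit_ols(bmi, birth_weight)
r2 = r_squared(bmi, birth_weight, b0, b1)
predicted = forecast(b0, [b1], [35])  # hypothetical future X: a BMI of 35

print(f"birth weight = {b0:.3f} + {b1:.4f} * BMI")
print(f"R^2 = {r2:.3f}")
print(f"forecast at BMI 35: {predicted:.2f} kg")
```

The `forecast` helper takes lists so the same call works for a multivariate model: pass one coefficient per independent variable and the matching future X values. In practice a statistics package would also report the p values and intervals discussed in the slides, which this sketch leaves out.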