### SW 6 part 2

```
Multiple Regression (SW Ch. 6)
1. Omitted variable bias
2. Causality and regression analysis
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator
6. Multicollinearity
The Least Squares Assumptions
LSA #1: E(u|X = x) = 0
LSA #2: (Xi,Yi), i = 1,…,n are i.i.d.
LSA #3: E(X^4) < ∞ and E(Y^4) < ∞
Sampling Distribution of \hat{\beta}_1
Measures of Fit
Measures of Fit: example
Measures of Fit
• Akaike’s Information Criterion (AIC) is an alternative
method for adjusting the residual sum of squares for the
sample size (n) and number of covariates (k)
• Is the improved fit “worth” it?
AIC = \ln\left(\frac{SSR}{n}\right) + \frac{2(k + 1)}{n}
Example: caschool.dta
. reg testscr str, rob

Linear regression                               Number of obs     =        420
                                                F(1, 418)         =      19.26
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0512
                                                Root MSE          =     18.581

-------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         str |  -2.279808    .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons |    698.933    10.36436    67.44   0.000     678.5602    719.3057
-------------------------------------------------------------------------------

. estat ic

-------------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+-----------------------------------------------------------------
           . |    420   -1833.296    -1822.25      2     3648.499     3656.58
-------------------------------------------------------------------------------

Example: caschool.dta
. reg testscr str el_pct, rob

Linear regression                               Number of obs     =        420
                                                F(2, 417)         =     223.82
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4264
                                                Root MSE          =     14.464

-------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         str |  -1.101296    .4328472    -2.54   0.011     -1.95213   -.2504616
      el_pct |  -.6497768    .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322    8.728224    78.60   0.000     668.8754     703.189
-------------------------------------------------------------------------------

. estat ic

-------------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+-----------------------------------------------------------------
           . |    420   -1833.296   -1716.561      3     3439.123    3451.243
-------------------------------------------------------------------------------

The Least Squares Assumptions for Multiple Regression
Missing values (".") are treated as +∞ in Stata, so ">=" comparisons need "& avginc < ." to exclude them
. gen incq1 = 1 if avginc < 10.639
(314 missing values generated)
. replace incq1 = 0 if avginc >= 10.639 & avginc < .
. gen incq2 = 1 if avginc < 13.727 & avginc >= 10.639
(316 missing values generated)
. replace incq2 = 0 if avginc < 10.639 | (avginc >= 13.727 & avginc < .)
. gen incq3 = 1 if avginc < 17.638 & avginc >= 13.727
(315 missing values generated)
. replace incq3 = 0 if avginc < 13.727 | (avginc >= 17.638 & avginc < .)
. gen incq4 = 1 if avginc >= 17.638 & avginc < .
(315 missing values generated)
. replace incq4 = 0 if avginc < 17.638
. gen testdum = incq1 + incq2 + incq3 + incq4
. sum avginc inc* testdum

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
      avginc |        420    15.31659     7.22589      5.335     55.328
       incq1 |        420     .252381    .4348967          0          1
       incq2 |        420     .247619    .4321441          0          1
       incq3 |        420         .25    .4335291          0          1
       incq4 |        420         .25    .4335291          0          1
-------------+----------------------------------------------------------
     testdum |        420           1           0          1          1

Dummy Variable Trap
. reg testscr str incq1 incq2 incq3 incq4, robust
note: incq3 omitted because of collinearity

Linear regression                               Number of obs     =        420
                                                F(4, 415)         =      72.03
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4468
                                                Root MSE          =      14.24

-------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         str |  -1.417963     .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |  -16.97711    1.953708    -8.69   0.000    -20.81751   -13.13672
       incq2 |  -6.795768     1.83231    -3.71   0.000    -10.39753   -3.194003
       incq3 |  (omitted)
       incq4 |   16.17749    1.880508     8.60   0.000     12.48098    19.87399
       _cons |    683.929    8.136528    84.06   0.000     667.9351     699.923
-------------------------------------------------------------------------------

• Solution #1 is to …
• Interpretation is then …
Dummy Variable Trap
. reg testscr str incq1 incq2 incq3 incq4, robust noconstant

Linear regression                               Number of obs     =        420
                                                F(5, 415)         =          .
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9995
                                                Root MSE          =      14.24

-------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         str |  -1.417963     .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |   666.9519    7.862759    84.82   0.000     651.4961    682.4077
       incq2 |   677.1333    7.931178    85.38   0.000      661.543    692.7236
       incq3 |    683.929    8.136528    84.06   0.000     667.9351     699.923
       incq4 |   700.1065    8.014253    87.36   0.000     684.3529    715.8601
-------------------------------------------------------------------------------

• Solution #2 is to …
• Interpretation is then …
The Sampling Distribution of the OLS Estimator in Multiple Regression
Imperfect Multicollinearity
\sigma^2_{\hat{\beta}_1} = \frac{1}{n} \left[ \frac{1}{1 - \rho^2_{X_1,X_2}} \right] \frac{\sigma^2_u}{\sigma^2_{X_1}}
Detection and Remedies for Imperfect Multicollinearity
• Detection
  - Calculate all the pairwise correlation coefficients
      - > .7 or .8 is some cause for concern
  - Variance Inflation Factors (VIFs) can be calculated
  - Hallmark is high R^2 but insignificant t-statistics
• Remedies
  - Do nothing
  - Drop a variable
  - Transform multicollinear variables
      - need to have same sign and magnitudes
  - Get more data (i.e., increase the sample size)
```
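
The two formulas in these slides (the AIC and the variance of the OLS slope under imperfect multicollinearity) can be checked by hand. The following is a minimal Python sketch, not part of the original slides. It assumes that Stata's Root MSE equals sqrt(SSR / (n - k - 1)), so that SSR can be recovered from the printed output, and that `estat ic` reports AIC as -2·ll(model) + 2·df. All function names are hypothetical helpers introduced for illustration; the numbers are copied from the two `caschool.dta` regressions above.

```python
import math


def ssr_from_root_mse(root_mse, n, k):
    """Recover SSR, assuming Root MSE = sqrt(SSR / (n - k - 1))."""
    return root_mse ** 2 * (n - k - 1)


def aic_slide(ssr, n, k):
    """AIC as written on the slide: ln(SSR / n) + 2(k + 1) / n."""
    return math.log(ssr / n) + 2 * (k + 1) / n


def aic_stata(ll_model, df):
    """AIC on the scale assumed for `estat ic`: -2 * ll(model) + 2 * df."""
    return -2 * ll_model + 2 * df


def variance_inflation(rho):
    """Bracketed factor 1 / (1 - rho^2) in the two-regressor variance formula."""
    return 1 / (1 - rho ** 2)


n = 420
# (label, k, Root MSE, ll(model), df), copied from the Stata output
models = [
    ("str", 1, 18.581, -1822.25, 2),
    ("str + el_pct", 2, 14.464, -1716.561, 3),
]

results = {}
for label, k, root_mse, ll_model, df in models:
    ssr = ssr_from_root_mse(root_mse, n, k)
    results[label] = (aic_slide(ssr, n, k), aic_stata(ll_model, df))
    print(f"{label:>14}: slide AIC = {results[label][0]:.4f}, "
          f"estat ic AIC = {results[label][1]:.3f}")

# How much the variance of beta1-hat grows as corr(X1, X2) rises
for rho in (0.0, 0.7, 0.8, 0.95):
    print(f"rho = {rho:.2f}: variance multiplied by {variance_inflation(rho):.2f}")
```

Both AIC scales are lower for the model that adds `el_pct`, so they agree that the improved fit is "worth" the extra covariate. Under a Gaussian likelihood the two differ only by an affine rescaling (the `estat ic` value is n times the slide value plus n(1 + ln 2π)), which is why they rank models identically for a fixed sample.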