Descriptive Statistics and Linear Regression

```
Topics in Microeconometrics
William Greene
Department of Economics
Stern School of Business
Descriptive Statistics and Linear Regression
Model Building in Econometrics
• Parameterizing the model
  • Nonparametric analysis
  • Semiparametric analysis
  • Parametric analysis
• Sharpness of inferences follows from the strength of the assumptions
A Model Relating (Log)Wage to Gender and Experience
Application: Is there a relationship between investment and capital stock?
Nonparametric Regression
Kernel regression of y on x
Semiparametric Regression: Least absolute deviations regression of y on x
Parametric Regression: Least squares – maximum likelihood – regression of y on x
Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
BLK   = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The
data were downloaded from the website for Baltagi's text.
A First Look at the Data
Descriptive Statistics
• Basic Measures of Location and Dispersion
• Graphical Devices
  • Histogram
  • Kernel Density Estimator
Histogram for LWAGE
The kernel density estimator is a histogram (of sorts).
$\hat{f}(x_m^*) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{B} K\left[\frac{x_i - x_m^*}{B}\right]$, for a set of points $x_m^*$
B = "bandwidth" chosen by the analyst
K = the kernel function, such as the normal or logistic pdf (or one of several others)
x* = the point at which the density is approximated.
This is essentially a histogram with small bins.
Kernel Estimator for LWAGE
Kernel Density Estimator
The curse of dimensionality
$\hat{f}(x_m^*) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{B} K\left[\frac{x_i - x_m^*}{B}\right]$, for a set of points $x_m^*$
B = "bandwidth"
K = the kernel function
x* = the point at which the density is approximated.
$\hat{f}(x^*)$ is an estimator of $f(x^*)$
$\frac{1}{n} \sum_{i=1}^{n} Q(x_i | x^*) = Q(x^*)$.
But, $\mathrm{Var}[Q(x^*)] \neq \frac{1}{N} \times$ Something. Rather, $\mathrm{Var}[Q(x^*)] = \frac{1}{N^{3/5}} \times$ something.
I.e., $\hat{f}(x^*)$ does not converge to $f(x^*)$ at the same rate as a mean converges to a population mean.
Objective: Impact of Education on (log) wage
• Specification: What is the right model to use to analyze this association?
• Estimation
• Inference
• Analysis
Simple Linear Regression
LWAGE = 5.8388 + 0.0652*ED
Multiple Regression
Specification: Quadratic Effect of Experience
Partial Effects
Education:  .05544
Experience: .04062 – 2*.00068*Exp
FEM:        –.37522
Model Implication: Effect of Experience and Male vs. Female
Hypothesis Test About Coefficients
• Hypothesis
  • Null: Restriction on β: Rβ – q = 0
  • Alternative: Not the null
• Approaches
  • Fitting Criterion: $R^2$ decrease under the null?
  • Wald: Rb – q close to 0 under the alternative?
Hypotheses
All Coefficients = 0?
  R = [0 | I], q = [0]
ED Coefficient = 0?
  R = [0,1,0,0,0,0,0,0,0,0,0,0], q = 0
No Experience effect?
  R = [0,0,1,0,0,0,0,0,0,0,0,0
       0,0,0,1,0,0,0,0,0,0,0,0], q = [0, 0]'
Hypothesis Test Statistics
Subscript 0 = the model under the null hypothesis
Subscript 1 = the model under the alternative hypothesis
1. Based on the Fitting Criterion $R^2$
$F = \frac{(R_1^2 - R_0^2) / J}{(1 - R_1^2) / (N - K_1)} = F[J, N - K_1]$
2. Based on the Wald Distance: Note, for linear models, W = JF.
$\text{Chi Squared} = (Rb - q)' \left[ R \, s^2 (X_1'X_1)^{-1} R' \right]^{-1} (Rb - q)$
Hypothesis: All Coefficients Equal Zero
All Coefficients = 0?
  R = [0 | I], q = [0]
$R_1^2$ = .42645
$R_0^2$ = .00000
F = 280.7 with [11, 4153] degrees of freedom
Wald = $b_{2\text{--}12}' [V_{2\text{--}12}]^{-1} b_{2\text{--}12}$ = 3087.83355
Note that Wald = JF = 11(280.7)
Hypothesis: Education Effect = 0
ED Coefficient = 0?
  R = [0,1,0,0,0,0,0,0,0,0,0,0], q = 0
$R_1^2$ = .42645
$R_0^2$ = .36355 (not shown)
F = 455.396
Wald = $(.05544 - 0)^2 / (.0026)^2$ = 455.396
Note F = $t^2$ and Wald = F for a single hypothesis about 1 coefficient.
Hypothesis: Experience Effect = 0
No Experience effect?
  R = [0,0,1,0,0,0,0,0,0,0,0,0
       0,0,0,1,0,0,0,0,0,0,0,0], q = [0, 0]'
$R_0^2$ = .34101, $R_1^2$ = .42645
F = 309.33
Wald = 618.601 (5% critical value W* = 5.99)
A Robust Covariance Matrix
The White Estimator
$\text{Est.Var}[b] = (X'X)^{-1} \left[ \sum_i e_i^2 x_i x_i' \right] (X'X)^{-1}$
What does robustness mean?
• Robust to:
  • Heteroscedasticity
• Not robust to:
  • Autocorrelation
  • Individual heterogeneity
  • The wrong model specification
• ‘Robust inference’
Robust Covariance Matrix
Heteroscedasticity Robust Covariance Matrix
```
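The F statistic based on the R-squared fitting criterion, and the slides' note that W = JF for linear models, can be checked against the reported numbers with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, it uses the rounded R-squared values printed on the slides (so results match the reported statistics only approximately), and it takes N - K1 = 4153 from the reported degrees of freedom.

```python
def f_statistic(r2_alt, r2_null, num_restrictions, resid_dof):
    """F = [(R1^2 - R0^2) / J] / [(1 - R1^2) / (N - K1)]."""
    numerator = (r2_alt - r2_null) / num_restrictions
    denominator = (1.0 - r2_alt) / resid_dof
    return numerator / denominator


def wald_from_f(f_value, num_restrictions):
    """For linear restrictions in a linear model, W = J * F."""
    return num_restrictions * f_value


# Values copied from the slides (rounded as printed there).
R2_ALT = 0.42645   # R1^2: unrestricted model
RESID_DOF = 4153   # N - K1, from "F = 280.7 with [11,4153]"

# hypothesis label -> (R0^2 under the null, J = number of restrictions)
hypotheses = {
    "all coefficients = 0": (0.00000, 11),
    "education effect = 0": (0.36355, 1),
    "experience effect = 0": (0.34101, 2),
}

results = {}
for label, (r2_null, j) in hypotheses.items():
    f_value = f_statistic(R2_ALT, r2_null, j, RESID_DOF)
    results[label] = (f_value, wald_from_f(f_value, j))
    print(f"{label}: F = {f_value:.2f}, Wald = J*F = {results[label][1]:.2f}")
```

The computed values can be compared with the slide figures (F = 280.7, 455.396 and 309.33); small differences are expected because the inputs are rounded to five decimals.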