Fixed effects model

Panel Data Analysis
Stefan Trappl
Constanze Fay
Schularick, Taylor (2012)
• Does credit growth predict financial crises?
• Analysis of macroeconomic panel data for developed countries, 1870-2008. Two time periods: pre- and post-WW2; 79 major banking crises in 14 countries.
• Dependent variable: "financial crisis" from Bordo et al. (2001) and Reinhart and Rogoff (2009); independent variables: lagged credit and money supply, loans and bank assets, inflation, investment, GDP
Trappl, Fay
Our approach:
• Does income inequality predict financial crises?
• We use the Schularick/Taylor dataset, but only a reduced version (8 countries) because income-inequality data are available for a limited set of countries
• Dependent binary variable: "financial crisis (0/1)"; independent variables: lagged credit and money supply, loans, investment, personal income inequality (measured by the "top 1% income share")
• Dataset by Thomas Piketty: Capital in the 21st Century
Schularick Taylor - Model
• Logistic regression estimating the probability of a crisis based on credit growth in previous periods
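$$\operatorname{logit}(p_{it}) = b_{0i} + b_1(L)\,\Delta \log \mathrm{CREDIT}_{it} + b_2(L)\,X_{it} + e_{it}$$
where $p_{it}$ is the probability of a crisis in country $i$ and year $t$, $b_1(L)\,\Delta \log \mathrm{CREDIT}_{it}$ is lagged credit growth ($b_1(L)$ a lag polynomial), and $X_{it}$ are control variables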
• OLS and Logit models with country and year fixed effects
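A minimal sketch of a logit with country fixed effects in base R, on hypothetical simulated data (not the Schularick/Taylor dataset). The variable names (credit_growth, credit_growth_l1, crisis) are assumptions, and a single lag stands in for the lag polynomial; year fixed effects would be added with + factor(year).

```r
# Hypothetical simulated panel: 8 countries, 1950-2008, binary crisis flag
set.seed(1)
countries <- paste0("C", 1:8)
panel <- expand.grid(year = 1950:2008, country = countries,
                     stringsAsFactors = FALSE)
panel <- panel[order(panel$country, panel$year), ]
panel$credit_growth <- rnorm(nrow(panel), mean = 0.03, sd = 0.05)

# One-year lag of credit growth within each country (first year becomes NA)
panel$credit_growth_l1 <- ave(panel$credit_growth, panel$country,
                              FUN = function(x) c(NA, x[-length(x)]))
panel <- panel[!is.na(panel$credit_growth_l1), ]

# Country-specific intercepts play the role of b_0i (assumed values)
alpha <- setNames(rnorm(length(countries), mean = -2, sd = 0.5), countries)
p <- plogis(alpha[panel$country] + 8 * panel$credit_growth_l1)
panel$crisis <- rbinom(nrow(panel), size = 1, prob = p)

# Logit with country fixed effects as dummies via factor(country)
fe_logit <- glm(crisis ~ credit_growth_l1 + factor(country),
                family = binomial(link = "logit"), data = panel)
summary(fe_logit)$coefficients["credit_growth_l1", ]
```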
Our Model:
• Generalized linear mixed-effects regression estimating the probability of a crisis based on income inequality in previous periods
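$$\operatorname{logit}(p_{it}) = \mathbf{x}_{it}^\top \boldsymbol{\beta} + \mathbf{z}_{it}^\top \mathbf{b}_i + \varepsilon_{it}$$
where $p_{it}$ is the probability of a crisis, $\mathbf{x}_{it}^\top \boldsymbol{\beta}$ and $\mathbf{z}_{it}^\top \mathbf{b}_i$ are the fixed-effects and random-effects terms (country $i$), and $\varepsilon_{it}$ is the error term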
• GLMM with country as the grouping factor
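A minimal sketch of such a GLMM with lme4's glmer, assuming the lme4 package is installed. The data are simulated and hypothetical (not the Piketty or Schularick/Taylor data); top1_l1 (lagged top 1% share) and a random intercept per country are illustrative choices.

```r
library(lme4)  # assumed installed; provides glmer(), fixef(), ranef()

# Hypothetical simulated data: 8 countries, 1950-2008
set.seed(2)
countries <- paste0("C", 1:8)
d <- expand.grid(year = 1950:2008, country = countries,
                 stringsAsFactors = FALSE)
d$top1 <- runif(nrow(d), min = 5, max = 20)  # top 1% income share, in %
d$top1_l1 <- ave(d$top1, d$country, FUN = function(x) c(NA, x[-length(x)]))
d <- d[!is.na(d$top1_l1), ]

# Country-level random intercepts (assumed sd = 0.7)
u <- setNames(rnorm(length(countries), sd = 0.7), countries)
d$crisis <- rbinom(nrow(d), 1, plogis(-4 + 0.15 * d$top1_l1 + u[d$country]))

# Fixed effect: lagged top 1% share; random intercept for each country
glmm <- glmer(crisis ~ top1_l1 + (1 | country), data = d, family = binomial)
fixef(glmm)
ranef(glmm)$country
```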
XLConnect package
• Java-based; used for importing Excel files and for reading and writing Excel worksheets from within R
• Alternative: the RODBC package, whose Excel interface is only available in 32-bit R (switch from 64-bit to 32-bit under "Tools > Global Options" in RStudio)
• The JGR console (File > Load data) offers a workaround for the "incomplete final line" warning that can occur when using read.table to create data frames from Excel or .csv files
Load Excel sheets into R with either loadWorkbook (followed by readWorksheet) or readWorksheetFromFile. Always save the workbook: changes to a workbook object are only written to the file by saveWorkbook!
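A minimal sketch of both loading routes, assuming XLConnect and a Java runtime are installed. The file name crisis_demo.xlsx and the demo data are hypothetical; the point is that the file only exists after saveWorkbook.

```r
library(XLConnect)  # requires a Java runtime

path <- file.path(tempdir(), "crisis_demo.xlsx")  # hypothetical file name
demo <- data.frame(country = c("A", "B"), year = c(2007, 2008),
                   crisis = c(0, 1))

# Create the workbook, add a sheet, write data; nothing is on disk yet
wb <- loadWorkbook(path, create = TRUE)
createSheet(wb, name = "data")
writeWorksheet(wb, demo, sheet = "data")
saveWorkbook(wb)  # only now is the file written

# Option 1: read a sheet directly from the file
df1 <- readWorksheetFromFile(path, sheet = "data")

# Option 2: load the workbook object, then read a sheet from it
wb2 <- loadWorkbook(path)
df2 <- readWorksheet(wb2, sheet = "data")
```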
Panel data analysis
Packages in R
• Paneldata: linear models for panel data
• pdR: panel data regression
• pglm: panel generalized linear models
• phtt: panel data analysis with heterogeneous time trends
• plm: linear models for panel data
• lme4, nlme: (restricted) maximum likelihood estimation of mixed-effects models for panels
Pooled OLS does not consider heterogeneity across units or over time
Data preparation
By default, the first two columns of panel data have to be (1) the unit and (2) the time period (the most granular level)
• The pdata.frame function in plm prepares data frames for panel data analysis. Its "index" argument indicates which columns identify the unit and the time variable: the default (NULL) assumes that observations are identified by individual (column 1) and then time (column 2); alternatively, pass a number giving the number of units in a balanced panel, or a character vector naming the individual and time columns, e.g. c("state", "year")
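A minimal sketch of the character-vector form of index, assuming the plm package is installed and using its bundled Produc data (48 US states, 1970-1986) rather than the crisis dataset.

```r
library(plm)
data("Produc", package = "plm")  # US states, 1970-1986

# Name the unit and time columns explicitly via index
pProduc <- pdata.frame(Produc, index = c("state", "year"))
pdim(pProduc)          # balanced? number of units n, periods T, observations N
head(index(pProduc))   # the unit/time index attached to the data
```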
Models in the plm package
• Individual heterogeneity across units is captured by two error components: an individual component that does not change over time and an idiosyncratic component assumed to be well-behaved and i.i.d.
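In formula terms, a common way to write this error-components model (notation assumed here, not taken from the slides) is shown below; subtracting unit means (the within transformation) removes the time-invariant component $\mu_i$ together with the intercept:

$$y_{it} = \alpha + \beta^\top x_{it} + \mu_i + \epsilon_{it}, \qquad y_{it} - \bar{y}_{i} = \beta^\top (x_{it} - \bar{x}_{i}) + (\epsilon_{it} - \bar{\epsilon}_{i})$$

where $\bar{y}_i$, $\bar{x}_i$ and $\bar{\epsilon}_i$ are averages over time for unit $i$.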
Decision scheme:
• Errors uncorrelated with the regressors and white noise? Yes: OLS pooling model
• Individual error component uncorrelated with the regressors: random effects model ("random")
• Otherwise: fixed effects model ("within"), or the first-differencing model ("fd") if the errors are persistent
Models in plm
plm model objects are estimated on transformed data: time-demeaned data for the fixed effects ("within") model, quasi-time-demeaned data for the random effects model, and untransformed data for the pooling (OLS) model
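A base-R sketch of the within (time-demeaning) transformation on hypothetical simulated data: OLS on unit-demeaned data gives exactly the same slope as OLS with one dummy per unit (LSDV), while pooled OLS is biased because the regressor is correlated with the individual effect. All names are illustrative.

```r
set.seed(42)
n_units <- 5; n_periods <- 10
d <- data.frame(unit = rep(1:n_units, each = n_periods),
                time = rep(1:n_periods, times = n_units))
mu <- rnorm(n_units, sd = 2)             # individual effect, constant over time
d$x <- rnorm(nrow(d)) + mu[d$unit]       # regressor correlated with mu
d$y <- 1.5 * d$x + mu[d$unit] + rnorm(nrow(d), sd = 0.5)

# Within transformation: subtract each unit's time mean
demean <- function(v, g) v - ave(v, g)
d$y_w <- demean(d$y, d$unit)
d$x_w <- demean(d$x, d$unit)

within_fit <- lm(y_w ~ x_w - 1, data = d)          # demeaned data, no intercept
lsdv_fit   <- lm(y ~ x + factor(unit), data = d)   # dummy-variable version
pooled_fit <- lm(y ~ x, data = d)                  # ignores mu
c(within = coef(within_fit)[["x_w"]], lsdv = coef(lsdv_fit)[["x"]],
  pooled = coef(pooled_fit)[["x"]])
```

The manual demeaned regression reports standard errors without the degrees-of-freedom correction for the estimated unit means; plm(..., model = "within") applies it.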
Types of models
• Pooling model ("pooling"): OLS on the pooled panel data; the panel (time-series) structure is not considered
• Fixed effects model ("within", dummy variables): based on deviations from the individual means
• First-differences model ("fd", lagged model): removes time-invariant individual error components by first-differencing; preferred when the errors are persistent over time
• Random effects model ("random"): assumes the individual error component is uncorrelated with the regressors; under this assumption it is more efficient than fixed effects
• "Between" model: based on time averages (group means) per unit, which discards within-unit variability but is suited to non-stationary data; used for estimating long-run relationships
• Variable coefficient models: assume that the coefficients vary around an average
• FGLS: used when errors are heteroscedastic and autocorrelated; with fixed effects, a fixed effects FGLS variant is also available
• plm: within, between and random effects models (also pooling and fd)
• pvcm: models with variable coefficients
• pggls: FGLS
• pgmm: GMM
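A minimal sketch fitting one formula with each model type, assuming plm is installed and using its bundled Produc data (the formula is a standard textbook example, not the crisis model). Only the model argument changes between plm fits.

```r
library(plm)
data("Produc", package = "plm")
pProduc <- pdata.frame(Produc, index = c("state", "year"))
f <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Same formula, different model argument
types <- c("pooling", "within", "between", "random", "fd")
fits <- lapply(types, function(m) plm(f, data = pProduc, model = m))
names(fits) <- types
sapply(fits, function(m) coef(m)["log(emp)"])  # compare one slope across models

vcm  <- pvcm(f, data = pProduc, model = "within")    # one regression per state
fgls <- pggls(f, data = pProduc, model = "pooling")  # general FGLS
```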
Common arguments: function(formula, data, index, effect, model)
• effect: individual or time effects; if there are time effects, use the gls function in the nlme package (see John Fox's appendix on time-series regression)
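A minimal sketch of gls with AR(1) errors within each unit, using the nlme package (shipped with R) on hypothetical simulated data. The autocorrelation of 0.6, the unit dummies, and the corAR1 grouping by unit are illustrative assumptions.

```r
library(nlme)  # recommended package shipped with R

set.seed(7)
n_units <- 10; n_periods <- 30
d <- data.frame(unit = factor(rep(1:n_units, each = n_periods)),
                year = rep(1:n_periods, times = n_units))
d$x <- rnorm(nrow(d))
# AR(1) errors within each unit (assumed rho = 0.6)
d$e <- ave(rnorm(nrow(d)), d$unit,
           FUN = function(z) as.numeric(stats::filter(z, 0.6, method = "recursive")))
d$y <- 2 + 0.5 * d$x + d$e

# Unit dummies as fixed effects, AR(1) correlation over years within each unit
fit_gls <- gls(y ~ x + unit, data = d,
               correlation = corAR1(form = ~ year | unit))
phi <- coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)
phi  # estimated AR(1) coefficient
```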
Which model to choose?
The F test compares the model for the full sample with a model based on a separate equation for each unit.
"Poolability" test: H0 implies that OLS is the appropriate model, i.e. there are no fixed effects, units are sufficiently homogeneous and the coefficients are the same for all units
pooltest(plm model, pvcm model with model = "within") or pFtest:
A significant F statistic leads to rejecting H0, implying that there are fixed effects.
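A minimal sketch of both tests, assuming plm is installed and using its bundled Grunfeld data (10 firms, 20 years) rather than the crisis dataset.

```r
library(plm)
data("Grunfeld", package = "plm")
f <- inv ~ value + capital
idx <- c("firm", "year")

ols <- plm(f, data = Grunfeld, index = idx, model = "pooling")
fe  <- plm(f, data = Grunfeld, index = idx, model = "within")
vcm <- pvcm(f, data = Grunfeld, index = idx, model = "within")  # one OLS per firm

pool_test <- pooltest(ols, vcm)  # H0: same coefficients for all firms
f_test    <- pFtest(fe, ols)     # H0: no individual (fixed) effects
pool_test
f_test
```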
Test for individual or time effects:
plmtest(pooling plm model, type, effect); type: Lagrange multiplier tests ("bp", "honda", "kw", "ghm"); effect: "individual", "time" and "twoways"
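A minimal sketch of the Lagrange multiplier tests on a pooling model, assuming plm is installed and using its Grunfeld data. The "ghm" type applies to two-way effects only.

```r
library(plm)
data("Grunfeld", package = "plm")
pool <- plm(inv ~ value + capital, data = Grunfeld,
            index = c("firm", "year"), model = "pooling")

lm_ind <- plmtest(pool, effect = "individual", type = "honda")  # individual effects
lm_two <- plmtest(pool, effect = "twoways", type = "ghm")       # individual + time
lm_ind
lm_two
```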
To choose between fixed and random effects models, use a Hausman-type test comparing the two estimators under the null of no significant difference between them; if H0 holds, the random effects model is preferred because it is more efficient
Random effects are a reasonable assumption if n is large relative to T, so that the individual effects can be viewed as random
phtest(within plm model, random plm model)
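A minimal sketch of the Hausman test, assuming plm is installed and using its Grunfeld data; the degrees of freedom equal the number of compared slope coefficients (two here).

```r
library(plm)
data("Grunfeld", package = "plm")
f <- inv ~ value + capital
fe <- plm(f, data = Grunfeld, index = c("firm", "year"), model = "within")
re <- plm(f, data = Grunfeld, index = c("firm", "year"), model = "random")

# H0: no systematic difference between FE and RE estimates, so RE is preferred
h <- phtest(fe, re)
h
```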
Which model to choose?
Test for serial correlation of the error term: individual effects always cause serial correlation of the composite error; in addition, the idiosyncratic error term may follow the usual AR(1) process. Because tests for either cause have power against the other, joint tests are needed, which, however, do not tell which cause led to the rejection. plm offers several joint, marginal and conditional tests; a problem arises if the errors are not normal and homoscedastic
In short panels with a large number of observations, serial correlation is not a problem: with so many observations, error correlations appear random. This does not hold for long time series in macro models.
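A minimal sketch of a joint test and two serial-correlation tests, assuming plm is installed and using its balanced Grunfeld data. The choice of order = 2 for the Breusch-Godfrey test is an illustrative assumption.

```r
library(plm)
data("Grunfeld", package = "plm")
f <- inv ~ value + capital
pool <- plm(f, data = Grunfeld, index = c("firm", "year"), model = "pooling")
fe   <- plm(f, data = Grunfeld, index = c("firm", "year"), model = "within")

# Joint test for random effects and serial correlation (Baltagi-Li);
# a rejection does not say which of the two caused it
joint <- pbsytest(pool, test = "j")
# Locally robust test for AR(1) errors, allowing for random effects
ar_re <- pbsytest(pool, test = "ar")
# Serial correlation in the idiosyncratic errors of the FE model
bg <- pbgtest(fe, order = 2)   # Breusch-Godfrey/Wooldridge test
wa <- pwartest(fe)             # Wooldridge test for AR(1) errors in FE panels
```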
Further diagnostics and screening tests are available; for dynamic models and when the regressors lack exogeneity: GMM (pgmm)
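A minimal sketch of a dynamic panel estimated by GMM (Arellano-Bond style), assuming plm is installed and using its bundled EmplUK data; the specification follows the well-known employment example, not the crisis model. Lags of log(emp) from 2 onward serve as instruments after the "|".

```r
library(plm)
data("EmplUK", package = "plm")  # first two columns: firm, year

ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
             log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")
summary(ab, robust = TRUE)
sargan(ab)  # test of overidentifying restrictions
```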
Panel analysis functions
• pdata.frame, pdim
• plm : function (formula, data, subset, na.action, effect =
c("individual", "time", "twoways"), model = c("within", "random",
"ht", "between", "pooling", "fd"), random.method = c("swar",
"walhus", "amemiya", "nerlove", "kinla"), inst.method = c("bvk",
"baltagi"), restrict.matrix = NULL, restrict.rhs = NULL, index =
NULL, ...)
• pdata.frame : function (x, index = NULL, drop.index = FALSE,
row.names = TRUE)
• Exploratory data analysis: use "|" in the formula of the scatterplot function of the car package to show both the unit and the year dimension
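A minimal sketch, assuming the car package (version 3 or later, for the regLine and smooth arguments) and plm (for the Grunfeld data) are installed: "| firm" groups the points by unit, with the year on the x-axis.

```r
library(car)  # provides scatterplot()
data("Grunfeld", package = "plm")
Grunfeld$firm <- factor(Grunfeld$firm)  # grouping variable as a factor

# One colour/symbol per firm (unit), year on the x-axis (time)
scatterplot(inv ~ year | firm, data = Grunfeld,
            regLine = FALSE, smooth = FALSE)
```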
Literature:
• Croissant, Yves, and Giovanni Millo. 2008. "Panel Data Econometrics in R: The plm Package." Journal of Statistical Software, 27(2).
• Schularick, Moritz, and Alan M. Taylor. 2012. "Credit Booms Gone Bust: Monetary Policy, Leverage Cycles, and Financial Crises, 1870-2008." American Economic Review, 102(2): 1029-61.