Programme Evaluation for Policy
Analysis
Mike Brewer, 4 October 2011
www.pepa.ac.uk
PEPA is based at the IFS and
CEMMAP
© Institute for Fiscal Studies
Outline
• Who we are
• Overview and aims
• The 5 projects
• Training and capacity building
Who we are: PI and co-Is
• Richard Blundell, UCL & IFS
• Mike Brewer, University of Essex & IFS
• Andrew Chesher, UCL & IFS
• Monica Costa Dias, IFS
• Thomas Crossley, Cambridge & IFS
• Lorraine Dearden, IoE & IFS
• Hamish Low, Cambridge & IFS
• Costas Meghir, Yale & IFS
• Imran Rasul, UCL & IFS
• Adam Rosen, UCL
• Barbara Sianesi, IFS
• DWP is a “partner”
Programme Evaluation for Policy Analysis:
overview
• PEPA is about ways to do, and ways to get the most out of,
“programme evaluation”
– “programme evaluation” here means “estimating the causal impact of”
“government policies” (although can often generalise)
Programme Evaluation for Policy Analysis:
overview
• PEPA is about ways to do, and ways to get the most out of,
“programme evaluation”
• Aims
– To stimulate a step change in the conduct of programme evaluation
in the United Kingdom (and around the world)
– To maximise the value of programme evaluation by improving the
design of evaluations, and improving the way that such evaluations
add to the knowledge base
• Beneficiaries
– those who do programme evaluation
– those who commission, design and make decisions based on the
results of evaluations
– those interested in impact of labour market, education and health
policies
More on our aims: three challenges for
programme evaluation
1. We know the outcomes for participants on a training
programme. But what was the counterfactual?
2. Given the counterfactual, we can estimate the programme’s
impact. But how certain are we?
3. Given that the evaluation has been done, how can we get the
most value from it?
– How can we generalize what we learn from this evaluation to other
training programs?
– How should we synthesize the lessons learned from multiple studies
of different training programs?
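One common way to formalise the first challenge is in potential-outcomes notation. The symbols below are an assumption for illustration and do not appear in the slides: D = 1 for participants, Y_1 is the outcome with the programme and Y_0 the outcome without it. A naive participant/non-participant comparison then splits into the impact on participants plus a selection term, and the counterfactual E[Y_0 | D = 1] is the piece that is never observed:

```latex
\underbrace{E[Y_1 \mid D=1] - E[Y_0 \mid D=0]}_{\text{naive comparison}}
  = \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{impact on participants}}
  + \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
```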
PEPA: overview
PEPA
0. Core programme evaluation skills
1. Are RCTs worth it? (Barbara Sianesi, Jeremy Lise)
2. Inference (Thomas Crossley, Mike Brewer, Marcos Hernandez, John Ham)
3. Control functions and evidence synthesis (Richard Blundell, Adam Rosen,
Monica Costa Dias, Andrew Chesher)
4. Structural dynamic models (Hamish Low, Monica Costa Dias, Costas Meghir)
5. Social networks (Imran Rasul, Marcos Hernandez)
1. Making the most of RCTs: reassessing ERA
(Sianesi & Lise)
• The Employment, Retention and Advancement demonstration
(2003-2007)
– first large-scale RCT in social policy in UK (over 16,000
people)
– has been evaluated experimentally (Hendra et al., 2011)
• Aim: maximise the value of the ERA experiment
– Improve the design of non-experimental evaluations
– Improve way such evaluations add to the knowledge base
• “Gold standard” randomisation is still rare
– costly, impractical or politically infeasible → Project 1A
– lack of external validity and ex ante analysis → Project 1B
1a. Lessons for non-experimental methods
(Sianesi)
• Non-experimental evaluation methods have been assessed
against an experimental benchmark in a small number of US
studies in the 1970s and 1980s
• Exploit a recent and UK-based random experiment to learn about,
and possibly improve upon, the performance of non-experimental
methods routinely used in UK evaluations
– pilot-control areas
– individual matching
– difference-in-differences
• The experimental estimates will be compared against the best
alternative that can be devised with the available data
1b. A reassessment of the ERA
(Lise)
• Can experimental data be combined with behavioural models of
labour market behaviour to lead to better ex ante evaluations?
• Methodology
– take a typical search and matching model, and calibrate it to match
the data on ERA comparison group
– simulate ERA policy within the model
– check if simulated outcomes match observed data for ERA
participants
• Experimental variation allows testing of theoretical model
• If simulated outcomes match ERA participants’ outcomes, then:
– can use simulations to evaluate ex ante alternative ERA policies
– can see how estimate of policy impact changes once interactions
with wider labour market are taken into account
2. Improving inference for policy evaluation
(Crossley, Brewer, Hernandez, Ham)
• Critical to characterise uncertainty of estimates (and thus
perform inference correctly)
• This can be hard when
– data have a multi-level structure, and there is serial
correlation in the treatment and in group-level shocks
– the estimated policy impacts are complex and discontinuous
functions of estimated parameters
• Similarly, can be hard to perform power calculations in all but
the simplest RCTs
• Aims
– Review, disseminate and (hopefully) develop techniques
– Provide resources
– Substantive applications: impact of labour market or welfare-to-work
programmes
2a. Inference and power in Diff-in-Diff
(Crossley, Brewer, Hernandez)
• A common evaluation technique is to use diff-in-diff over areas
and time
• Serially-correlated errors and group-level structure of data mean
naïve inference often incorrect (standard errors “too small”;
Bertrand et al. 2004)
– But most solutions work only for “large” number of groups, and
literature evolving much faster than practice
• Aims
– Demonstrate the problems for inference caused by serially-correlated
and multi-level data, and the practicality and relevance of
a range of suggested solutions, providing resources where
appropriate
– Develop new tools for inference
• randomisation/permutation tests
• serial correlation in the non-linear DiD
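A minimal sketch of a randomisation/permutation test for diff-in-diff with few groups, not the project's own tool. It assumes the data have already been collapsed to one pre-period and one post-period mean per area (one of the simple remedies for serial correlation), and that treatment could equally have been assigned to any areas. The area labels and numbers are hypothetical.

```python
from itertools import combinations

def did_estimate(means, treated):
    """Diff-in-diff on group-level data: means[g] = (pre_mean, post_mean)."""
    t = [post - pre for g, (pre, post) in means.items() if g in treated]
    c = [post - pre for g, (pre, post) in means.items() if g not in treated]
    return sum(t) / len(t) - sum(c) / len(c)

def permutation_pvalue(means, treated):
    """Exact two-sided p-value from re-assigning treatment across groups."""
    observed = did_estimate(means, treated)
    as_extreme = total = 0
    for fake in combinations(sorted(means), len(treated)):
        total += 1
        # Count placebo assignments whose estimate is at least as large.
        if abs(did_estimate(means, set(fake))) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total

# Hypothetical area-level outcome means (pre, post); areas A and B are treated.
area_means = {
    "A": (10.0, 13.0), "B": (11.0, 14.5), "C": (9.0, 9.5), "D": (10.5, 11.0),
    "E": (12.0, 12.4), "F": (8.0, 8.7), "G": (9.5, 10.1), "H": (11.5, 11.8),
}
observed, p_value = permutation_pvalue(area_means, {"A", "B"})
print(round(observed, 3), round(p_value, 4))
```

With 8 areas and 2 treated there are only 28 possible assignments, so the smallest attainable p-value is 1/28. This shows how the number of groups limits what any test can conclude.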
2a. Inference and power in Diff-in-Diff
(Crossley, Brewer, Hernandez)
• Flip side to inference is a power calculation
• Will produce resources to carry out power calculations for
non-experimental designs.
– difference-in-differences
– instrumental variables
– regression discontinuity
• Power calculations will reflect:
– Cluster effects: observations from different agents are not
independent from each other
– Monte Carlo methods to deal with a reduced number of clusters
– Different patterns of time-series correlation
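A minimal Monte Carlo sketch of a power calculation with cluster effects, under stated assumptions: each cluster contributes its mean pre-to-post change, half the clusters are treated, and shocks are normal. The comparison of treated and control clusters is therefore a simple diff-in-diff. All parameter values and function names are hypothetical. The critical value is taken from the simulated null distribution, so a small number of clusters does not inflate the rejection rate.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def t_stat(treated, control):
    """Two-sample t-statistic computed on cluster-level changes."""
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    se = (var(treated) / len(treated) + var(control) / len(control)) ** 0.5
    return (mean(treated) - mean(control)) / se

def draw_t(effect, n_clusters, n_per_cluster, sd_cluster, sd_indiv, rng):
    """Simulate one study and return its t-statistic."""
    # A cluster's mean change = common cluster shock + averaged individual noise.
    sd_mean = (sd_cluster ** 2 + sd_indiv ** 2 / n_per_cluster) ** 0.5
    half = n_clusters // 2
    treated = [rng.gauss(effect, sd_mean) for _ in range(half)]
    control = [rng.gauss(0.0, sd_mean) for _ in range(n_clusters - half)]
    return t_stat(treated, control)

def mc_power(effect, n_clusters, n_per_cluster=50, sd_cluster=0.3,
             sd_indiv=1.0, reps=2000, alpha=0.05, seed=0):
    """Share of simulated studies that reject the null of no effect."""
    rng = random.Random(seed)
    args = (n_clusters, n_per_cluster, sd_cluster, sd_indiv, rng)
    # Simulated null distribution gives the critical value for |t|.
    null = sorted(abs(draw_t(0.0, *args)) for _ in range(reps))
    crit = null[min(reps - 1, round((1 - alpha) * reps))]
    hits = sum(abs(draw_t(effect, *args)) > crit for _ in range(reps))
    return hits / reps

power_few = mc_power(effect=0.5, n_clusters=10)
power_many = mc_power(effect=0.5, n_clusters=30)
size_check = mc_power(effect=0.0, n_clusters=10)  # should be close to alpha
print(power_few, power_many, size_check)
```

Adding individuals within a cluster only shrinks the `sd_indiv` part of the noise. The cluster shock remains, which is why the number of clusters drives power.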
2b. Inference in duration analysis
(Brewer, Ham)
• Duration/survivor or transition models are natural tools for
programme evaluation when outcomes of interest are spells or
transitions
– Estimated policy impacts often complex, discontinuous functions of
the estimated parameters of a statistical model
• Will establish how best to use event history models to provide
policy-makers with
– estimates of the impact of a policy on the hazard rate
– expected time spent in various states
– correct confidence intervals around both of these
• Will build on Eberwein, Ham and Lalonde (2002), Ham and
Woutersen (2009) and Ham, Li and Sheppard (2010)
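A small sketch of why these impacts are awkward functions of the model parameters, assuming a discrete-time setting where the hazard is the per-period probability of leaving a state. The hazard values and the 20% policy effect are hypothetical. Expected time in the state is a sum of survivor probabilities, each a product of (1 - hazard) terms, so a given change in the hazard does not translate one-for-one into a change in expected duration.

```python
def expected_time_in_state(hazards):
    """Expected number of periods spent in a state over len(hazards) periods.

    hazards[t] is the probability of leaving in period t, given survival so far.
    """
    surviving = 1.0  # share still in the state at the start of the period
    total = 0.0
    for h in hazards:
        total += surviving        # everyone still present spends this period
        surviving *= 1.0 - h      # some of them leave at the end of it
    return total

baseline = [0.10] * 24                     # hypothetical monthly exit hazards
with_policy = [h * 1.2 for h in baseline]  # assumed 20% rise in the hazard
impact = expected_time_in_state(with_policy) - expected_time_in_state(baseline)
print(expected_time_in_state(baseline), impact)
```

Because the mapping from hazards to durations is nonlinear, an interval for the hazard cannot simply be rescaled into an interval for expected duration.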
3. Control functions in policy evaluation
(Blundell, Costa Dias, Rosen, Chesher,
Kitagawa)
• Choice among alternative evaluation methods is driven by three
concerns
– Question to be answered
– Type and quality of data available
– Assignment rule (the mechanism that allocates individuals to
the programme)
• This project focuses on the last
• Idea
– The ideal assignment rule comes from an RCT
– But if we know something about the assignment rule, then the
control function approach allows us to account for/correct for the
endogenous selection into treatment
3. The control function approach: example
• Interested in the impact of university education on subsequent
labour market earnings (the “returns to university education”)
• Unobservable determinants of earnings, e.g. underlying ability,
will be correlated with the decision to attend university, so a
simple regression will provide a biased view of the returns to
university
• By modelling key features of the decision to attend university –
the “assignment rule” to university – the control function
approach can correctly recover the average return to university
among those who took up a place
3. The control function approach: example
(continued)
• These key features will ideally be factors that determine
assignment to university but do not determine directly final
earnings in the labour market
– Family socio-economic background, level of university fees,
distance to university, availability of university places (if rationed)
• If we can write down an equation modelling the way these factors
determine university attendance, we can construct an index (or
‘control function’) that can then be included in the earnings
regression along with the indicator for attending university.
– Extension of the ‘Heckman’ selection approach that controls for the
endogenous selection into treatment
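A minimal sketch of the two-step idea on simulated data, using the simplest linear special case rather than a Heckman-style probit first stage. Everything here is an assumption for illustration: the data-generating process, the variable names, the constant return of 0.30 and the small `ols` helper. Step 1 models attendance using a factor `z` that shifts attendance but not earnings. Step 2 adds the first-stage residual to the earnings regression as the control function.

```python
import random

def ols(rows, y):
    """Least squares via the normal equations; each row includes a constant."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(1)
n, true_return = 4000, 0.30
z = [rng.gauss(0, 1) for _ in range(n)]        # shifts attendance, not earnings
ability = [rng.gauss(0, 1) for _ in range(n)]  # unobserved by the analyst
d = [1.0 if zi + ai > 0 else 0.0 for zi, ai in zip(z, ability)]
y = [true_return * di + 0.5 * ai + rng.gauss(0, 0.5)
     for di, ai in zip(d, ability)]

# Simple regression of earnings on attendance: biased by unobserved ability.
naive_return = ols([[1.0, di] for di in d], y)[1]

# Step 1: model the assignment rule (here a linear probability model).
a0, a1 = ols([[1.0, zi] for zi in z], d)
v_hat = [di - (a0 + a1 * zi) for di, zi in zip(d, z)]

# Step 2: earnings regression with the control function v_hat included.
_, cf_return, selection_term = ols(
    [[1.0, di, vi] for di, vi in zip(d, v_hat)], y)
print(round(naive_return, 2), round(cf_return, 2), round(selection_term, 2))
```

In this linear case the coefficient on attendance equals the two-stage least squares estimate, and a non-zero coefficient on `v_hat` signals selection on unobservables. Richer control functions differ mainly in how step 1 is modelled.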
3. The control function approach: our research
• Research questions:
– Under what circumstances does the use of a control function
compare favourably to matching and instrumental variables? What
are the key trade-offs?
– How does a control function approach map into a behavioural
model? What can a control function approach tell us about structural
parameters of interest?
– Can we weaken the control function approach by incorporating
partial knowledge of the assignment rule to produce bounds?
– Will study various education and labour market policies
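For orientation on what "bounds" means here, a sketch of the standard worst-case bounds for a bounded outcome, which use no knowledge of the assignment rule at all. This is a baseline swapped in for illustration, not the project's method. Partial knowledge of the assignment rule would be expected to narrow the interval. The data and function name are hypothetical.

```python
def worst_case_ate_bounds(y, d):
    """Bounds on the average treatment effect for outcomes y in [0, 1].

    Unobserved potential outcomes are replaced by their extremes, 0 and 1.
    """
    n = len(y)
    p1 = sum(d) / n                                    # share treated
    m1 = sum(yi for yi, di in zip(y, d) if di) / sum(d)            # E[Y | D=1]
    m0 = sum(yi for yi, di in zip(y, d) if not di) / (n - sum(d))  # E[Y | D=0]
    ey1_lo, ey1_hi = m1 * p1, m1 * p1 + (1 - p1)       # bounds on E[Y_1]
    ey0_lo, ey0_hi = m0 * (1 - p1), m0 * (1 - p1) + p1 # bounds on E[Y_0]
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

# Hypothetical binary outcome (e.g. employed) and treatment indicator.
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
d = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
lower, upper = worst_case_ate_bounds(y, d)
print(round(lower, 3), round(upper, 3))
```

Without further assumptions the interval always has width 1 for a [0, 1] outcome, so it always includes zero. Any narrowing has to come from knowledge of how people were assigned.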
4. Dynamic behavioural models for policy
evaluation (Low, Costa Dias, Shaw, Meghir, Pistaferri)
• Classical ex post empirical evaluation methods often
fail to explain the nature of the estimated effect
– Cannot disentangle impact of programme on incentives from
how incentives affect individual decisions
– Cannot account for dynamic responses (anticipation or
changes now affect decisions in future)
– Studies often rely on different sets of behavioural
assumptions
• Difficult to understand, as not explicitly stated
• Complicates task of synthesising information from different studies
– Cannot be used for counterfactual analysis
• Results are specific to the policy, time and environment
4. Dynamic behavioural models for policy
evaluation
• Aim: to address these weaknesses using a structural
(dynamic behavioural) approach
– Explicitly formalises incentives and decisions
– But relies on heavy set of (explicit) behavioural assumptions
• Will study ways to make minimal and transparent
assumptions
– Use quasi-experimental data to estimate and validate models
of behaviour
– Explore the use of optimality conditions - independent of the
full structure of the model - to estimate some parameters
– Use robust estimates of bounds on treatment effects to
bound structural parameters
4. Dynamic behavioural models for policy
evaluation: applications
• Impact of welfare time-limits
– Develop dynamic model to study how time-limits in welfare
eligibility may affect claiming decisions at different stages of life
– Use the US programme, “Temporary Assistance for Needy Families”, as
the empirical application
– Our model will replicate, and then generalise, previous empirical
results
• Impact of welfare-to-work on education
– Use structural behavioural model of education and labour supply
choices to evaluate how future welfare-to-work programmes
affect the ex ante value of education
– Use evaluation studies to validate the behavioural assumptions
– Use partial identification to provide bounds for structural
parameters
5. Social networks and program evaluation
(Rasul, Fitzsimons, Hernandez, Malde)
• To understand individuals’ or households’ behaviour,
must recognize that individuals are embedded within
social networks
• In developing countries, networks play various roles:
– substitute for missing markets
– key source of insurance and other resources to their members
• Will seek to understand how networks interplay with
policy interventions
• Will combine developments in theories of network
formation and behavior within networks with empirical
methods for program evaluation with social
interactions
5. Social networks and program evaluation:
example of Progresa
• Progresa is a village-level intervention in rural Mexico. Previous
research has shown that:
– 1 in 5 households are “isolated” (none of their extended
family resides within the same village)
– On some margins, only non-isolated households responded
to Progresa
• Was it because poor families needed assistance and
encouragement to join the programme?
• Or was it because of nature of Progresa intervention, part
of which was to encourage teenage girls to stay in school?
5. Social networks and program evaluation
• Substantive research questions
– How are the benefits of program interventions dissipated within
communities once social networks are accounted for?
– How do such spillovers (from beneficiary to non-beneficiary
households) affect the cost-benefit analysis of programs, and how
we think about targeting?
– Why and how are social networks formed? (Can investigate this by
studying particular interventions.)
• Methodological research questions
– How best to measure whether and how households are socially
tied (blood ties, resource flows)?
PEPA: research questions
PEPA
1. Are RCTs worth it?
– Can non-experimental methods replicate the results of RCTs?
– How can we combine results from RCTs with models of labour market
behaviour?
– How do GE effects alter estimated impact of training programmes?
2. Inference
– Correct inference and power calculations where data have multi-level
structure & serially-correlated shocks?
– Correct inference when policy impacts are complex functions of
estimated parameters?
– Impact of time-limited in-work benefits on job retention?
3. Control functions and evidence synthesis
– Can we weaken control function approach to estimate bounds?
– Link between control function and structural or behavioural models?
– How are lessons from multiple evaluations best synthesised?
4. Structural dynamic models
– How best to use ex post evaluations in ex ante analysis?
– How are education decisions affected by welfare-to-work programmes?
– How do life-cycle time limits on welfare receipt affect behaviour?
5. Social networks
– How best to collect data on social networks?
– How is impact of policy affected by the social networks within and
between treated and control groups?
– Can social networks explain heterogeneity in impact of a health
intervention?
Training and capacity building
• Mixture of courses, masterclasses, workshops and resources
(how-to manuals, software)
• All projects have their own TCB programme
• Plus core TCB offering in general programme evaluation skills
– 4 “standard” courses/year and 1 “advanced” course/year
– 1 course/year for those designing or commissioning evaluations
PEPA: training and capacity building
PEPA
0. Core programme evaluation skills
– Core course in evaluation methods
– Courses in programme evaluation for designers and users of evaluations
– How-to guide for PS matching
1. Are RCTs worth it?
– Course on estimating “search models”
– Workshop on using “search models”
– Workshop on value of RCTs
2. Inference
– Course, manual and software tools on power calculations
– Workshop, course, methods survey, manual and software tools on
correct inference in evaluations
– Workshop and courses on using survivor models for policy evaluation
3. Control functions and evidence synthesis
– Course and workshop on control functions in policy evaluation
– Workshop on evidence synthesis
– Course on bounds in policy evaluation
4. Structural dynamic models
– Courses, manual and software tools on building dynamic behavioural
models
– Workshop on dynamic behavioural models and policy evaluation
5. Social networks
– Methods survey, course and workshop on collecting and using data
on social networks
PEPA management and administration team
• Director
– Now until October 2012: Mike Brewer
– April 2012 thereafter: Lorraine Dearden
• Co-director: Monica Costa Dias
• Administrator: Kylie Groves
• IT: Andrew Reynolds
• DWP is a partner organisation, with the hope that this eases access
to their data. In practice, very reliant on a key contact (Mike Daly)
