Good Countries or Good Projects?
Micro and Macro Correlates of World Bank
Project Performance
Cevdet Denizer (Bosphorus University)
Daniel Kaufmann (Revenue Watch and Brookings Institution)
Aart Kraay (World Bank)
IEG Evaluation Week Presentation
March 18, 2013
Motivation
• Huge literature on aid effectiveness at two levels:
– “macro” level – e.g. does total aid raise aggregate
GDP growth?
– “micro” level – e.g. evaluations (randomized or
otherwise) of individual projects
• Know much less about the relative importance of
project-specific versus country-specific factors in
determining project outcomes
– “macro” literature uninformative about individual projects
– “micro” literature (mostly) does not have a cross-country dimension
This Paper
• Uses very large sample of 6000+ World Bank projects
since 1980s
– crude, but credible, outcome measure for each
project based on internal evaluation processes
(IEG project success ratings)
• Match these up with two types of potential
correlates of project success:
– “macro” country-level variables (easy...)
– “micro” project-level variables (hard...but
interesting)
Preview of Main Results
• Project-level outcomes vary much more within countries
than between countries
• Limited cross-country average variation in project
performance is well-explained by standard “macro”
variables
• Look at variety of “micro” project-level correlates of
project-level outcomes
– basic project characteristics
– early-warning indicators
– identity of task team leader
– much more to be done here since this is where most of
the action is!
Many Potential Concerns
with Outcome Measure
• very crude (sat/unsat, or 6-point scale after 1995)
– definitely not randomized evaluations!
• projects are assessed relative to their development objectives (DOs) only, and these are not standardized across projects
– different standards for DOs across different
sectors?
• include sector dummies
– evolving standards for setting DOs and evaluating
them?
• include sector x approval period dummies
• include sector x evaluation period dummies
– “setting the bar low” in difficult countries?
Potential Concerns, Cont’d
• significant self-reporting component
– do task managers have incentives to give poor ratings?
• independence of IEG?
• many steps from effective individual World Bank
projects to any macro growth effects of aid
Despite these concerns, these ratings seem broadly credible and have the advantage of huge country-year-project coverage
Setup of Empirical Results
• Start with universe of 7342 completed projects
evaluated since 1983, and construct two subsets
based on (i) availability of RHS variables and (ii) units
of evaluation ratings
– 6569 projects evaluated 1983-2011 (binary
outcome variable)
– 4191 projects evaluated 1995-2011 (6-point
outcome variable)
• All specifications control for:
– potential mean differences across three types of
evaluations
– evaluation lag (time between project completion and evaluation), usually significantly negative (stylized specification sketched below)
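For concreteness, a stylized version of the estimating equation these controls imply – the notation is ours and the exact functional form is an assumption, since the deck does not spell it out:

```latex
% Stylized baseline specification (our notation, an assumption):
%   y_p : outcome rating of project p
%   s(p), a(p), e(p) : sector, approval period, evaluation period of p
y_p = \alpha_{s(p),a(p)} + \theta_{s(p),e(p)} + \lambda_{\mathrm{type}(p)}
    + \gamma\,\mathrm{EvalLag}_p + \beta' X_p + \varepsilon_p
```

Here type(p) indexes the three evaluation types, EvalLag_p is the lag between completion and evaluation (γ typically negative, per the bullet above), and X_p collects the “macro” or “micro” correlates of the later slides.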
“Macro” Correlates of Project Outcomes
• “Standard” set of country-level variables from
literature
– Good policy (CPIA)
– Shocks (GDP growth)
– Democracy (Freedom House)
• Average each one over the life of the project
– non-trivial decision, because projects last a long time (median = 6 years)
• alternatives: initial value? final value? weighting schemes? entering each year of project life separately? (see the sketch below)
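A minimal pandas sketch of the life-of-project averaging, on made-up data – all column names here are ours, not the authors':

```python
import pandas as pd

# Hypothetical inputs: one row per project, and a country-year macro panel.
projects = pd.DataFrame({
    "project_id": [1, 2],
    "country": ["KEN", "KEN"],
    "approval_year": [1998, 2004],
    "closing_year": [2004, 2010],
})
macro = pd.DataFrame({
    "country": ["KEN"] * 13,
    "year": list(range(1998, 2011)),
    "cpia": [3.0, 3.1, 3.2, 3.2, 3.3, 3.4, 3.5,
             3.5, 3.6, 3.6, 3.7, 3.7, 3.8],
})

# Expand each project into its active country-years...
rows = projects.loc[projects.index.repeat(
    projects["closing_year"] - projects["approval_year"] + 1)].copy()
rows = rows.reset_index(drop=True)
rows["year"] = rows.groupby("project_id").cumcount() + rows["approval_year"]

# ...then take the simple average of the macro variable over those years.
life_avg = (rows.merge(macro, on=["country", "year"])
                .groupby("project_id")["cpia"].mean()
                .rename("cpia_life_avg"))
print(life_avg)
```

The initial/final/weighted alternatives in the bullet would just swap the `.mean()` step for `.first()`, `.last()`, or a weighted average.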
Results: “Macro” Correlates
Dependent Variable Is:              All Projects                  AFR Sample
                                    Sat/Unsat     1-6 Rating      Sat/Unsat     1-6 Rating
                                    1983-2011     1995-2011       1983-2011     1995-2011

Real GDP Per Capita Growth          1.915***      4.839***        2.316***      5.892***
                                    (8.53)        (6.36)          (4.96)        (3.51)
CPIA Rating                         0.118***      0.533***        0.118***      0.488***
                                    (9.70)        (10.81)         (5.11)        (4.74)
Freedom House Rating                0.00434       0.0143          -0.00653      -0.00469
                                    (0.99)        (0.88)          (-0.59)       (-0.11)

Number of Observations              6569          4191            1936          1172
R-Squared                           0.122         0.143           0.165         0.173
Sector Dummies                      Y             Y               Y             Y
Sector x Evaluation Period Dummies  Y             Y               Y             Y
Sector x Approval Period Dummies    Y             Y               Y             Y
Estimation Method                   OLS           OLS             OLS           OLS
Results: “Macro” Correlates
• Generally sensible results in full sample
– policies/institutions matter a lot
• validation of the CPIA’s use in performance-based allocation (PBA)
– growth matters
– no strong evidence that political rights/civil
liberties matter
Results: From “Macro” to “Micro” Correlates
• Country-level variables by construction will explain only
country-level average variation in project outcomes
• But country-level average variation in project outcomes is only about 20% of the total variation in project outcomes
– based on year-by-year regressions of project outcomes on country dummies – average R-squared is about 0.2 (sketched below)
– “macro” correlates explain this 20% reasonably well
• Points to importance of considering project-level factors (which we do next)
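A sketch of the variance-decomposition regression behind this 20% figure, assuming a hypothetical project-level DataFrame with our column names: `outcome` is the IEG rating, `country` the borrower, `eval_year` the evaluation year.

```python
import pandas as pd
import statsmodels.formula.api as smf

def between_country_share(df: pd.DataFrame) -> float:
    """Average, across evaluation years, of the R-squared from regressing
    project outcomes on a full set of country dummies (deck: about 0.2)."""
    r2 = []
    for _, g in df.groupby("eval_year"):
        if g["country"].nunique() > 1:          # need cross-country variation
            fit = smf.ols("outcome ~ C(country)", data=g).fit()
            r2.append(fit.rsquared)
    return sum(r2) / len(r2)
```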
[Figure: Project Outcome Ratings and Country Performance – scatter of IEG Ratings (1-6) against Average CPIA Score over Life of Project, with fitted values line]
“Micro” Correlates of Project Outcomes, 1
• dummy for investment lending (vs DPLs, SALs, etc)
• three proxies for complexity (two sketched below)
– “concentration” of project in its major sector
– dummy for “repeater” projects, e.g. Botswana
Education II, III are repeats, Education I is not
– ln(size in dollars)
• project length (years from approval to evaluation)
• preparation and supervision costs as share of total
project size
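A small sketch of two of these project-level variables on made-up data (column names are ours): the “concentration” proxy is the share of the project in its largest sector, and size enters in logs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sector_shares": [[70, 20, 10], [100], [50, 50]],  # % split across sectors
    "commitment_usd": [120e6, 15e6, 60e6],             # project size in dollars
})
df["share_largest_sector"] = df["sector_shares"].apply(max)  # concentration proxy
df["log_size"] = np.log(df["commitment_usd"])                # ln(size in dollars)
print(df[["share_largest_sector", "log_size"]])
```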
Results: Basic Project Characteristics
Dependent Variable Is:                All Projects                 AFR Projects
                                      Sat/Unsat    1-6 Rating      Sat/Unsat    1-6 Rating
                                      1983-2011    1995-2011       1983-2011    1995-2011

Dummy for Investment Projects         0.0489*      0.0771          0.0603       0.430**
                                      (1.73)       (0.81)          (1.08)       (2.41)
Share of Project in Largest Sector    -0.00111***  -0.00305***     -0.00145**   -0.00431**
                                      (-3.31)      (-3.28)         (-2.06)      (-2.25)
Dummy for Repeater Projects           0.00323      -0.0126         -0.0235      -0.0127
                                      (0.25)       (-0.27)         (-0.85)      (-0.13)
Log(Total Project Size)               -0.0486***   -0.136***       -0.0673***   -0.0777
                                      (-4.46)      (-3.72)         (-3.22)      (-1.13)
Project Length (years)                -0.00523     -0.0307**       -0.0135      -0.0534*
                                      (-1.12)      (-2.11)         (-1.46)      (-1.81)
Log(Preparation Costs/Total Size)     -0.00664     -0.0419         -0.0114      -0.00414
                                      (-0.83)      (-1.46)         (-0.64)      (-0.08)
Log(Supervision Costs/Total Size)     -0.0479***   -0.137***       -0.0628***   -0.148**
                                      (-4.55)      (-3.93)         (-3.20)      (-2.38)

Number of Observations                6569         4191            1936         1172
R-Squared                             0.130        0.156           0.177        0.188
Sector Dummies                        Y            Y               Y            Y
Sector x Evaluation Period Dummies    Y            Y               Y            Y
Sector x Approval Period Dummies      Y            Y               Y            Y
Estimation Method                     OLS          OLS             OLS          OLS
Results: Basic Project Characteristics
• Investment projects do slightly better
• Mixed results on complexity
– projects more concentrated in one sector do worse??
– “repeater” projects don’t do better?
– larger projects do worse
• Length, preparation (and especially supervision) costs
negatively correlated with outcomes
– big-time endogeneity problem – e.g. “difficult” projects require more preparation and supervision, and take longer
– more on this later (and in paper)
“Micro” Correlates of Project Outcomes, 2
• Effectiveness delay (time in quarters from approval to first disbursement)
• “Early-warning” indicators of problem projects from end-of-FY Implementation Status Review (ISR) reports for each year the project is active:
– “problem project” flag – raised if the task manager thinks progress towards the development objective is unsatisfactory
– “potential problem” flag – raised if three or more of 12 detailed flags are raised
– dummy for restructuring (very rare)
• dummy = 1 if these flags are observed in the first half of the project (only for projects lasting at least four years; construction sketched below)
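A sketch of how the first-half dummies could be built from an end-of-FY ISR panel – the panel layout and column names here are our assumption:

```python
import pandas as pd

def first_half_flags(isr: pd.DataFrame) -> pd.DataFrame:
    """isr: one row per project-fiscal year, with 0/1 columns
    problem_flag, potential_problem_flag, restructured."""
    out = []
    for pid, g in isr.groupby("project_id"):
        g = g.sort_values("fy")
        if len(g) < 4:      # deck: only projects lasting at least four years
            continue
        first_half = g.head(len(g) // 2)
        out.append({
            "project_id": pid,
            "problem_first_half": int(first_half["problem_flag"].any()),
            "potential_problem_first_half":
                int(first_half["potential_problem_flag"].any()),
            "restructured_first_half": int(first_half["restructured"].any()),
        })
    return pd.DataFrame(out)
```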
Results: Early Warning Indicators
Dependent Variable Is:                    All Projects               AFR Projects
                                          Sat/Unsat   1-6 Rating     Sat/Unsat   1-6 Rating
                                          1983-2011   1995-2011      1983-2011   1995-2011

Time from Approval to First
Disbursement (quarters)                   0.00237     0.0110**       0.00232     0.0135
                                          (1.49)      (2.23)         (0.75)      (1.55)
Dummy for Restructuring
During First Half of Project              0.0978*     0.355***       0.269**     0.786***
                                          (1.94)      (2.89)         (2.52)      (3.34)
Dummy for Problem Project Flag
During First Half of Project              -0.141***   -0.374***      -0.109***   -0.198*
                                          (-7.19)     (-6.33)        (-2.90)     (-1.83)
Dummy for Potential Problem Flag
During First Half of Project              -0.0381     -0.100         -0.0824*    -0.213*
                                          (-1.54)     (-1.48)        (-1.86)     (-1.77)

Number of Observations                    3764        2682           1082        785
R-Squared                                 0.156       0.181          0.200       0.230
Sector Dummies                            Y           Y              Y           Y
Sector x Evaluation Period Dummies        Y           Y              Y           Y
Sector x Approval Period Dummies          Y           Y              Y           Y
Estimation Method                         OLS         OLS            OLS         OLS
Flow of 3764 projects from approval through evaluation:
• First half of implementation:
– 943 Problem Projects (25%)
– 2821 Good Projects (75%)
• Second half of implementation:
– of the 943 Problem: 592 stay Problem (63%), 351 turn Good (37%)
– of the 2821 Good: 853 turn Problem (30%), 1968 stay Good (70%)
• Success rates at evaluation:
– Problem → Problem (592): 41% success; Problem → Good (351): 81% success; all first-half Problem: 55%
– Good → Problem (853): 48% success; Good → Good (1968): 87% success; all first-half Good: 75%
• Overall 71% success rate
(computation sketched below)
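A sketch of how the tree above could be tabulated from project-level data – `problem_h1`/`problem_h2` (0/1 problem status in each half) and `success` (0/1 IEG outcome) are hypothetical columns of ours:

```python
import pandas as pd

def transition_table(df: pd.DataFrame) -> pd.DataFrame:
    """Counts, within-branch shares, and success rates for each
    (first-half status, second-half status) cell of the tree."""
    tab = (df.groupby(["problem_h1", "problem_h2"])
             .agg(n=("success", "size"), success_rate=("success", "mean")))
    # e.g. 592/943 = 63% of first-half problem projects stay problem
    tab["branch_share"] = (tab["n"] /
                           tab.groupby(level="problem_h1")["n"].transform("sum"))
    return tab
```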
Results: Early Warning Indicators
• Effectiveness delays are associated with slightly better outcomes
• Problem project flags raised in the first half of a project’s life are strongly negatively correlated with outcomes
– not a mechanical correlation with the final outcome
– potential problem flags are also significant in AFR
• Restructurings are positively correlated with outcomes (more so in AFR)
• Again, partial correlations are hard to interpret – e.g. a “difficult” project is more likely to be flagged and more likely to turn out unsuccessful
Role of Unobserved (by us) Project
Characteristics
• Many of the project variables respond endogenously to
project characteristics, e.g.
– “difficult” projects require more supervision, are more
likely to be flagged, and also are more likely to be
unsuccessful
– creates downward bias in OLS estimates of effects of
interventions such as supervision
• Can’t rely on standard solutions like randomized
controlled assignment of Bank inputs (infeasible) or
instrumental variables (unjustifiable)
The paper details an alternative approach to quantifying the likely biases – with reasonable assumptions we can recover intuitively plausible positive effects of supervision, flags, etc. on project outcomes – but magnitudes are hard to pin down precisely
Role of Task Team Leaders
• Task team leader (TTL) is important World Bank “input”
into projects
• We have data on the staff ID number of the TTL:
– from final ISR, for 3,925 projects in post-1995 sample
• publicly available in Project Portal
– for each ISR, for 3,187 projects in post-1995 sample
• use to investigate TTL turnover
• Explore two practical questions:
– How important are TTL fixed effects relative to country
fixed effects?
– How important is TTL “quality” relative to other
correlates of project outcomes?
Country Effects vs TTL Effects
• In order to investigate this, need a sample where there is
meaningful variation across countries and TTLs
– e.g. if each TTL worked in only one country, can’t
separately identify country and TTL effects
• Restrict attention to a sample of 2407 projects where the TTL has managed (i) more than one project and (ii) projects in more than one country (selection sketched below)
– covers 136 countries and 710 TTLs
• For projects where we have “time series of TTLs” by ISR within
projects, also identify “Initial” TTL, as distinct from “Final” TTL
at time of final ISR
– look at subset of projects where “Initial” and “Final” TTL
are different to separately identify “Initial” and “Final” TTL
effects
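A sketch of this sample restriction, with hypothetical columns `project_id`, `ttl_id`, `country`:

```python
import pandas as pd

def movers_sample(df: pd.DataFrame) -> pd.DataFrame:
    """Keep projects whose TTL managed more than one project AND worked
    in more than one country (deck: 2407 projects, 136 countries, 710 TTLs)."""
    stats = df.groupby("ttl_id").agg(
        n_projects=("project_id", "nunique"),
        n_countries=("country", "nunique"),
    )
    movers = stats[(stats["n_projects"] > 1) & (stats["n_countries"] > 1)].index
    return df[df["ttl_id"].isin(movers)]
```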
How Much Does TTL “Quality” Matter?
• Proxy for quality of TTL on a given project as average IEG
rating on all other projects with same TTL
– only for projects with TTLs managing two or more
projects
– variant 1: define quality as average IEG rating over
previous projects managed by same TTL
– variant 2: define quality as weighted average (by
number of ISRs) of all other projects the TTL was ever
responsible for (not just at the end of project)
• TTL “turnover” is the average number of TTLs per ISR (both constructions sketched below)
– median project lasts six years, has 12 ISRs, and 2 TTLs
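A sketch of the leave-one-out quality proxy and the turnover measure, again with hypothetical column names of ours (`rating` is the 1-6 IEG outcome; `n_ttls` and `n_isrs` are per-project counts):

```python
import pandas as pd

def add_ttl_measures(df: pd.DataFrame) -> pd.DataFrame:
    g = df.groupby("ttl_id")["rating"]
    total, count = g.transform("sum"), g.transform("count")
    keep = count > 1                 # TTLs managing two or more projects
    df = df.loc[keep].copy()
    # leave-one-out mean: average rating on all OTHER projects by the same TTL
    df["ttl_quality"] = (total[keep] - df["rating"]) / (count[keep] - 1)
    # turnover: distinct TTLs per ISR (median project: 2 TTLs over 12 ISRs)
    df["ttl_turnover"] = df["n_ttls"] / df["n_isrs"]
    return df
```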
Results: TTL Quality and Project Outcomes
Dependent Variable Is: 1-6 Rating          All Projects Evaluated 1995-2011
                                      (1)        (2)        (3)        (5)        (6)        (7)

CPIA Rating                           0.539***   0.542***   0.413***   0.471***   0.458***   0.317***
                                      (10.63)    (8.88)     (6.73)     (7.84)     (7.39)     (3.21)
TTL Quality (Average Outcome
on all Other Projects)                0.180***                         0.131***   0.0969**   0.155***
                                      (6.29)                           (3.99)     (2.42)     (5.31)
TTL Quality (Average Outcome
on all Previous Projects)                        0.167***
                                                 (5.19)
TTL Quality (ISR-Weighted Average
Outcome on all Other Projects)                              0.188***
                                                            (4.32)
TTL Turnover (Number of TTLs
per ISR)                                                               -1.282***  -1.672***
                                                                       (-6.08)    (-4.79)
Evaluator "Toughness" (Average
Outcome of all Other Projects
Rated by Same Evaluator)                                                          0.271***   0.0660
                                                                                  (3.50)     (0.77)

Number of Observations                2407       1707       1783       1895       1672       1063
R-Squared                             0.084      0.082      0.049      0.089      0.059      0.227
Results: TTL Quality and Project Outcomes
• TTL quality is highly significant, with economically large effects – e.g. consider a move from P25 to P75 of:
– TTL Quality: 3.5→4.75, IEG score ↑ by 0.23
– CPIA Score: 3.1→3.6, IEG score ↑ by 0.22
– alternative quality measures have similarly large effects
• TTL turnover is highly significant – moving from 2 TTLs per 12 ISRs to 3 TTLs per 12 ISRs implies IEG score ↓ by about 0.10 (arithmetic below)
– but need to be cautious about endogeneity of TTL
turnover
– much more to be done here, e.g. to better understand
costs and benefits of 3-5-7 rule
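For reference, these magnitudes follow directly from the coefficients in the table (the 0.180 quality coefficient and the −1.282 turnover coefficient):

```latex
% TTL quality, P25 -> P75:
0.180 \times (4.75 - 3.5) = 0.225 \approx 0.23
% TTL turnover, 2 vs. 3 TTLs over a median project's 12 ISRs:
-1.282 \times \left(\tfrac{3}{12} - \tfrac{2}{12}\right) = -0.107 \approx -0.10
```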
Results: TTL Quality and Project Outcomes
• So far we have focused on TTL effects – but there could very well also be evaluator effects
– are there “tough” and “easy” evaluators?
– how do they match to TTLs?
• Two data sources on evaluator identity
– anonymized data from IEG on staff who do desk
reviews of ICRs, for each project since 1995
– manually (!) collected data on TTL for 1150 Project
Performance Audit Reports since 1995
• Some evidence of evaluator effects, but:
– does not undermine significance of TTL effects
– does not survive addition of other controls (likely
reflects sectoral specialization of reviewers?)
Results: TTL Quality and Project Outcomes
• Evidence suggests there is a quantitatively-important
“human factor” in project outcomes
• But much more needs to be done:
– are there common attributes to TTLs who have a
track record of successful projects?
– are there endogeneity problems in the
“assignment” of TTLs to projects?
– do higher levels of management matter?
– are there other dimensions, such as counterpart
quality, that matter as well?
Policy Implications
• Country-level policies and institutions do matter a lot
for project outcomes
– don’t throw out baby with bathwater!
– (one more) piece of support for donor policies targeting aid to countries with better policy
– but at most this can help us with the 20% of variation in project outcomes that occurs across countries
Policy Implications, Cont’d
• The 80% of variation in project outcomes within
countries challenges us to think hard about how to
improve project success within countries, e.g.
– why are problem projects hard to turn around, or
cancel outright once warning signs emerge?
– is there scope for project- as well as country-level aid
allocation mechanisms to ensure better outcomes?
• e.g. what if WB were to allocate some resources to
“proposals” submitted by TTLs?
– analogous to NSF (or KCP) proposals to obtain
research grants
– criteria for judging proposals could be tailored
to reflect country and TTL characteristics
– how can we better learn about the effectiveness of
Bank inputs into project outcomes?
Pipeline
• Many more interesting questions to be answered
using this kind of project data
– some preliminary evidence that projects managed by “decentralized” TTLs located in the country of the project do better
– assembling TTL-VPU assignment data to see if “3-5-7”-induced TTL turnover matters for project outcomes
– working with colleagues at AfDB and AsDB to
assemble similar data for their projects
– and much more....suggestions welcome!
