Slide - IAOS 2014 Conference

Bridging economic statistics with people:
A role for alternative sources of data?
Zeynep Orhun Girard
Statistician, ESCAP Statistics Division
IAOS, Danang, Viet Nam
9 October, 2014
DISCLAIMER: The views presented here are the author’s and do not necessarily reflect the views and position of the United Nations.
“No wind favors he
who has no destined port”
Michel de Montaigne
“We can analyze the data without hypotheses
about what it might show. We can throw the
numbers into the biggest computing clusters the
world has ever seen and let statistical algorithms
find patterns science cannot. […] Correlation
supersedes causation, and science can advance
even without coherent models, unified theories,
or really any mechanistic explanation at all”.
Chris Anderson
Editor of Wired Magazine
For official statistics to extract value from
alternative sources of data such as big data,
their use has to be
1) guided closely by statistical policy
2) with the goal of filling actual methodological
and data gaps in different domains of statistics
Methodological/policy developments are
guiding economic statistics
Macroeconomic statistical frameworks are
constantly updated, e.g. SNA
- Input-Output analysis
- First econometric model of business cycle
and the General Theory
- Report on measurement of national income
and the construction of social accounts
SNA published
- Allowed for national statistical policies,
recommended IOT and constant prices
- Introduced satellite accounts
- Some non-market production in production
- Concept of employment introduced in the
sub-sectoring of household sector
- Use of PPPs for international comparison
- Balance sheets and SAMs
- Chapter on informal aspects of economy
3 key policy-related initiatives are shaping the
future of economic statistics
• Five recommendations on material
• Follow-up work on disparities in
national accounts, distribution of
Household Income, Consumption
and Wealth (OECD)
G-20 Data
• Recommendations 15-20 on Sectoral
and Other Financial and Economic
• Data revolution for targeted policy
• Measurement of progress on
sustainable development that
complement GDP (SDG 17)
We have witnessed a move towards an integrated approach to statistics and an emphasis on
the household perspective and the distributional aspects of economic activity
Big Data: 3 v’s yes but not only…
Exhaustiveness in scope (n=all)
Indexical in identification
Flexible in fields and scalable in size
Big data and economic statistics so far?
Data sources
Online search queries/web scraping
Substantive areas
Housing market, labour market, prices
Correlations and predictive modelling
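[Equation lost in extraction: a predictive relation in which the value at period +1 is a function of terms at period −1]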
Use of some big data sources for
economic statistics
1. Housing market (Google Trends)
– Bank of England: McLaren and Shanbhogue (2011)
– Wu and Brynjolfsson (2009)
2. Labour/employment market (Google Trends and Word Tracker)
– Bank of England: McLaren and Shanbhogue (2011)
– D’Amuri and Marcucci (2009)
– Askitas (2009)
– Ettredge et al. (2005), Word Tracker
3. Prices (Scraping and non-traditional enumeration)
– Billion Prices
– Premise (hybrid)
Common points of these studies
• Compare aggregate trends of online search data
against official/administrative statistics
• Emphasize correlation rather than causality
• Find that online search data can predict
observed trends within the appropriate lead
time (depends on the individuals and area of
economic statistics)
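The comparison these studies share can be sketched as a lead-lag correlation check. The example below is a minimal illustration, not the method of any cited study: the function names, the two series and the maximum lead are all hypothetical, and the official series is constructed so that it follows the search index with a two-period delay.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lead_lag_correlations(search, official, max_lead):
    """Correlate search[t] with official[t + lead] for lead = 0..max_lead."""
    result = {}
    for lead in range(max_lead + 1):
        x = search[: len(search) - lead]  # search data, truncated at the end
        y = official[lead:]               # official data, shifted forward
        result[lead] = pearson(x, y)
    return result

# Hypothetical monthly search index (made-up numbers, illustration only).
search_index = [10, 12, 15, 14, 18, 21, 19, 24, 27, 25]
# Hypothetical official series, built to trail the search index by 2 periods.
official_series = [0, 0] + [2 * s + 1 for s in search_index[:-2]]

correlations = lead_lag_correlations(search_index, official_series, max_lead=3)
best_lead = max(correlations, key=correlations.get)
print(best_lead, round(correlations[best_lead], 3))
```

A high correlation at some lead says only that the two series move together with that delay. It does not establish causality, which is the limitation noted in the bullets above.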
What can big data do for economic statistics?
Beyond correlations and predictive modelling:
1. Enhance quality and granularity of economic statistics
– Increase resolution and distributional information,
e.g. demographics and geographical location
2. Enhance availability of economic statistics?
– Example: Components of a household balance
sheet, e.g. consumer durables
Selecting the Main Source of Data
Define measurement objective
based on policy question, e.g.
distribution of wealth across
different quintiles of households
at provincial level
Identify approach based on
statistical policy
Identify main data source
based on FPOS and QAF
(Relevance, accuracy, timeliness,
punctuality, accessibility, clarity,
and comparability and
consistency over time) + Cost-efficiency
Existing dataset
Traditional Data Source
administrative records,
Data requirement X
Alternative Data
Design new data
Big data set
Using big data for distributional aspect
Select dataset
• Online search keyword, e.g.
“insurance” and “repair/garage”
for automobiles, yellow pages
data for business address
• Test correlations with any
existing official statistics/other
data source, e.g. household
surveys covering consumer
Select variable of
• Location, sex, age, etc.
• Test distribution of groups by
demographic characteristics
• Population Census data and
demographic distribution at
the national and sub-national
• Household Income and
Expenditure Data for the item
in question, e.g. vehicle
ownership and its distribution
Apply in analysis
• Use distribution of vehicle
ownership obtained through big
data sources on macroeconomic
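One simple way to apply such a distribution is proportional allocation: a national aggregate is split across groups according to each group's share of an indicator drawn from the big data source. The sketch below assumes this approach for illustration only. The province names, search volumes and national total are hypothetical, and in practice the indicator would first have to be validated against census and household survey data, as the earlier steps describe.

```python
def allocate_by_shares(national_total, indicator_by_group):
    """Split a national aggregate across groups in proportion to an indicator."""
    total_indicator = sum(indicator_by_group.values())
    if total_indicator <= 0:
        raise ValueError("indicator must sum to a positive number")
    return {
        group: national_total * value / total_indicator
        for group, value in indicator_by_group.items()
    }

# Hypothetical vehicle-related search volumes by province (made-up numbers).
search_volume_by_province = {"North": 300, "Central": 100, "South": 600}
# Hypothetical national value of the household vehicle stock.
national_vehicle_stock = 2000.0

allocated = allocate_by_shares(national_vehicle_stock, search_volume_by_province)
for province, value in allocated.items():
    print(province, value)
```

The allocated values add up to the national total by construction, so the sub-national figures stay consistent with the macroeconomic aggregate.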
Using big data for enhancing data availability
Select dataset
Process data
Apply in analysis
• Value of vehicle owned
through purchase and
repair data, e.g.
insurance databases
• Blow up to national (if
possible sub-national)
level figures
• Calculate depreciation
• Differentiate
household enterprises
• In construction of
balance sheets
• Memo item for
national accounts
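The processing steps above (valuing vehicles, grossing up to national level, calculating depreciation) can be sketched as follows. This is an illustration under stated assumptions: straight-line depreciation and a 10-year service life are choices made here, not prescribed by the presentation, and the records and grossing-up weights are hypothetical.

```python
def gross_up(sample_values, weights):
    """Weighted total: each record stands for `weight` units in the population."""
    return sum(v * w for v, w in zip(sample_values, weights))

def straight_line_value(purchase_value, age_years, service_life_years):
    """Remaining value under straight-line depreciation, floored at zero."""
    remaining_share = max(0.0, 1.0 - age_years / service_life_years)
    return purchase_value * remaining_share

# Hypothetical records: (purchase value, age in years, grossing-up weight).
records = [(10000.0, 2, 50), (8000.0, 5, 30), (12000.0, 12, 20)]
SERVICE_LIFE_YEARS = 10  # assumed service life, for illustration only

current_values = [
    straight_line_value(value, age, SERVICE_LIFE_YEARS)
    for value, age, _ in records
]
weights = [weight for _, _, weight in records]

# Estimated national stock of household vehicles at depreciated value.
national_stock = gross_up(current_values, weights)
print(national_stock)
```

A geometric depreciation pattern could replace the straight-line function without changing the grossing-up step. Vehicles used by household enterprises would need to be separated out before the total enters a household balance sheet.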
Challenges: Big data in official statistics
• Shift from planned data collection activities
• Possible mismatch between what big data can
offer and what the economic policy makers
need (comprehensiveness and comparability)
• Privacy of individuals and confidentiality of
• Lack of code of conduct covering all
stakeholders (public and private)
Opportunities: Big data in official statistics
• In the policy context we live in we need to
integrate different data sources
• Alternative sources of data can respond to such
needs (exhaustive, relational, flexible and
• Maintaining TRUST of individuals is key
– “Fifty-four per cent of global consumers indicated
that they would be comfortable with the use of
information about them if they believed that the
uses would not embarrass them, damage their
interests, or otherwise harm them”
(BCG Global Consumer Sentiment Survey 2013)
1. Big data to complement official statistics
a. Conduct research for innovative statistics
b. Provide quality insights through data confrontation
c. Enhance availability of data by closing data gaps.
2. Statistical policy & actual methodological and
data gaps need to guide big data research to
allow for meaningful results that can be used
3. Big data has a potential role to bring in the
distributional and household aspect to
economic statistics
Next steps?
• Multiply the number of proposals embedded
in methodological and data needs
• Conduct studies with official and private
sources of data
Thank you. For comments or questions:
Zeynep Orhun Girard
[email protected]