Experimental evaluation in education

Professor Carole Torgerson
School of Education, Durham University, United Kingdom
International workshop Social Experiments and Innovation: a new
paradigm for public intervention?
Barcelona, September 26th 2013
Some landmark experimental evaluations
in education in the US and the developing
world
• Cambridge Somerville youth experiment (US)
– Early intervention to reduce ‘juvenile delinquency’
• Tennessee class-size experiment (US)
– Class-size reduction intervention in early years education
• PROGRESA experiment (Mexico)
– Rural anti-poverty intervention
• Balsakhi experiment (India)
– Teaching assistants intervention in literacy and numeracy
Why social experiments?
• Powerful tool for evaluating impacts in
education:
– Simple to understand
– Simple to analyse
– Ethical to use random allocation in the presence
of resource scarcity
– Scientifically the most rigorous evaluative
approach
– Deal with known and unknown confounding
Experimental evaluation in education in
an age of austerity
• Educational interventions that do not work waste
money or worse
• Educational budgets across Europe under strain
due to poor economic conditions
• Imperative that the most effective and cost-effective interventions are adopted by policy
makers
• Randomised experiments offer the best evidence
for efficacy, effectiveness and cost-effectiveness
Some examples of expensive educational
interventions that don’t work (UK)
• Financial incentives (Brooks et al, 2008)
• ICT and spelling (Brooks et al, 2006)
• Nurse numeracy intervention (Ainsworth et al,
2011)
Example of educational intervention that
does work: Every Child Counts
• Every Child Counts (ECC) was previous UK
government’s flagship policy to help children (age 7)
at risk in numeracy
• Expensive one-to-one tutoring intervention
(Numbers Count) delivered each day over one school
term (12 weeks)
• Randomised experiment commissioned to establish
effectiveness and cost-effectiveness (Torgerson et al,
2013a; Torgerson et al, 2013b)
• Pre-post test evaluation (undertaken by developer) had
reported a large effect size (>1 SD) and cost-effectiveness, but used a weak design
The ECC evaluation design
• Three linked randomised experiments:
– ECC (Numbers Count) vs. ‘business as usual’ (Trial 1):
44 schools
– ECC (Numbers Count) pairs vs. ECC one-to-one (Trial
2): 15 schools
– ECC (Numbers Count) triplets vs. ECC one-to-one (Trial
3): 7 schools
• Process evaluation:
– Random sample of schools
– Implementation and delivery
Design of Trial 1
• 12 children in each of 44 schools eligible for ‘Numbers Count’ intervention
• Numeracy test (Sandwell test) (pre-test) at beginning of autumn term
(administered by teachers)
• Random allocation of 12 children to term of delivery: autumn, spring or
summer: ‘waiting list’ design
• Intervention group: autumn children
• Control group: spring and summer children
• Numeracy test (Progress in Maths test) after 12 weeks (administered by
independent testers) (post-test)
• Simple analysis: compare the mean numeracy post-test score of
intervention children with the mean numeracy post-test score of control children and
conclude whether ‘Numbers Count’ is more effective than ‘business as usual’
• Rigorous design: excludes some alternative explanations for results
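The waiting-list allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the trial's actual randomisation procedure. It assumes an equal split of four children per term within each school, which the slides do not state, and the child identifiers, seed and function names are hypothetical.

```python
import random

TERMS = ("autumn", "spring", "summer")


def allocate_school(child_ids, rng):
    """Randomly split one school's eligible children equally across the terms.

    Assumption: equal numbers per term (4/4/4 for 12 children).
    """
    if len(child_ids) % len(TERMS) != 0:
        raise ValueError("number of children must be divisible by number of terms")
    shuffled = list(child_ids)
    rng.shuffle(shuffled)  # random order determines the term of delivery
    per_term = len(shuffled) // len(TERMS)
    allocation = {}
    for i, term in enumerate(TERMS):
        for child in shuffled[i * per_term:(i + 1) * per_term]:
            allocation[child] = term
    return allocation


def trial_arm(term):
    """Autumn children form the intervention group; spring and summer the control."""
    return "intervention" if term == "autumn" else "control"


rng = random.Random(2013)  # fixed seed so the allocation can be reproduced
children = [f"child_{i:02d}" for i in range(1, 13)]  # hypothetical identifiers
allocation = allocate_school(children, rng)
arms = {child: trial_arm(term) for child, term in allocation.items()}
```

Randomising within each school keeps intervention and control children in the same school, so school-level differences cannot explain a difference in outcomes between the groups.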
Design features that minimise alternative
explanations for results
• Large sample size: reduces the likelihood of a chance finding
• Randomisation: intervention and control groups are
equivalent at start so design controls for history,
maturation, regression to the mean, selection bias
• Intervention and control conditions are both
numeracy interventions and both last for 30 minutes
each day for 12 weeks: the comparison is a ‘fair’ one
• Independent ‘blinded’ testing: eliminates possibility
of tester bias
Results
                 Intervention group   Control group
PIM 6 (0-30)     15.8 (4.9)           14.0 (4.5)
N                144                  440

Effect size: 0.33 (95% confidence interval: 0.12 to 0.53)
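As a rough check on the scale of this result, an unadjusted standardised mean difference can be computed from the summary figures alone. The sketch below assumes that the values in brackets (4.9 and 4.5) are standard deviations and uses a common large-sample approximation for the confidence interval. It is not the trial's own analysis, and the function names are illustrative.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def standardised_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means divided by the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def approx_ci(d, n1, n2, z=1.96):
    """Large-sample approximate 95% interval; ignores clustering and covariates."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Summary figures from the results slide; bracketed values assumed to be SDs.
d = standardised_mean_difference(15.8, 4.9, 144, 14.0, 4.5, 440)
low, high = approx_ci(d, 144, 440)
print(f"unadjusted d = {d:.2f}, approx. 95% CI {low:.2f} to {high:.2f}")
```

This unadjusted calculation gives roughly 0.39 (about 0.20 to 0.58), somewhat larger and with a narrower interval than the reported 0.33 (0.12 to 0.53). A difference of this kind would be consistent with the reported estimate coming from a model that adjusts for pre-test scores and for clustering within schools, although the slides do not describe the model used.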
Results
• ECC better than business as usual (0.33 SD)
but expensive
• No evidence that one-to-one was better than
ECC in pairs or triplets
• One-to-one not cost-effective; more cost-effective to deliver in pairs or triplets
Design limitations: Generalisability
• ECC schools were identified by
policy-makers/funders of the
programme as part of the education
policy ‘roll out’ in England,
i.e., schools in
disadvantaged areas
• Ideally, a random sample of
all primary schools in
England should have been
approached and asked to
take part
Design limitations: Intervention
• One-to-one teaching with
intervention children being
withdrawn from classroom
• Problem of attribution: was
effect due to Numbers
Count intervention? Or to
one-to-one teaching? Or to
withdrawal from classroom?
• Design could have included
additional one-to-one arm
Design limitations: ‘Contamination’/‘spill-over’
effects
• Withdrawing children from
usual classroom teaching
may have benefited the
remaining children
• Teachers using the intervention
may have applied it to some
control children.
• Instead of randomising
individual children, design
could have randomised by
school (cluster
randomisation, where
school is the cluster) to
avoid these problems.
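Cluster randomisation, as suggested above, allocates whole schools rather than individual children. The sketch below is a hypothetical illustration only: the equal split of schools between arms, the school identifiers, the seed and the function names are assumptions, not details of any trial described here.

```python
import random


def randomise_schools(school_ids, rng):
    """Allocate whole schools (clusters) to intervention or control.

    Assumption: schools are split as evenly as possible between two arms.
    """
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        school: ("intervention" if position < half else "control")
        for position, school in enumerate(shuffled)
    }


rng = random.Random(44)  # fixed seed so the allocation can be reproduced
schools = [f"school_{i:02d}" for i in range(1, 45)]  # hypothetical identifiers
school_arm = randomise_schools(schools, rng)


def child_arm(school_id):
    """Every child takes the arm of their school, so no school mixes arms."""
    return school_arm[school_id]
```

Because all children in a school share one arm, teachers cannot pass the intervention to control children in the same school. The cost is that the analysis has to account for clustering, which usually means more children are needed for the same precision.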
Design limitations: Long term effects
• Wait list design prevented
long term follow-up; effects
may have ‘washed out’
soon after intervention was
finished.
• Could have used cluster
randomisation;
• Could have recruited
children above threshold
and randomised these to
intervention or long term
follow-up;
• All options (above) rejected
by funder.
Conclusions
• Design and conduct warranted conclusion Numbers Count (as
delivered) more effective than usual classroom teaching BUT
because of design limitations couldn’t answer some really
important questions
• These questions could have been answered if a different
experimental design had been used: cluster randomisation
(randomisation of schools); long-term follow-up (control
group that didn’t receive intervention); one-to-one control
group (literacy or other numeracy)
3 EEF ‘transitions’ projects
• Background:
– Interventions aimed at children from disadvantaged
backgrounds and those struggling to reach national key stage
writing standards.
• Primary outcome measure:
– combined score on the 2 writing tasks within the Progress in
English test (GL assessment)
• Secondary outcomes:
– scores on the reading, spelling and grammar components of the
Progress in English test (GL assessment)
DISCOVER
• Research question:
– What is the effectiveness of the Discover summer
writing workshop intervention compared with a
‘business as usual’ control group on the writing
abilities of participating children?
• Individually randomised experiment
Discover: Trial Design Diagram
Improving Writing Quality Intervention:
Calderdale
• Research question:
– What is the effectiveness of the Improving Writing
Quality programme compared with ‘business as
usual’ on the writing skills of participating
children?
• Pragmatic cluster randomised design
Calderdale: Trial Design Diagram
Exeter Grammar for Writing
Intervention
• Research questions:
1. What is the effectiveness of the whole class Grammar for writing intervention
compared with a ‘business as usual’ control group on writing skills of participating
children?
2. What is the effectiveness of the whole class Grammar for writing intervention plus
additional small group intervention compared with a ‘business as usual’ control group
on writing skills of participating children?
3. What is the effectiveness of the whole class Grammar for writing intervention plus
additional small group intervention compared with the whole class Grammar for
writing intervention only on writing skills of participating children?
• Partial split plot design
Exeter: Trial Design Diagram
• Answers research question 1: effectiveness of whole class
intervention compared with ‘business as usual’ on writing skills?
• Answers research question 2: effectiveness of whole class
intervention plus additional small group intervention compared with
‘business as usual’ control group on writing skills?
• Answers research question 3: effectiveness of whole class
intervention plus additional small group intervention compared with
whole class intervention only on writing skills?
Some challenges in promoting and
undertaking experimental evaluations in
education
• Resistance from within research community
• Lack of political will
• Lack of funding opportunities for individual
experiments and for capacity building
• Lack of capacity (experience and expertise) to
undertake rigorous experiments
• Potential for conflict of interest (Developer of
intervention)
• Recruitment and retention
References
• Ainsworth, H., Torgerson, D., Torgerson, C. et al (2011) Computer-based instruction
for improving student nurses’ general numeracy: Is it effective? Two RCTs,
Educational Studies
• Brooks, G., Burton, M., Coles, P., Miles, J., Torgerson, C., Torgerson, D. (2008)
Randomised controlled trial of incentives to improve attendance at adult literacy
classes, Oxford Review of Education, 34(4)
• Brooks, G., Miles, J.N.V., Torgerson, C.J. and Torgerson, D.J. (2006) Is an
intervention using computer software effective in literacy learning? A randomised
controlled trial, Educational Studies, 32(1)
• Torgerson, C.J., Wiggins, A., Torgerson, D.J., Ainsworth, H., Hewitt, C. Every Child
Counts: Testing policy effectiveness using a randomized controlled trial, designed,
conducted and reported to CONSORT standards, Journal of Research in
Mathematics Education, July 2013
• Torgerson, C.J., Wiggins, A., Torgerson, D.J., Ainsworth, H., Hewitt, C. The
effectiveness of an intensive individual tutoring programme (Numbers Count)
delivered individually or to small groups of children: A randomised controlled trial,
Effective Education, April 2013
