
The use of administrative data in Randomised
Controlled Trials (RCTs)
John Jerrim
Institute of Education, University of London
• What is an RCT?
• What are the advantages of RCTs?
• What are their limitations?
• How can administrative data help overcome these?
• Implications for GSS…..
My experience is in conducting RCTs in education ....
….so this is the context I am talking about today
BUT this has implications for RCTs in other areas
What is an RCT?
• Recruit a group of willing participants…..
• X% (usually 50%) randomly assigned to TREATMENT (T)
• The remaining (100 − X)% assigned to CONTROL (C)
• In absence of intervention:
E(T) = E(C)
• Hence, if after intervention, we find……
µ(T) > µ(C)
…… then this is due to the treatment
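The logic above can be sketched with a small simulation. This is only an illustration on made-up data: the sample size, the true effect of 0.25 standard deviations and the function name `run_trial` are all assumptions, not figures from any real trial.

```python
import math
import random
import statistics


def run_trial(n=2000, effect=0.25, seed=1):
    """Simulate one individually randomised trial on hypothetical data."""
    rng = random.Random(seed)
    # Hypothetical standardised outcomes in the absence of any intervention.
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Random assignment: shuffle the participants and treat the first half.
    order = list(range(n))
    rng.shuffle(order)
    treated = set(order[: n // 2])
    t = [baseline[i] + effect for i in range(n) if i in treated]
    c = [baseline[i] for i in range(n) if i not in treated]
    # Difference in means, with a Welch-style standard error.
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / len(t)
                   + statistics.variance(c) / len(c))
    return diff, se, diff / se


diff, se, t_stat = run_trial()
print(f"difference in means = {diff:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```

Because assignment is random, the two groups have the same expected baseline, so the difference in means estimates the treatment effect without any further adjustment.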
Advantages (well known…..)
When conducted well…..
• Rules out influence of confounders….…..hence gives causal
effect of T
• Highly policy relevant
• Simplicity! Means + t-test. Easy to communicate
• Standardised reporting / conduct protocols
- Trial registration
Often described as the GOLD STANDARD
In reality, RCTs also have important limitations……
… though people talk about these a lot less!
A lack of power?
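• In education: mostly cluster RCTs
• Rather than randomise individuals….. randomise whole schools
• Issue = intra-cluster correlation, ICC (ρ). Pupils in the same school resemble each other, so each extra pupil adds little information. Low power……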
Secondary schools (clusters) = 100
200 children per school
ρ = 0.20
20,000 pupils in trial
Minimum detectable effect = 0.25 standard deviations
95% CI = 0 to 0.50 standard deviations
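A rough sketch of where a figure like 0.25 can come from. The slide does not state the power or significance level used, so this assumes 80% power, a two-sided 5% test, equal allocation between arms, equal school sizes and a normal approximation. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from randomising whole clusters."""
    return 1 + (cluster_size - 1) * icc


def minimum_detectable_effect(n_clusters, cluster_size, icc,
                              alpha=0.05, power=0.80):
    """Approximate MDE in standard deviations for a two-arm cluster RCT.

    Assumes equal allocation, equal cluster sizes and a normal approximation.
    """
    n_pupils = n_clusters * cluster_size
    # Standard error of the standardised difference in means.
    se = math.sqrt(4 * design_effect(cluster_size, icc) / n_pupils)
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * se


mde = minimum_detectable_effect(n_clusters=100, cluster_size=200, icc=0.20)
print(round(mde, 2))  # 0.25
```

Under these assumptions the design effect is 40.8, so the 20,000 pupils carry about as much information as roughly 490 individually randomised pupils.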
• Imagine it costs £5 to test each child in this trial……
• …you have spent £100,000 (20,000 pupils × £5) just on a post-test!
• Also have to deliver the intervention in 50 schools (expensive…..)
• Many Education Endowment Foundation (EEF) secondary school RCTs cost > £500,000 ……..
• …..average detectable effect across trials = 0.25
• Big ££ for quite wide confidence intervals……
Schools (and pupils within schools) drop out of the trial…..
….particularly when assigned to the control group!
- Breaks randomisation. Loses key advantage of the RCT
- Lose power
Example (my trial)
- 50 schools. 25 Treatment and 25 control
- Treatment follow-up = 23 / 25 schools
- Control follow-up = 9 / 25 schools
Worst of all worlds:
- Bias (selection effects)
- Low power
- High cost
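A toy illustration of why differential drop-out matters. The follow-up counts (23 / 25 and 9 / 25) are from the example above. Everything else is assumed: the school mean scores are invented, the true effect is set to zero, and it is assumed (purely for illustration) that the lowest-scoring schools are the ones that leave.

```python
import statistics

# Follow-up rates from the example above.
treatment_rate = 23 / 25  # 0.92
control_rate = 9 / 25     # 0.36

# Hypothetical standardised school mean scores, identical in both arms,
# so the true treatment effect is zero by construction.
school_means = [round(-1.2 + 0.1 * i, 1) for i in range(25)]  # -1.2 ... 1.2
treatment = list(school_means)
control = list(school_means)

# ASSUMPTION: the lowest-scoring schools drop out in each arm.
treatment_followed = sorted(treatment)[2:]   # 2 schools lost, 23 remain
control_followed = sorted(control)[16:]      # 16 schools lost, 9 remain

naive_diff = (statistics.mean(treatment_followed)
              - statistics.mean(control_followed))
print(f"follow-up: T = {treatment_rate:.0%}, C = {control_rate:.0%}")
print(f"naive difference in means = {naive_diff:.2f}")
```

In this made-up scenario the treatment appears to lower scores by about 0.7 standard deviations even though it does nothing. The sign and size depend entirely on which schools leave, which is exactly what we cannot observe without outcome data for the drop-outs.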
Short-term follow-up only
Test / follow-up often immediately at the end of the trial ….
...often when intervention most effective
BUT we are really interested in long-run, lasting effects
I.e. is there much point in ↑ age 11 test scores if kids don’t do any better at age 16?
Ideally want short, medium and long-term follow-up…..
….but this again ↑ ££
External validity
• Most RCTs recruit participants via convenience sampling…..
….not from a well-defined population
• How “weird” is our sample of trial participants?
Have mainly rich pupils?
Have only high-performing schools?
• How far can we generalise results?
- Will we still get an effect when we scale up / roll-out?
How can administrative data help?
What data is available?
Lucky in education. Have the National Pupil Database (NPD).
- School census. Records each child’s school 3 times per year.
- Assessments at ages 5, 7, 11, 14, 16, 18.
- Demographics (free school meals (FSM), gender, English as an additional language (EAL), ethnicity etc)
Strengths of NPD
- Known for whole state school population
- Low measurement error
- Low missing data
- Track children over time
NPD to increase power
One way to ↑ power is to control for stuff that is linked to the outcome….
…use NPD for this purpose
EXAMPLE: Maths Mastery
- Year 7 kids
- New way of teaching them maths
- Test at end of Year 7
- CONTROL for Key Stage 2 (age 11) maths scores from NPD
Detectable effect = 0.36 without controls (CI = 0 to 0.72)
Detectable effect = 0.22 with NPD controls (CI = 0 to 0.44)
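A hedged sketch of how a prior-attainment control tightens the detectable effect. It assumes 50 schools of 200 pupils and ρ = 0.20 (figures used elsewhere in this talk), 80% power, a two-sided 5% test, and a simplification in which the control explains the same share of variance, `r_squared`, between and within schools. The value `r_squared=0.63` is NOT from the trial. It is simply the value that reproduces the 0.22 above under these assumptions.

```python
import math
from statistics import NormalDist


def mde_with_covariate(n_clusters, cluster_size, icc, r_squared=0.0,
                       alpha=0.05, power=0.80):
    """Approximate MDE (in SDs) for a cluster RCT with a baseline control.

    Simplification: the control removes a share r_squared of the outcome
    variance at both the school and the pupil level.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    n_pupils = n_clusters * cluster_size
    variance = 4 * design_effect * (1 - r_squared) / n_pupils
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * math.sqrt(variance)


unadjusted = mde_with_covariate(50, 200, 0.20)
adjusted = mde_with_covariate(50, 200, 0.20, r_squared=0.63)  # assumed R²
print(round(unadjusted, 2), round(adjusted, 2))  # 0.36 0.22
```

Under this simplification the detectable effect shrinks by a factor of sqrt(1 − R²), so a control that explains most of the outcome variance is worth a great deal of extra sample.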
NPD to reduce cost…..
In previous example, could have conducted a pre-test rather than use NPD.
Maths Mastery in 50 schools of 200 children = 10,000 kids
£5 per test. Hence pre-test would have cost a minimum of £50,000
NPD data is there, ready to use.
- Doing a separate pre-test here would have had almost no benefit
NPD to reduce attrition
Schools would have had to take time out of maths lessons to conduct
this pre-test…..
…and there would have been a significant administrative burden on them to conduct it.
This burden is a major reason for control schools dropping out
Administrative data has….
(i) massively reduced the burden on schools
(ii) improved the validity of the trial
NPD to eliminate attrition
Clever design with NPD data means we can (almost) eliminate drop-out
EXAMPLE: Chess in Schools
- Year 5 children learn how to play chess during one school year
- 50 treatment schools receive chess
- 50 control schools = ‘business as usual’
- Use age 7 (Key Stage 1) as the pre-test scores
- Use age 11 (Key Stage 2) as the post-test scores
Almost no burden on schools (no testing to be done)
Key Stage 2 results available for all children
Have test scores even if they move schools……
…..so should have very little attrition
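A minimal sketch of the linkage idea, using plain dictionaries. The pupil identifiers, field names and scores are all hypothetical. This is not the NPD's actual layout or access route.

```python
# Hypothetical trial roster: pupil identifier -> (school at randomisation, arm).
roster = {
    "P001": ("School A", "treatment"),
    "P002": ("School A", "treatment"),
    "P003": ("School B", "control"),
    "P004": ("School B", "control"),
}

# Hypothetical administrative extract keyed by the same pupil identifier.
admin = {
    "P001": {"ks1": 15.0, "ks2": 29.0, "ks2_school": "School A"},
    "P002": {"ks1": 13.0, "ks2": 27.0, "ks2_school": "School C"},  # moved
    "P003": {"ks1": 14.0, "ks2": 26.0, "ks2_school": "School B"},
    "P004": {"ks1": 16.0, "ks2": 28.0, "ks2_school": "School D"},  # moved
}


def link(roster, admin):
    """Attach pre- and post-test scores to each randomised pupil."""
    linked, unmatched = [], []
    for pupil_id, (school, arm) in roster.items():
        record = admin.get(pupil_id)
        if record is None:
            unmatched.append(pupil_id)
            continue
        linked.append({
            "pupil_id": pupil_id,
            "arm": arm,
            "randomised_school": school,  # analyse by school as randomised
            "pre": record["ks1"],
            "post": record["ks2"],
        })
    return linked, unmatched


linked, unmatched = link(roster, admin)
attrition = len(unmatched) / len(roster)
print(f"linked {len(linked)} of {len(roster)} pupils, attrition = {attrition:.0%}")
```

Pupils who change school still appear in the administrative extract, so they stay in the analysis under the school they were randomised in.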
NPD for long-run follow-up
EXAMPLE: Chess in Schools
Trial conducted in Year 5 (age 9/10). First follow-up at end of Year 6 (age 10/11).
Treatment and control children then move on to secondary school.
Will be able to track these children via their unique pupil number. Hence long-run follow-up:
Do treatment children do better in maths GCSE? (Age 16)
Are they more likely to study maths post-16?
Are they more likely to enter a high-status university?
Administrative data means we can answer these questions at little extra cost.
Can answer the question – is there a lasting impact of the treatment?
NPD for external validity / generalisability
Most RCTs are based upon non-random samples of willing participants.
Big issue. But often glossed over!
Without random samples, how do we know if study results generalise to a
wider (target) population?
Admin data gives us some handle on this……..
As we have data for (almost) every child/person in the country…….
…….We can examine how similar trial participants are to target population in
terms of observable characteristics
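One simple way to make this comparison is a standardised difference for each characteristic. The shares below are invented for illustration and are not taken from the NPD or from any trial. The 0.1 flagging threshold is an arbitrary choice for this sketch.

```python
import math


def standardised_difference(p_trial, p_population):
    """Standardised difference between two proportions."""
    pooled_sd = math.sqrt((p_trial * (1 - p_trial)
                           + p_population * (1 - p_population)) / 2)
    return (p_trial - p_population) / pooled_sd


# HYPOTHETICAL shares of pupils with each characteristic.
trial = {"fsm": 0.30, "eal": 0.20, "female": 0.50}
population = {"fsm": 0.15, "eal": 0.15, "female": 0.49}

differences = {name: standardised_difference(trial[name], population[name])
               for name in trial}
# Flag characteristics where the trial sample looks unlike the population.
flagged = [name for name, d in differences.items() if abs(d) > 0.1]

for name, d in differences.items():
    print(f"{name}: {d:+.2f}")
print("flagged:", flagged)
```

This only covers observable characteristics. Schools that volunteer may still differ in ways the administrative data does not record.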
Implications for GSS
Data access
Everything in an RCT should be pre-specified in design
To use admin data in an RCT, need to be 100% sure it will be available
Speed of data delivery
Design phase = never as long as we ideally want….
Some of these things need quick access to the data
E.g. Stratification. Get ‘better’ randomisation
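A sketch of what stratified randomisation might look like once baseline data is in hand. The school names, prior attainment scores, stratum size and seed are all hypothetical. A real trial would pre-specify these in the protocol.

```python
import random
import statistics


def stratified_assignment(prior_scores, stratum_size=10, seed=0):
    """Randomise schools to arms within strata of similar prior attainment.

    prior_scores: dict mapping school name -> prior attainment score.
    If a stratum has an odd number of schools, control gets the extra one.
    """
    rng = random.Random(seed)
    ranked = sorted(prior_scores, key=prior_scores.get)
    assignment = {}
    for start in range(0, len(ranked), stratum_size):
        stratum = ranked[start:start + stratum_size]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        for school in stratum[:half]:
            assignment[school] = "treatment"
        for school in stratum[half:]:
            assignment[school] = "control"
    return assignment


# HYPOTHETICAL school-level prior attainment from administrative data.
prior = {f"school_{i:03d}": 20 + 0.1 * i for i in range(100)}
assignment = stratified_assignment(prior)

treated = [s for s, arm in assignment.items() if arm == "treatment"]
controls = [s for s, arm in assignment.items() if arm == "control"]
gap = (statistics.mean(prior[s] for s in treated)
       - statistics.mean(prior[s] for s in controls))
print(len(treated), len(controls), round(gap, 3))
```

Because each stratum contributes equally to both arms, the arms cannot differ much on prior attainment, whatever the random draw. This is why the baseline data is needed before randomisation rather than after.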
Documentation and ease of use
Admin data can be hard to understand.
E.g. School URNs (unique reference numbers) changing over time in NPD
Need good documentation to ensure proper use
Training needed…..
Opening and linking data across departments
In education, can track test scores using NPD
But what about other outcomes?
E.g. Health outcomes (relevant for some trials?)
E.g. Labour market outcomes
RCTs are a very powerful research design…..
…BUT we have to remember their limitations
Administrative data have the potential to help us overcome many
of the limitations often associated with RCTs
Together, they give us a strong research design coupled with
large-scale, high-quality data
