### Analysis of Randomised Controlled Trials (RCTs)

```Session 4: Analysis and reporting
Steve Higgins (Chair)
Paul Connolly
Stephen Gorard
Analysis of Randomised Controlled Trials (RCTs)
Paul Connolly
Centre for Effective Education
Queen’s University Belfast
Conference of EEF Evaluators: Building Evidence in Education
Training Day, 11 July 2013
Main Analysis of Simple RCT
• These slides provide an introductory overview of one approach
to analysing RCTs
• Assume we are dealing with a continuous outcome variable
• Three variables:
• Pre-test score “score1” (centred so that mean = 0)
• Post-test score “score2”
• Group membership “intervention” (coded 0 = control group;
1 = intervention group)
• Basic analysis via linear regression:
predicted score2 = b0*constant + b1* intervention + b2*score1
Main Analysis of Simple RCT
Predicted score2 = b0*constant + b1* intervention + b2*score1
• b0 = adjusted mean post-test score for those in control group
• b0 + b1 = adjusted mean post-test score for those in
intervention group
• Estimate standard deviations for post-test mean scores using
s.d. for predicted score2 for control and intervention group
separately*
• Significance of b1 = significance of difference between post-test
mean scores for intervention and control groups
• Effect size, Cohen’s d = b1 / [s.d. for pred. score2]
• 95% confidence interval for effect size:
= [b1 ± 1.96*(standard error of b1)] / [s.d. for pred. score2]
*Most statistical software packages provide the option of creating a new variable comprising the
predicted scores of the model. This new variable is the one to use to estimate standard deviations for
Exploratory Analysis of Mediating Effects
for RCT
• Take example of gender differences (variable “boy”, coded as:
0 = girls; 1= boys)
• Analysis via extension of basic linear regression model:
predicted score2 = b0*constant + b1* intervention + b2*score1
+ b3*boy + b4*boy*intervention
• Significance of b4 indicates whether there is evidence of an
interaction effect (i.e. in this case that the intervention has
differential effects for boys and girls)
• Same approach when your contextual variable is continuous
rather than binary as here
Exploratory Analysis of Mediating Effects
for RCT
predicted score2 = b0*constant + b1* intervention + b2*score1
+ b3*boy + b4*boy*intervention
• Use the model to estimate adjusted mean post-test scores*:
• b0 = girls in control group
• b0 + b3 = boys in control group
• b0 + b1 = girls in intervention group
• b0 + b1 + b3 + b4 = boys in intervention group
• Estimate standard deviations by calculating s.d. for predicted
score2 for each subgroup separately
*When dealing with a continuous contextual variable, it is often still useful to calculate adjusted mean post-test
scores to illustrate any interaction effects found. This can be done by using the model to predict the adjusted
post-test mean scores for those participants in the control and intervention groups who have a score for the
contextual variable concerned that is one standard deviation below the mean and then doing the same for those
who have a score one standard deviation above the mean.
Extending the Analysis
• For trials with binary or ordinal outcome measures, the same
approach can be used but with generalised linear regression
models:
– Binary logistic regression (binary outcomes)
– Ordered logistic regression (ordinal outcomes)
• For cluster randomised trials (with >30 clusters), the same
models can be used but extended to create two level models
• For quasi-experimental designs, either:
– Same models as above but adding in a number of additional
co-variates (all centred) to control for pre-test differences
– Propensity score matching
• For repeated measures designs can also extend the above
using multilevel models with observations (level 1) clustered
within individuals (level 2)
Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
Analysis
Stephen Gorard
http://www.evaluationdesign.co.uk/
What is N?
how many cases were assessed for eligibility?
how many of those assessed did not participate, and for what reasons (not meeting
criteria, refused etc.)?
how many then agreed to participate?
how many were allocated to each group (if relevant)?
how many were lost or dropped out after agreeing to participate (and after allocation to
a group, if relevant)?
how many were analysed, and why were any further cases excluded from the analysis?
An example of reporting problems with a sample
In total, 314 individual Year 7 pupils took part in the study. 157 pupils were assigned to treatment and 157 to control. The sample included
students from a disadvantaged background (eligible for free school meals), those with a range of learning disabilities (SEN) and those for whom
English was a second language. By the final analysis six students had dropped out or could not be included in the gain score analysis. One took
the pre-test (repeatedly) but his school were unable to record the score. His post-test score was 78, and he would have been in the control. Five
others took the pre-test but did not sit the post-test. One left the school and could not be traced, initially scored 78 and would have been in
treatment. One left the school and their new school was not able to arrange the post-test, initially scored 64 and would have been control. One
changed schools, one could not get their score saved at pre-test, one refused to cooperate and one was persistently absent at post-test (perhaps
excluded). Although this loss of data, and the reduction of the sample to 308 pupils, is unfortunate, there is no specific reason to believe that
this dropout was biased or favoured one group over the other.
Pupils allocated to groups but with no gain score, and reason for omission
Allocation       Pre-test score  Post-test score  Reason
Treatment group  78              -                Left school, not traced
Treatment group  73              -                Long-term sick during post-test
Control          74              -                Left school, new school would not test
Control          75              -                Withdrawn, personal reasons
Control          -               70               Pre-test not recorded, technical reasons
Control          73              -                Permanently excluded by school
Source: Gorard, S., Siddiqui, N. and See, BH (2013) Process and summative evaluation of the Switch-On
literacy transition programme, Report to the Education Endowment Foundation
Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
Calculating effect sizes and
the toolkit meta-analysis –
implications for evaluators
Steve Higgins
School of Education, Durham University
EEF Evaluators Conference, June 2013
Sutton Trust/EEF Teaching and Learning Toolkit
• Comparative evidence
• Aims to identify ‘best buys’ for schools
• Based on meta-analysis
http://educationendowmentfoundation.org.uk/toolkit
What is meta-analysis?
• A way of combining the results of quantitative research
– To accumulate evidence from smaller studies
– To compare results of similar studies - consistency
– To investigate patterns of association in the findings of
different studies – explaining variation
– ‘Surveys’ research studies
Why meta-analysis?
• Cumulative – synthesis of evidence
• Based on size of effect and confidence intervals rather than
significance testing – patterns in the data
• Identifying and understanding variation helps develop
explanatory models
What is an “effect size”?
• Standardised way of looking at difference
• Different methods for calculation
– Binary (Risk difference, Odds ratio, Risk ratio)
– Continuous
  – Correlational (Pearson’s r)
  – Standardised mean difference (d, g, Δ)
    – Difference between control and intervention group as
      proportion of the dispersion of scores
    – (Intervention group score – control group score) / standard
      deviation of scores
Examples of Effect Sizes:
ES = 0.2
“Equivalent to the difference in heights between 15 and 16 year old girls”
• 58% of control group below mean of experimental group
• Probability you could guess which group a person was in = 0.54
• Change in the proportion above a given threshold:
from 50% to 58% or from 75% to 81%
ES = 0.8
“Equivalent to the difference in heights between 13 and 18 year old girls”
• 79% of control group below mean of experimental group
• Probability you could guess which group a person was in = 0.66
• Change in the proportion above a given threshold:
from 50% to 79% or from 75% to 93%
The rationale for using effect sizes
• Traditional quantitative reviews focus on statistical
significance testing
– Highly dependent on sample size
– Null finding does not carry the same “weight” as a
significant finding
• Meta-analysis focuses on the direction and magnitude of the
effects across studies
– From “Is there a difference?” to “How big is the difference?”
and “How consistent is the difference?”
– Direction and magnitude represented by “effect size”
Issues and challenges in meta-analysis
• Conceptual
– Reductionist - the answer is .42
– Comparability - apples and oranges
– Atheoretical - ‘flat-earth’
• Technical
– Heterogeneity
– Publication bias
– Methodological quality
Comparative meta-analysis
• Theory testing
• Emphasises practical value
• Incorporate EEF findings in new Toolkit meta-analyses
Ability grouping
Slavin 1990b (secondary low attainers)    -0.06
Lou et al 1996 (on low attainers)         -0.12
Kulik & Kulik 1982 (secondary - all)       0.10
Kulik & Kulik 1984 (elementary - all)      0.07
Meta-cognition and self-regulation strategies
Abrami et al. 2008                         0.34
Haller et al. 1988                         0.71
Klauer & Phye 2008                         0.69
Higgins et al. 2004                        0.62
Chiu 1998                                  0.67
Dignath et al. 2008                        0.62
Calculating effect sizes
• The difference between the two means, expressed as a
proportion of the standard deviation: ES = (Me – Mc) / SD
– Cohen's d
– Glass’ Δ
– Hedges' g
Reporting effect sizes: RCTs
• Post-test standardised mean difference with confidence
intervals
• Fixed effect ok for individual randomisation
• Not for clusters…
– Cluster analysis
– MLM
– Equivalent measure
• Other comparisons
– Matched, Regression discontinuity
http://www.cem.org/evidence-based-education/effect-size-calculator



What analyses are you intending to undertake?
How do you plan to calculate effect size(s)?
What statistical techniques:
1.
2.
3.
Are you confident to undertake?
Would be happy to advise other evaluation teams?
Key requirement: be explicit…



Describe analysis decisions (e.g. ITT and missing data)
Report clusters separately
Submit complete data-set in case different
analysis is required for comparability
information
Books and articles
Borenstein, M., Hedges, L.V., Higgins, J.P.T. & Rothstein, H.R. (2009) Introduction to Meta Analysis (Statistics in Practice) Oxford: Wiley
Blackwell.
Chambers, E.A. (2004). An introduction to meta-analysis with articles from the Journal of Educational Research (1992-2002). Journal of
Educational Research, 98, pp 35-44.
Cooper, H.M. (1982) Scientific Guidelines for Conducting Integrative Research Reviews Review Of Educational Research 52; 291.
Cooper, H.M. (2009) Research Synthesis and meta-analysis: a step-by-step approach London: SAGE Publications (4th Edition).
Cronbach, L. J., Ambron, S. R., Dornbusch, S. M., Hess, R.O., Hornik, R. C., Phillips, D. C., Walker, D. F., & Weiner, S. S. (1980). Toward
reform of program evaluation: Aims, methods, and institutional arrangements. San Francisco, Ca.: Jossey-Bass.
Eldridge, S. & Kerry, S. (2012) A Practical Guide to Cluster Randomised Trials in Health Services Research London: Wiley Blackwell
Glass, G.V. (2000). Meta-analysis at 25. Available at: http://glass.ed.asu.edu/gene/papers/meta25.html (accessed 9/9/08)
Lipsey, Mark W., and Wilson, David B. (2001). Practical Meta-Analysis. Applied Social Research Methods Series (Vol. 49). Thousand
Oaks, CA: SAGE Publications.
Torgerson, C. (2003) Systematic Reviews and Meta-Analysis (Continuum Research Methods) London: Continuum Press.
Websites
What is an effect size?, by Rob Coe: http://www.cemcentre.org/evidence-based-education/effect-size-resources
The meta-analysis of research studies: http://echo.edres.org:8080/meta/
The Meta-Analysis Unit, University of Murcia: http://www.um.es/metaanalysis/
The PsychWiki: Meta-analysis: http://www.psychwiki.com/wiki/Meta-analysis
Meta-Analysis in Educational Research: http://www.dur.ac.uk/education/meta-ed/
Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
Interpreting and Reporting Findings
and Managing Expectations
Paul Connolly
Centre for Effective Education
Queen’s University Belfast
Conference of EEF Evaluators: Building Evidence in Education
Training Day, 11 July 2013
Interpreting Findings
• Findings:
– only relate to the outcomes measured
– represent effects of programme compared to what those in
the control group received
– usually only relate to sample recruited (and thus are
context- and time-specific)
• Dangers of:
– ‘fishing exercises’ characterised by post-hoc decisions to
consider other outcomes and/or differences in effects for
differing sub-groups
– hypothesising regarding the causes of the effects (or
reasons for the non-effects)
Reporting Findings
• Being clear:
– Option of using adjusted post-test scores
– Conversion of findings into effect sizes more readily
understandable (e.g. ‘improvement index’)
• Being transparent:
– Identify outcomes at the beginning and stick to these;
register the trial
– Report methods fully (CONSORT statement)
• Being tentative:
– Acknowledge limitations
– Move from evidence of “what works” to evidence of “what
works for specific pupils, in a particular context and at a
particular time”
Source: Connolly, P., Miller, S. & Eakin, A. (2010) A Cluster Randomised Controlled Trial Evaluation of the Media
Initiative for Children: Respecting Difference Programme. Belfast: Centre for Effective Education (p. 31).
See: http://www.qub.ac.uk/research-centres/CentreforEffectiveEducation/Publications/
Example: Improvement index
• Take effect size and convert to Cohen’s U3 index (either by
using statistical tables or effect size calculators online)
• The improvement index represents the increase/decrease in
the percentile rank for an average student in the intervention
group (assuming at pre-test they are at the 50th percentile)
• Effect size of 0.30 → U3 of 62%, i.e. the intervention is likely to
result in an average student in the intervention group being
ranked 12 percentile points higher compared to the average
student in the control group (who would remain at the 50th
percentile).
0.10 → 4 percentile points
0.20 → 8 percentile points
0.40 → 16 percentile points
0.50 → 19 percentile points
Managing Expectations
• Regular and ongoing communication is the key
• Importance of logic models and agreement of outcomes with
programme developers/providers at the outset
– Careful consideration of the intervention and associated
activities and clear link between these and expected
outcomes
– Ensure outcomes are domain-specific
• Include sufficient time to discuss findings with programme
developers/providers
– Talk through possible interpretations
– Discuss further potential analyses (but be clear that these
are exploratory)
Discussion (2 mins)
Write on post-it notes:
• What are the key issues or questions for evaluators?
• Have you found any solutions?
Group discussion and feedback
Tables will be arranged by theme.
Evaluators should move to the table with a theme which
either they are able to contribute expertise on or which they
are struggling with.
Tables should discuss:
• What are the key issues or questions for evaluators?
• What are the solutions?
• How can the EEF help?
Feedback from tables.
```
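
The "Main Analysis of Simple RCT" slides describe a regression of the post-test on group membership and a centred pre-test, followed by an effect size and 95% confidence interval built from b1. The sketch below is one possible reading of those steps, not the presenter's own code. The function name `analyse_simple_rct`, the use of NumPy, and the choice to pool the per-group standard deviations of the predicted scores into a single divisor are all assumptions made for illustration, and the demonstration data are synthetic.

```python
import numpy as np


def analyse_simple_rct(score1, score2, intervention):
    """OLS fit of score2 = b0 + b1*intervention + b2*score1 (score1 centred)."""
    score1 = np.asarray(score1, dtype=float)
    score2 = np.asarray(score2, dtype=float)
    intervention = np.asarray(intervention, dtype=float)

    centred = score1 - score1.mean()  # centre the pre-test so its mean is 0
    X = np.column_stack([np.ones(len(score2)), intervention, centred])

    # Least-squares estimates of b0, b1, b2
    b, *_ = np.linalg.lstsq(X, score2, rcond=None)
    predicted = X @ b

    # Standard error of b1 from the usual OLS covariance matrix
    residuals = score2 - predicted
    sigma2 = residuals @ residuals / (len(score2) - X.shape[1])
    se_b1 = float(np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]))

    # Assumption: pool the per-group s.d. of the predicted scores
    treated = intervention == 1
    n_c, n_t = int((~treated).sum()), int(treated.sum())
    var_c = predicted[~treated].var(ddof=1)
    var_t = predicted[treated].var(ddof=1)
    sd = float(np.sqrt(((n_c - 1) * var_c + (n_t - 1) * var_t) / (n_c + n_t - 2)))

    b1 = float(b[1])
    return {
        "b0": float(b[0]),  # adjusted post-test mean, control group
        "b1": b1,           # adjusted difference, intervention minus control
        "se_b1": se_b1,
        "sd": sd,
        "d": b1 / sd,
        "ci": ((b1 - 1.96 * se_b1) / sd, (b1 + 1.96 * se_b1) / sd),
    }


# Synthetic, illustrative data only (not from any real trial)
rng = np.random.default_rng(0)
n = 200
group = np.repeat([0, 1], n // 2)
pre = rng.normal(50, 10, n)
post = 5 + 0.9 * pre + 2.0 * group + rng.normal(0, 5, n)

result = analyse_simple_rct(pre, post, group)
print(result)
```

The divisor is returned alongside the effect size so that a reader who prefers a different standard deviation (for example, that of the observed post-test scores) can recompute `d` from `b1` and `se_b1`.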
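
For the interaction model with the variable "boy", the adjusted mean for each subgroup follows from substituting 0 or 1 for `intervention` and `boy` in the regression equation, with the centred pre-test set to 0. A minimal sketch of that arithmetic is given below. The function names and the coefficient values are hypothetical and chosen only to make the substitution visible.

```python
def predict_score2(b, intervention, boy, score1=0.0):
    """Predicted post-test score from the interaction model.

    b is the tuple (b0, b1, b2, b3, b4). score1 is the centred pre-test,
    so score1 = 0 gives the adjusted mean for a subgroup.
    """
    b0, b1, b2, b3, b4 = b
    return b0 + b1 * intervention + b2 * score1 + b3 * boy + b4 * boy * intervention


def adjusted_means(b):
    """Adjusted mean post-test score for each gender-by-group cell."""
    return {
        ("girls", "control"): predict_score2(b, intervention=0, boy=0),
        ("boys", "control"): predict_score2(b, intervention=0, boy=1),
        ("girls", "intervention"): predict_score2(b, intervention=1, boy=0),
        ("boys", "intervention"): predict_score2(b, intervention=1, boy=1),
    }


# Hypothetical coefficients (b0, b1, b2, b3, b4), for illustration only
coefficients = (50.0, 3.0, 0.8, -2.0, 1.5)
for cell, value in adjusted_means(coefficients).items():
    print(cell, value)
```

With these illustrative values the intervention effect is b1 for girls and b1 + b4 for boys, which is why the significance of b4 is the test of a differential effect.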
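
The standardised mean difference variants named on the "Calculating effect sizes" slide differ only in the standard deviation used as the divisor, plus a small-sample correction for Hedges' g. The sketch below uses the common textbook definitions (pooled s.d. for Cohen's d, control-group s.d. for Glass' Δ, and the approximate correction factor 1 - 3/(4N - 9) for Hedges' g); the function names and the sample scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * stdev(treatment) ** 2
                  + (n_c - 1) * stdev(control) ** 2) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)


def hedges_g(treatment, control):
    """Cohen's d with the approximate small-sample bias correction."""
    n_total = len(treatment) + len(control)
    return cohens_d(treatment, control) * (1 - 3 / (4 * n_total - 9))


# Made-up scores, for illustration only
treatment_scores = [12, 14, 16, 18, 20]
control_scores = [10, 12, 14, 16, 18]
print(cohens_d(treatment_scores, control_scores))
print(glass_delta(treatment_scores, control_scores))
print(hedges_g(treatment_scores, control_scores))
```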
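
The percentages on the "Examples of Effect Sizes" and "Improvement index" slides can be reproduced from the standard normal distribution, assuming normally distributed scores with equal spread in both groups. The sketch below uses Python's `statistics.NormalDist`. The function names are illustrative, and the formula used for the "guess which group" probability (the normal CDF evaluated at half the effect size) is an assumption that happens to match the figures quoted on the slides.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def u3(effect_size):
    """Cohen's U3: share of the control group below the intervention mean."""
    return STANDARD_NORMAL.cdf(effect_size)


def improvement_index(effect_size):
    """Percentile-point gain for a student starting at the 50th percentile."""
    return round(100 * (u3(effect_size) - 0.5))


def guess_probability(effect_size):
    """Assumed formula: chance of guessing a person's group from their score."""
    return STANDARD_NORMAL.cdf(effect_size / 2)


def shifted_proportion(effect_size, baseline):
    """Proportion above a threshold after the shift, given the baseline share."""
    return STANDARD_NORMAL.cdf(STANDARD_NORMAL.inv_cdf(baseline) + effect_size)


for d in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(d, improvement_index(d))
```

For example, `shifted_proportion(0.2, 0.75)` gives the "from 75% to 81%" figure for an effect size of 0.2 once rounded to a whole percentage.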