### What software is available for calculating effect sizes?

```Funded through the ESRC’s Researcher
Development Initiative
Session 2.1 – Revision of Day 1
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
• What are the 3 primary types of effect sizes?
• What sort of information can be used to calculate effect sizes?
• What software is available for calculating effect sizes?
• Standardized mean difference
  - Group contrasts
    - Treatment groups
    - Naturally occurring groups
  - Inherently continuous construct
• Odds-ratio
  - Group contrasts
    - Treatment groups
    - Naturally occurring groups
  - Inherently dichotomous construct
• Correlation coefficient
  - Association between variables
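Standardized mean difference: $ES = \frac{\bar{X}_{Males} - \bar{X}_{Females}}{SD_{pooled}}$
Odds ratio: $ES = \frac{ad}{bc}$
Correlation coefficient: $ES = r$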
• Standardised mean difference effect sizes indicate the amount of improvement of the treatment group over control, or the difference between 2 groups.
• Odds ratio effect sizes indicate the likelihood of something occurring, e.g., not catching an illness after inoculation.
• Correlation effect sizes indicate the strength of the relationship between 2 variables.
[Figure: Effect size as the proportion of the Treatment group doing better than the average Control group person. Three panels of overlapping Control and Treatment distributions: d = .20 (57% of T above the Control mean), d = .50 (69% of T above the Control mean), d = .80 (79% of T above the Control mean).]
Effect sizes can be thought of as the
average percentile standing of the average
treated participant relative to the average
untreated participant.
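As a rough check on these percentages, and assuming both groups are normally distributed with equal variance (an assumption made here for illustration), the percentile standing of the average treated participant is the standard normal CDF evaluated at d:
$$\Phi(0.5) \approx 0.69, \qquad \Phi(0.8) \approx 0.79, \qquad \Phi(1.0) \approx 0.84$$
So an effect size of d = 0.5 places the average treated participant at about the 69th percentile of the control group.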
Cohen's (1988) Standard   Effect Size   Percentile Standing   Percent of Nonoverlap
                          2.0           97.7                  81.1%
                          1.9           97.1                  79.4%
                          1.8           96.4                  77.4%
                          1.7           95.5                  75.4%
                          1.6           94.5                  73.1%
                          1.5           93.3                  70.7%
                          1.4           91.9                  68.1%
                          1.3           90                    65.3%
                          1.2           88                    62.2%
                          1.1           86                    58.9%
                          1.0           84                    55.4%
                          0.9           82                    51.6%
LARGE                     0.8           79                    47.4%
                          0.7           76                    43.0%
                          0.6           73                    38.2%
MEDIUM                    0.5           69                    33.0%
                          0.4           66                    27.4%
                          0.3           62                    21.3%
SMALL                     0.2           58                    14.7%
                          0.1           54                    7.7%
                          0.0           50                    0%
• What are the key statistical assumptions of the 3 meta-analytic methods?
Fixed effects model:
• Includes the entire population of studies to be considered; we do not want to generalise to other studies not included (including future studies).
• All of the variability between effect sizes is due to sampling error alone. Thus, the effect sizes are weighted only by the within-study variance.
• Assumes that the collected studies all represent random samples from the same population.
• Effect sizes are independent.
In this and following formulae, we will
use the symbols d and δ to refer to
any measure for the observed and the
true effect size, which is not necessarily
the standardized mean difference.
$d_j = \delta + e_j$
where
$d_j$ is the observed effect size in study j,
$\delta$ is the ‘true’ population effect,
and $e_j$ is the residual due to sampling variance in study j.
Random effects model:
• Includes only a sample of studies from the entire population of studies to be considered. As a result, we do want to generalise to other studies not included in the sample (e.g., future studies).
• Variability between effect sizes is due to sampling error plus variability in the population of effects.
• In contrast to fixed effects models, there are 2 sources of variance.
• Assumes that the studies are random samples of some population in which the underlying (infinite-sample) effect sizes have a distribution rather than a single value.
• Effect sizes are independent.
$d_j = \delta + u_j + e_j$
where
$d_j$ is the observed effect size in study j,
$\delta$ is the mean ‘true’ population effect size,
$u_j$ is the deviation of the true study effect size from the mean true effect size,
and $e_j$ is the residual due to sampling variance in study j.
Multilevel model:
• Meta-analytic data is inherently hierarchical (i.e., effect sizes nested within studies) and has random error that must be accounted for.
• Effect sizes are not necessarily independent.
• Allows for multiple effect sizes per study.
• The model combines fixed and random effects (often called a mixed effects model).
$d_j = \gamma_0 + u_j + e_j$
where
$d_j$ is the observed effect size in study j,
$\gamma_0$ is the mean ‘true’ population effect size,
$u_j$ is the deviation of the true study effect size from the mean true effect size,
and $e_j$ is the residual due to sampling variance in study j.
With predictors $X_{sj}$, the multilevel model becomes:
$d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + u_j + e_j$
• If between-study variance = 0, the multilevel model simplifies to the fixed effects regression model:
$d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + e_j$
• If no predictors are included, the model simplifies to the random effects model:
$d_j = \delta + u_j + e_j$
• If the level 2 variance = 0, the model simplifies to the fixed effects model:
$d_j = \delta + e_j$
• Many meta-analysts use an adaptive (or “conditional”) approach:
  IF between-study variance is found in the homogeneity test
  THEN use random effects model
  OTHERWISE use fixed effects model
• Fixed effects models are very common, even though the assumption of homogeneity is “implausible” (Van den Noortgate & Onghena, 2003).
• There is a considerable lag in the uptake of new methods by applied meta-analysts.
• Meta-analysts need to stay on top of these developments by:
  - Attending courses
• What is the first step in the analysis of meta-analytic data in fixed or random effects models?
• What 2 common statistical techniques have been adapted for use in fixed and random effects meta-analytic modelling?
• What common statistical technique is multilevel modelling analogous to?
The first step is to estimate the overall mean effect size and test the homogeneity of the effect sizes (MeanES.sps macro).
If there is significant heterogeneity, then:
  1) should probably conduct random effects analyses
  2) model moderators of the effect sizes (determine the source/s of variance)
The homogeneity (Q) test asks whether the different effect sizes
are likely to have all come from the same population (an
assumption of the fixed effects model). Are the differences
among the effect sizes no bigger than might be expected by
chance?

$Q = \sum_{i=1}^{k} w_i \left(ES_i - \overline{ES}\right)^2$
where
$ES_i$ = effect size for each study (i = 1 to k)
$\overline{ES}$ = mean effect size
$w_i$ = a weight for each study based on the sample size
However, this (chi-square) test is heavily dependent on sample size. It is
almost always significant unless the numbers (studies and people in
each study) are VERY small. This means that the fixed effect model will
almost always be rejected in favour of a random effects model.
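A hypothetical worked example of the Q computation (the numbers are invented for illustration, not taken from the course data): take k = 3 studies with effect sizes 0.2, 0.5 and 0.8 and equal weights $w_i = 20$. The weighted mean effect size is 0.5, so
$$Q = 20(0.2 - 0.5)^2 + 20(0.5 - 0.5)^2 + 20(0.8 - 0.5)^2 = 1.8 + 0 + 1.8 = 3.6$$
With k − 1 = 2 degrees of freedom, the .05 critical value of chi-square is 5.99, so in this small example homogeneity would not be rejected. Multiplying every weight by 10 (i.e., much larger studies) would give Q = 36 for the same effect sizes, which illustrates the dependence on sample size.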
• The Q-test is easy to conduct using the MeanES.sps macro from David Wilson’s website:
  MeanES ES=d /W=weight.
[Output annotation: there is significant heterogeneity in the effect sizes, so a random effects model is more appropriate and/or moderators need to be modelled.]
The analogue to the ANOVA homogeneity analysis is appropriate for categorical variables.
• Looks for systematic differences between groups of responses within a variable
• Easy to implement using the MetaF.sps macro:
  MetaF ES = d /W = Weight /GROUP = TXTYPE /MODEL = FE.
Multiple regression homogeneity analysis is more appropriate for continuous variables and/or when there are multiple variables to be analysed.
• Tests the ability of groups within each variable to predict the effect size
• Can include categorical variables in multiple regression as dummy variables
• Easy to implement using the MetaReg.sps macro:
  MetaReg ES = d /W = Weight /IVS = IV1 IV2 /MODEL = FE.
• If the homogeneity test is rejected (it almost always will be), it suggests that there are larger differences than can be explained by chance variation (at the individual participant level). There is more than one “population” in the set of different studies.
• The random effects model determines how much of this between-study variation can be explained by study characteristics that we have coded.
• The total variance associated with the effect sizes has two components, one associated with differences within each study (participant level variation) and one between-study variance:
$v_{T_i} = v_\theta + v_i$
The weighting for each effect size consists of the
within-study variance (vi) and between-study
variance (vθ)
The new weighting for the random effects model ($w_{i_{RE}}$) is given by the formula:
$w_{i_{RE}} = \frac{1}{v_i + v_\theta}$
Thus, larger studies receive proportionally less weight in the RE model than in the FE model. This is because a constant (the between-study variance) is added to the denominator of every weight, so the relative effect of sample size is smaller in the RE model.
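A hypothetical illustration of this shift in weights (all values are invented for the sketch): suppose a large study has within-study variance $v_1 = 0.01$, a small study has $v_2 = 0.04$, and the between-study variance is $v_\theta = 0.02$.
$$\text{FE: } w_1 = \frac{1}{0.01} = 100, \quad w_2 = \frac{1}{0.04} = 25, \quad \frac{w_1}{w_1 + w_2} = 0.80$$
$$\text{RE: } w_1 = \frac{1}{0.01 + 0.02} \approx 33.3, \quad w_2 = \frac{1}{0.04 + 0.02} \approx 16.7, \quad \frac{w_1}{w_1 + w_2} \approx 0.67$$
The large study's share of the total weight drops from 80% to about 67% once the between-study variance is added.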
• Like the FE model, RE uses ANOVA and multiple regression to model potential moderators/predictors of the effect sizes, if the Q-test reveals significant heterogeneity.
• Easy to implement using the MetaF.sps macro (ANOVA) or MetaReg.sps (multiple regression):
  MetaF ES = d /W = Weight /GROUP = TXTYPE /MODEL = ML.
  MetaReg ES = d /W = Weight /IVS = IV1 IV2 /MODEL = ML.
[Output annotation: there is significant heterogeneity in the effect sizes, so moderators need to be modelled.]
$v_\theta = \frac{Q - (k - 1)}{\sum w_i - \left(\frac{\sum w_i^2}{\sum w_i}\right)}$
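A hypothetical worked example of the between-study variance estimate $v_\theta = \dfrac{Q - (k - 1)}{\sum w_i - \sum w_i^2 / \sum w_i}$ (the numbers are invented for illustration): with k = 5 studies, each with weight $w_i = 20$, and Q = 10,
$$\sum w_i = 100, \qquad \sum w_i^2 = 2000, \qquad \frac{\sum w_i^2}{\sum w_i} = 20$$
$$v_\theta = \frac{10 - (5 - 1)}{100 - 20} = \frac{6}{80} = 0.075$$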
Multilevel modelling is similar to multiple regression, but corrects the standard errors for the nesting of the data.
Start with an intercept-only model, which incorporates both the outcome-level and the study-level components.
• This tells us the overall mean effect size
• Is similar to a random effects model
Then expand the model to include predictor variables, to explain systematic variance between the study effect sizes.
$d_j = \gamma_0 + u_j + e_j$
• (MLwiN screenshot)
$d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + u_j + e_j$
• Using the same simulated data set with n = 15
• Multilevel models:
  - build on the fixed and random effects models
  - account for between-study variance (like random effects)
  - are similar to multiple regression, but correct the standard errors for the nesting of the data. Improved modelling of the nesting of levels within studies increases the accuracy of the estimation of standard errors on parameter estimates and the assessment of the significance of explanatory variables (Bateman and Jones, 2003).
• Multilevel modelling is more precise when there is greater between-study heterogeneity
• Also allows flexibility in modelling the data when one has multiple moderator variables (Raudenbush & Bryk, 2002)
• Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
• Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
• Van den Noortgate, W., & Onghena, P. (2003). Multilevel meta-analysis: A comparison with