### Statistical Applications for Meta-Analysis Robert M. Bernard Centre for the Study of Learning and Performance and CanKnow Concordia University December 11, 2007 Module 2, Unit 13 of.

```Statistical Applications for
Meta-Analysis
Robert M. Bernard
Centre for the Study of Learning and Performance
and CanKnow
Concordia University
December 11, 2007
Module 2, Unit 13 of NCDDR’s course for NIDRR Grantees
Developing Evidence-Based Products Using the Systematic Review Process
Two Main Purposes of a
Meta-Analysis
• Estimate the population central tendency and
variability of effect sizes between an
intervention (treatment) condition and a
control condition.
• Explore unexplained variability through the
analysis of methodological and substantive
coded study features.
12/6/06
2
Effect Size Extraction
Effect size extraction involves locating and
converting descriptive or other statistical
information contained in studies into a standard
metric (effect size) by which studies can be
compared and/or combined.
What is an Effect size?
• A descriptive metric that characterizes the
standardized difference (in SD units) between
the mean of a treatment group (educational
intervention) and the mean of a control group
• Can also be calculated from correlational
data derived from pre-experimental designs
or from repeated measures designs
Characteristics of
Effect Sizes
• Can be positive or negative
• Interpreted as a z-score, in SD units, although
individual effect sizes are not part of a z-score
distribution
• Can be aggregated with other effect sizes and
subjected to statistical procedures such as ANOVA
and multiple regression
• Magnitude interpretation: ≤ 0.20 is a small effect size,
0.50 is a moderate effect size and ≥ 0.80 is a large
effect size (Cohen, 1992)
Zero Effect Size
ES = 0.00
[Figure: overlapping distributions of the control and treatment conditions]
Moderate Effect Size
ES = 0.40
[Figure: distributions of the control and treatment conditions]
Large Effect Size
ES = 0.85
[Figure: distributions of the control and treatment conditions]
ES Calculation:
Descriptive Statistics
\Delta_{Glass} = \frac{\bar{Y}_{Experimental} - \bar{Y}_{Control}}{SD_{Control}}
d_{Cohen} = \frac{\bar{Y}_{Experimental} - \bar{Y}_{Control}}{SD_{Pooled}}
SD_{Pooled} = \sqrt{\frac{(N_E - 1)SD_E^2 + (N_C - 1)SD_C^2}{N_{Total} - 2}}
Note: this equation is the same as adding two SSs and dividing by dfTotal
Samples: Hedges’ g
• Cohen’s d is inaccurate for small samples (N < 20),
so Hedges’ g was developed (Hedges & Olkin, 1985)
g_{Hedges} = \frac{\bar{Y}_{Experimental} - \bar{Y}_{Control}}{\sqrt{\frac{(N_E - 1)SD_E^2 + (N_C - 1)SD_C^2}{N_{Total} - 2}}} \left(1 - \frac{3}{4(N_E + N_C) - 9}\right)
g = Cohen’s d times a multiplier based on sample size
Example of ES Extraction with
Descriptive Statistics
Study reports:
Treatment: mean = 42.8, SD = 8.6, n = 26
Control: mean = 32.5, SD = 7.4, n = 31
Procedure: Calculate SDpooled, then calculate d and g
SD_{pooled} = \sqrt{\frac{(26 - 1)8.6^2 + (31 - 1)7.4^2}{57 - 2}}
SD_{pooled} = \sqrt{\frac{1849 + 1642.8}{55}} = \sqrt{\frac{3491.8}{55}} = \sqrt{63.49} = 7.97
d = \frac{42.8 - 32.5}{7.97} = \frac{10.3}{7.97} = 1.29
g = d\left(1 - \frac{3}{4(N_E + N_C) - 9}\right) = 1.29\left(1 - \frac{3}{4(26 + 31) - 9}\right) = 1.29\left(1 - \frac{3}{219}\right) = 1.27
Alternative Methods of ES
Extraction: Exact Statistics
• Study Reports: t (60) = 2.66, p < .05
d = \frac{2t}{\sqrt{df}} = \frac{2(2.66)}{\sqrt{60}} = \frac{5.32}{7.746} = 0.687
• Study Reports: F (1, 60) = 7.08, p < .05
Convert F to t and apply the above equation:
t = \sqrt{F} = \sqrt{7.08} = 2.66; df = 60
d = \frac{2t}{\sqrt{df}} = \frac{2(2.66)}{\sqrt{60}} = \frac{5.32}{7.746} = 0.687
Alternative Methods of ES
Extraction: Exact p-value
• Study Reports: t (60) is significant, p = 0.013
Look up t-value for p = 0.013
t = 2.68
d = t\sqrt{\frac{1}{N_E} + \frac{1}{N_C}}
d = 2.68\sqrt{\frac{1}{31} + \frac{1}{31}} = 2.68(0.254) = 0.681
Calculating Standard Error
The standard error of g estimates the standard deviation of the sampling
distribution of g, that is, the spread of the g values that would be obtained
from an infinite number of samples all with a given sample size. Smaller samples
tend to have larger standard errors and larger samples have smaller standard errors.
Standard Error:
\hat{\sigma}_g = \sqrt{\frac{1}{n_e} + \frac{1}{n_c} + \frac{g^2}{2(n_e + n_c)}} \left(1 - \frac{3}{4(n_e + n_c) - 9}\right)
\hat{\sigma}_g = \sqrt{\frac{1}{30} + \frac{1}{30} + \frac{0.687^2}{2(30 + 30)}} \left(1 - \frac{3}{4(30 + 30) - 9}\right)
\hat{\sigma}_g = \sqrt{0.071} \times (1 - 0.013)
\hat{\sigma}_g = (0.266)(0.987)
\hat{\sigma}_g = 0.262
Test Statistic and
Confidence Interval
Z-test (null test: g = 0):
z_g = \frac{g}{\hat{\sigma}_g}
z_g = \frac{0.687}{0.262}
z_g = 2.62
Conclusion: 2.62 > 1.96 (p < 0.05); reject H0 (g = 0) and conclude that g > 0
95% Confidence Interval:
CI_{U/L} = g \pm (1.96 \times \hat{\sigma}_g)
CI_U = 0.687 + (1.96 \times 0.26)
CI_U = 1.197
CI_L = 0.687 - (1.96 \times 0.26)
CI_L = 0.177
Conclusion: The 95% confidence interval does not include 0, so g is
significantly different from 0.
Other Important Statistics
Variance: the variance is the standard error squared.
\hat{\sigma}^2_g = (\hat{\sigma}_g)^2
\hat{\sigma}^2_g = (0.262)^2
\hat{\sigma}^2_g = 0.069
Inverse Variance (w): the inverse variance (w) provides a weight that is
proportional to the sample size. Larger samples are more heavily weighted than
small samples.
w_i = \frac{1}{\hat{\sigma}^2_g}
w_i = \frac{1}{0.069}
w_i = 14.54
Weighted g (g*w): weighted g is the weight (w) times the value of g. It can be
+ or –, depending on the sign of g.
Weighted g = (w_i)(g_i) = 14.54 \times 0.687 = 9.99
Columns: Hedges’ g; Standard Error (\hat{\sigma}_g); Variance (\hat{\sigma}^2_g); 95th Lower Limit; 95th Upper Limit; z-Value; p-Value; Weights (w_i); Weighted g (w_i)(g_i)

Hedges’ g   SE     Variance   Lower   Upper   z-Value   p-Value   Weights    Weighted g
 2.44       0.22   0.05        2.00    2.88    10.89    0.00        19.94      48.65
 2.31       0.17   0.03        1.98    2.64    13.59    0.00        34.60      79.93
 1.38       0.30   0.09        0.79    1.97     4.60    0.00        11.11      15.33
 1.17       0.19   0.04        0.80    1.54     6.16    0.00        27.70      32.41
 0.88       0.17   0.03        0.55    1.21     5.18    0.00        34.60      30.45
 0.81       0.12   0.01        0.57    1.05     6.75    0.00        69.44      56.25
 0.80       0.08   0.01        0.64    0.96    10.00    0.00       156.25     125.00
 0.68       0.18   0.03        0.33    1.03     3.78    0.00        30.86      20.99
 0.63       0.51   0.26       -0.37    1.63     1.24    0.22         3.84       2.42
 0.60       0.13   0.02        0.35    0.85     4.62    0.00        59.17      35.50
 0.58       0.29   0.08        0.01    1.15     2.00    0.05        11.89       6.90
 0.32       0.11   0.01        0.10    0.54     2.91    0.00        82.64      26.45
 0.25       0.08   0.01        0.09    0.41     3.13    0.00       156.25      39.06
 0.24       0.20   0.04       -0.15    0.63     1.20    0.23        25.00       6.00
 0.24       0.15   0.02       -0.05    0.53     1.60    0.11        44.44      10.67
 0.19       0.12   0.01       -0.05    0.43     1.58    0.11        69.44      13.19
 0.11       0.12   0.01       -0.13    0.35     0.92    0.36        69.44       7.64
 0.09       0.08   0.01       -0.07    0.25     1.13    0.26       156.25      14.06
 0.02       0.24   0.06       -0.45    0.49     0.08    0.93        17.36       0.35
 0.02       0.17   0.03       -0.31    0.35     0.12    0.91        34.60       0.69
 0.02       0.26   0.07       -0.49    0.53     0.08    0.94        14.79       0.30
-0.11       0.24   0.06       -0.58    0.36    -0.46    0.65        17.36      -1.91
-0.11       0.28   0.08       -0.66    0.44    -0.39    0.69        12.76      -1.40
-0.18       0.22   0.05       -0.61    0.25    -0.82    0.41        20.66      -3.72
-0.30       0.06   0.00       -0.42   -0.18    -5.00    0.00       277.78     -83.33
 0.330      0.03   0.00        0.28    0.38    12.62    0.00      1458.21*    481.87*
Average g (g+) is the sum of the weighted gs divided by the sum of the weights.
g_+ = \frac{\sum (w_i)(g_i)}{\sum w_i}
g_+ = \frac{481.87}{1458.21}
g_+ = 0.330
ES Extraction Exercise
Materials:
• EXCEL SE Calculator
• 5 studies from which to
extract effect sizes
Mean and Variability
[Figure: distribution of effect sizes, with the mean (ES+) and the variability labeled]
Note: Results from Bernard, Abrami, Lou, et al. (2004) RER
Mean Effect Size
g+:   g_+ = \frac{\sum_{i=1}^{k} (w_i)(g_i)}{\sum_{i=1}^{k} w_i} = \frac{481.87}{1458.21} = 0.330
Var:  \hat{\sigma}^2_{g_+} = \frac{1}{\sum_{i=1}^{k} \frac{1}{\hat{\sigma}^2_i}} = \frac{1}{\sum_{i=1}^{k} w_i} = \frac{1}{1458.21} = 0.0007
SE:   \hat{\sigma}_{g_+} = \sqrt{\hat{\sigma}^2_{g_+}} = \sqrt{0.0007} = 0.0265
z:    z_{g_+} = \frac{g_+}{\hat{\sigma}_{g_+}} = \frac{0.330}{0.0265} = 12.62
Conclusion: Mean g = 0.33 and it is significant.
Variability (Q-Statistic)
Question: How much variability surrounds g+ and is it significant?
Are the effect sizes heterogeneous or homogeneous?
Q = \sum_{i=1}^{k} \frac{(g_i - g_+)^2}{\hat{\sigma}^2_{g_i}}
Q_{Total} = \frac{(2.44 - 0.330)^2}{\hat{\sigma}^2_{g_1}} + \frac{(2.31 - 0.330)^2}{\hat{\sigma}^2_{g_2}} + \ldots + \frac{(-0.18 - 0.330)^2}{\hat{\sigma}^2_{g_{24}}} + \frac{(-0.30 - 0.330)^2}{\hat{\sigma}^2_{g_{25}}} = 469.54
Q-value   df (Q)   P-value
469.54    24       0.000
Conclusion: Effect sizes are heterogeneous.
Tested with the \chi^2 distribution.
Homogeneity vs. Heterogeneity
of Effect Size
• If homogeneity of effect size is established,
then the studies in the meta-analysis can be
thought of as sharing the same effect size
(i.e., the mean)
• If homogeneity of effect size is violated
(heterogeneity of effect size), then no single
effect size is representative of the collection
of studies (i.e., the “true” mean effect size
remains unknown)
Statistics in Comprehensive
Meta-Analysis™
Effect size and 95% confidence interval; test of null (2-Tail)
Number Studies   Point estimate   Standard error   Variance   Lower limit   Upper limit   Z-value   P-value
25               0.33             0.03             0.00       0.28          0.38          12.62     0.00
Heterogeneity
Q-value   df (Q)   P-value
469.54    24       0.00
Interpretation: Moderate ES for all outcomes (g+ = 0.33) in favor of the
intervention condition.
Homogeneity of ES is violated. Q-value is significant (i.e., there is too much
variability for g+ to represent a true average in the population).
Comprehensive Meta-Analysis 2.0.027 is a trademark of BioStat®
Back to ES Calculator
1. Interpretation of Mean Effect Size
2. Interpretation of Q-Statistic
Homogeneity versus Heterogeneity of
Effect Size
Distribution 1: Homogeneous
No variation left to be explained by moderators.
Distribution 2: Heterogeneous
Variation left to be explained by moderators.
[Figure: Distributions 1 and 2, with g+ marked]
Examining the Study
Feature “Method of ES Extraction”
[Figure: the overall effect (g+ = +0.33) divided into three levels: Exact Descriptive, Exact Statistics, and Estimated Statistics]
Tests of Levels of “Method of
ES Extraction”
Effect size and 95% confidence interval; heterogeneity
Group             N of Studies   Point estimate   Standard error   Lower limit   Upper limit   Q-value   df (Q)   P-value
Descriptive       15             0.29             0.03             0.22          0.35          402.56    14       0.00
Statistics        3              0.21             0.06             0.09          0.33          0.97      2        0.62
Est. Statistics   7              0.63             0.06             0.50          0.75          37.00     6        0.00
Total within                                                                                   442.50    22       0.00
Total between                                                                                  27.04     2        0.00
Overall           25             0.33             0.03             0.28          0.38          469.54    24       0.00
Interpretation: Small to Moderate ESs for all categories in favor of the
intervention condition.
Homogeneity of ES is violated. Q-value is significant for the Descriptive and
Est. Statistics categories and for the total within-groups test (i.e.,
“Method of ES Extraction” does not explain enough variability to reach
homogeneity).
Meta-Regression
Seeks to determine if “Method of ES Extraction” predicts effect size.
           Q        df   p-value
Model      15.50    1    0.00
Residual   454.04   23   0.00
Total      469.54   24   0.00
Conclusion: “Method of ES Extraction” is a significant predictor of
ES, but ES is still heterogeneous.
Sensitivity Analysis
• Tests the robustness of the findings
• Asks the question: Will these results stand up when
potentially distorting or deceptive elements, such as
outliers, are removed?
• Particularly important to examine the robustness of
the effect sizes of study features, as these are usually
based on smaller numbers of outcomes
Sensitivity Analysis: Low Standard
Error Samples
One Study Removed Analysis
Study   Point   SE     Variance   Lower limit   Upper limit   z-Value   p-Value
1       0.30    0.03   0.00       0.25          0.35          11.42     0.00
2       0.28    0.03   0.00       0.23          0.33          10.65     0.00
3       0.32    0.03   0.00       0.27          0.37          12.26     0.00
4       0.31    0.03   0.00       0.26          0.37          11.88     0.00
5       0.32    0.03   0.00       0.27          0.37          11.96     0.00
6       0.31    0.03   0.00       0.25          0.36          11.42     0.00
7       0.27    0.03   0.00       0.22          0.33           9.89     0.00
8       0.32    0.03   0.00       0.27          0.37          12.20     0.00
9       0.33    0.03   0.00       0.28          0.38          12.57     0.00
10      0.32    0.03   0.00       0.27          0.37          11.93     0.00
11      0.33    0.03   0.00       0.28          0.38          12.49     0.00
12      0.33    0.03   0.00       0.28          0.38          12.28     0.00
13      0.34    0.03   0.00       0.29          0.39          12.27     0.00
14      0.33    0.03   0.00       0.28          0.38          12.57     0.00
15      0.33    0.03   0.00       0.28          0.39          12.53     0.00
16      0.34    0.03   0.00       0.28          0.39          12.58     0.00
17      0.34    0.03   0.00       0.29          0.39          12.73     0.00
18      0.36    0.03   0.00       0.30          0.41          12.96     0.00
20      0.33    0.03   0.00       0.28          0.39          12.69     0.00
19      0.34    0.03   0.00       0.29          0.39          12.75     0.00
21      0.33    0.03   0.00       0.28          0.39          12.68     0.00
22      0.34    0.03   0.00       0.28          0.39          12.74     0.00
23      0.33    0.03   0.00       0.28          0.39          12.71     0.00
24      0.34    0.03   0.00       0.29          0.39          12.81     0.00
25      0.48    0.03   0.00       0.42          0.54          16.45     0.00
Total   0.33    0.03   0.00       0.28          0.38          12.62     0.00
Sensitivity Analysis of CT Data
[Figure: values of g+ (vertical axis, 0.00 to 0.60) plotted for Studies 1 to 25 (horizontal axis)]
Studies with High Weighted g+
Study      g       g+      g+ with study removed   Difference   (w)      (g)(w)   % Influence*
Study 7    0.80    0.330   0.27                    -0.06        156.25   125.00   25.9
Study 13   0.25    0.330   0.34                    +0.04        156.25   39.09    8.1
Study 18   0.02    0.330   0.36                    +0.06        156.25   14.06    2.9
Study 25   -0.30   0.330   0.48                    +0.15        277.78   -83.33   17.41
Totals                                                          746.53            54.31
*% Influence = (g)(w)/481.87 (100)
Steps in Controlling for
Study Quality
• Step one: Are the effect sizes
homogeneous?
• Step two: Does study quality explain the
heterogeneity?
• Step three: Which qualities of studies
matter?
• Step four: How do we deal with the
differences?
Controlling Study Quality Using
Dummy Coding in Meta-Regression
Categories of Study Quality   Dummy 1   Dummy 2   Dummy 3   Dummy 4
1                             0         0         0         0
2                             1         0         0         0
3                             0         1         0         0
4                             0         0         1         0
5                             0         0         0         1
Categories   g+ Before   g+ After   QWithin   df   p
1            -0.185      -0.185     2.243     3    0.524
2            -0.218      -0.218     3.302     3    0.347
3            0.683       -0.065     3.252     3    0.354
4            0.565       -0.183     4.953     3    0.175
5            0.390       -0.358     1.985     3    0.576
Total        0.247       -0.202     15.734    15   0.400
Selected References
Bernard, R. M., Abrami, P. C., Lou, Y., Borokhovski, E., Wade, A.,
Wozney, L., Wallet, P. A., Fiset, M., & Huang, B. (2004). How
does distance education compare to classroom instruction? A
meta-analysis of the empirical literature. Review of Educational
Research, 74(3), 379-439.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in
social research. Beverly Hills, CA: Sage.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for
meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V., Shymansky, J. A., & Woodworth, G. (1989). A
practical guide to modern methods of meta-analysis. [ERIC
Document Reproduction Service No. ED 309 952].
```
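
The hand calculations in the slides (pooled SD, Cohen's d, Hedges' g, the standard error of g, and the inverse-variance weighted mean with its Q statistic) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's Excel calculator: all function names are hypothetical and chosen for illustration, and the formulas are the ones written on the slides.

```python
import math


def pooled_sd(n_e, sd_e, n_c, sd_c):
    """Pooled SD: sum of the two sums of squares over (N_total - 2), square-rooted."""
    return math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2))


def small_sample_multiplier(n_e, n_c):
    """Multiplier 1 - 3 / (4(N_E + N_C) - 9) used for Hedges' g and its SE."""
    return 1 - 3 / (4 * (n_e + n_c) - 9)


def cohens_d(mean_e, mean_c, n_e, sd_e, n_c, sd_c):
    """Standardized mean difference using the pooled SD."""
    return (mean_e - mean_c) / pooled_sd(n_e, sd_e, n_c, sd_c)


def hedges_g(d, n_e, n_c):
    """Cohen's d times the small-sample multiplier."""
    return d * small_sample_multiplier(n_e, n_c)


def standard_error_g(g, n_e, n_c):
    """Standard error of g as given on the 'Calculating Standard Error' slide."""
    root = math.sqrt(1 / n_e + 1 / n_c + g**2 / (2 * (n_e + n_c)))
    return root * small_sample_multiplier(n_e, n_c)


def fixed_effect_summary(gs, ses):
    """Inverse-variance weighted mean g+, its SE, z, and the Q statistic."""
    ws = [1 / se**2 for se in ses]          # w_i = 1 / variance_i
    sum_w = sum(ws)
    g_plus = sum(w * g for w, g in zip(ws, gs)) / sum_w
    se_plus = math.sqrt(1 / sum_w)          # variance of g+ is 1 / sum of weights
    z = g_plus / se_plus
    q = sum(w * (g - g_plus) ** 2 for w, g in zip(ws, gs))
    return g_plus, se_plus, z, q


# Worked example from the slides: treatment 42.8 (SD 8.6, n 26), control 32.5 (SD 7.4, n 31)
d = cohens_d(42.8, 32.5, 26, 8.6, 31, 7.4)
g = hedges_g(d, 26, 31)
print(f"d = {d:.2f}, g = {g:.2f}")

# Standard error and 95% confidence interval for g = 0.687 with 30 per group
se = standard_error_g(0.687, 30, 30)
print(f"SE = {se:.3f}, CI = [{0.687 - 1.96 * se:.3f}, {0.687 + 1.96 * se:.3f}]")

# Illustration only: pooling the first three rows of the 25-study table
print(fixed_effect_summary([2.44, 2.31, 1.38], [0.22, 0.17, 0.30]))
```

Because the code carries full precision through each step, its results can differ in the last digit from the slide values, which round intermediate quantities such as the standard error to two or three decimals.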