Measures of LD
Jess Paulus, ScD
January 29, 2013
Today’s topics
1. Multiple comparisons
• Multiple testing & significance thresholds
2. Measures of linkage disequilibrium
• D’ and r²
• r² and power


Concern about multiple testing
• Standard thresholds (p < 0.05) will lead to a large number of “significant” results
• The vast majority of these are false positives
• Various statistical approaches exist for handling this
Possible Errors in Statistical Inference

| Observed in the sample \ Unobserved truth in the population | Ha: SNP prevents DM | H0: No association |
|---|---|---|
| Reject H0 (SNP prevents DM) | True positive (1 – β) | False positive: Type I error (α) |
| Fail to reject H0 (no association) | False negative: Type II error (β) | True negative (1 – α) |
Probability of Errors
• α: also known as the “level of significance”; the probability of a Type I error, i.e. rejecting the null hypothesis when it is in fact true (a false positive); typically 5%
• p value: the probability, assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one found in your study (i.e., by chance alone)
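The link between α and the p value can be checked by simulation: if the null hypothesis is true and we reject whenever p < α, about a fraction α of studies should reject. A minimal sketch, assuming a one-sample z-test on standard-normal data with known SD = 1 (the test, sample size, and function names are illustrative choices, not from the lecture):

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p value for H0: mean = 0, assuming a known SD of 1 (z-test)."""
    z = sum(sample) / math.sqrt(len(sample))  # equals mean * sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))   # 2 * (1 - Phi(|z|))


def type_i_error_rate(alpha=0.05, n_sims=20_000, n=30, seed=42):
    """Fraction of simulated studies, all generated under H0, that reject at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # data generated with H0 true
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_sims


print(type_i_error_rate())  # should be close to 0.05
```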
Type I Error (α) in Genetic and Molecular Research
A genome-wide association scan of 500,000 SNPs (essentially all of them truly null) will yield, on average, by chance alone:
• 25,000 false positives using α = 0.05
• 5,000 false positives using α = 0.01
• 500 false positives using α = 0.001
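The counts above are simply m × α: when every SNP is null, each test has probability α of a false positive, so the expected number is the number of tests times α. A minimal Python sketch; the function name `expected_false_positives` is illustrative, not from the lecture:

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


# Reproduce the genome-wide scan example: 500,000 SNPs at three alpha levels
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {expected_false_positives(500_000, alpha):,.0f} false positives")
```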
Multiple Comparisons Problem
• The multiple comparisons (or “multiple testing”) problem occurs when one considers a set, or family, of statistical inferences simultaneously
• The more tests performed, the more likely it is that at least one Type I error occurs
• Several statistical techniques have been developed to adjust for multiple comparisons
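One way to quantify “more likely”: if the m tests were independent and every null hypothesis were true, the probability of at least one Type I error (the family-wise error rate) would be 1 − (1 − α)^m. Independence is an assumption made here for illustration; SNPs in LD violate it. A minimal sketch with a hypothetical helper name:

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


# The chance of at least one false positive grows quickly with the number of tests
for m in (1, 20, 100):
    print(m, round(familywise_error_rate(m), 3))
```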

Bonferroni adjustment
Adjusting alpha
Standard Bonferroni correction
• Test each SNP at the α* = α/m1 level, where m1 = number of markers tested
• Assuming m1 = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1 × 10⁻⁷
• Conservative when the tests are correlated (e.g., SNPs in LD)
• Permutation or simulation procedures may increase power by accounting for the correlation between tests
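A sketch of both approaches. The Bonferroni part follows the slide directly. The permutation part is one common variant, a single-step max-statistic permutation in the spirit of Westfall and Young, swapped in for illustration; the per-SNP statistic (difference in mean allele count between cases and controls) and all function names are hypothetical choices, not the lecture’s method.

```python
import math
import random


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level alpha* = alpha / m."""
    return alpha / m


def snp_statistics(genotypes, labels):
    """|mean allele count in cases - mean allele count in controls| for each SNP.

    genotypes: list of SNP columns, each a list of 0/1/2 allele counts per person
    labels:    list of 1 (case) or 0 (control), one per person
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    stats = []
    for snp in genotypes:
        case_total = sum(g for g, y in zip(snp, labels) if y == 1)
        control_total = sum(g for g, y in zip(snp, labels) if y == 0)
        stats.append(abs(case_total / n_cases - control_total / n_controls))
    return stats


def permutation_max_threshold(genotypes, labels, alpha=0.05, n_perm=1000, seed=1):
    """Study-wide cutoff from the permutation distribution of the maximum statistic.

    Shuffling case/control labels breaks any SNP-phenotype association but keeps
    the correlation (LD) among SNPs intact, so correlated SNPs are not counted
    as fully separate tests.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stats.append(max(snp_statistics(genotypes, shuffled)))
    max_stats.sort()
    # Approximate (1 - alpha) quantile of the permutation maxima
    index = min(math.ceil((1 - alpha) * n_perm), n_perm) - 1
    return max_stats[index]


print(bonferroni_threshold(0.05, 500_000))  # the slide's 1 x 10^-7, up to float rounding
```

An observed SNP statistic above the permutation cutoff would be declared significant. In the toy case of two perfectly correlated SNPs (identical columns), the same permutations give exactly the same cutoff as for one SNP, whereas Bonferroni would halve α; that is the sense in which Bonferroni is conservative under correlation.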
Measures of LD
Haplotype definition
• Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome
• Moving from examining single genetic markers to sets of markers
Measures of linkage disequilibrium
Example: 16 chromosomes, each carrying a two-locus haplotype (locus 1: A/a; locus 2: G/g):
ag, ag, AG, AG, AG, AG, Ag, ag, ag, Ag, AG, AG, AG, AG, ag, ag

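Basic data: table of haplotype frequencies (counts from the 16 chromosomes above)

|       | A          | a         | Total   |
|-------|------------|-----------|---------|
| G     | 8          | 0         | 8 (50%) |
| g     | 2          | 6         | 8 (50%) |
| Total | 10 (62.5%) | 6 (37.5%) | 16      |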
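The table is just a tally of the haplotypes. A short sketch that rebuilds the counts and allele frequencies from the 16 chromosomes (variable names are illustrative):

```python
from collections import Counter

# The 16 two-locus haplotypes from the example (locus 1: A/a, locus 2: G/g)
haplotypes = ["ag", "ag", "AG", "AG", "AG", "AG", "Ag", "ag",
              "ag", "Ag", "AG", "AG", "AG", "AG", "ag", "ag"]

counts = Counter(haplotypes)  # missing keys (e.g. "aG") count as 0
n = len(haplotypes)

# Allele frequencies are the margins of the 2 x 2 haplotype table
p_A = (counts["AG"] + counts["Ag"]) / n
p_G = (counts["AG"] + counts["aG"]) / n

for hap in ("AG", "Ag", "aG", "ag"):
    print(hap, counts[hap], f"{counts[hap] / n:.1%}")
print(f"p_A = {p_A:.1%}, p_G = {p_G:.1%}")
```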
D’ and r² are the most common measures
• Both measure the strength of association (correlation) between alleles at two loci
• D’ (D prime): ranges from 0 [no LD] to 1 [complete LD]
• r² (r squared): also ranges from 0 to 1; it is the squared correlation between alleles at the two loci on the same chromosome
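D
• The deviation of the observed frequency of a haplotype from the frequency expected if the alleles were independent (the product of the allele frequencies) is called the linkage disequilibrium (D): $D = p_{AG} - p_A p_G$
• If two alleles are in LD, D ≠ 0
• D is bounded by the allele frequencies (|D| ≤ 0.25), so complete dependency between loci means |D| reaches its maximum possible value given those frequencies (|D’| = 1), not D = 1
• Linkage equilibrium means D = 0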
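To see why D itself is not a 0-to-1 scale: $D = p_{AG} - p_A p_G$ is bounded by the allele frequencies, and even complete dependency in the example gives D = 0.1875. Dividing by the largest attainable |D| gives D’. A minimal sketch (function names are hypothetical; the D < 0 branch of the normalizer is the standard Lewontin definition, included for completeness):

```python
def ld_D(p_AG: float, p_A: float, p_G: float) -> float:
    """D = observed haplotype frequency minus the frequency expected under independence."""
    return p_AG - p_A * p_G


def d_max(p_A: float, p_G: float, D: float) -> float:
    """Largest |D| attainable given the allele frequencies (Lewontin's normalizer)."""
    p_a, p_g = 1 - p_A, 1 - p_G
    return min(p_A * p_g, p_a * p_G) if D >= 0 else min(p_A * p_G, p_a * p_g)


# Example data: p_AG = 8/16, p_A = 10/16, p_G = 8/16
D = ld_D(8 / 16, 10 / 16, 8 / 16)
print("D =", D, " D_max =", d_max(10 / 16, 8 / 16, D), " D' =", D / d_max(10 / 16, 8 / 16, D))
```

With all allele frequencies at 0.5, `d_max` is 0.25, the largest value |D| can ever take.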
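Measures of LD, from a 2 × 2 table of haplotype counts:

|       | G   | g   | Total |
|-------|-----|-----|-------|
| A     | n11 | n01 | n•1   |
| a     | n10 | n00 | n•0   |
| Total | n1• | n0• | N     |

| Measure | Formula | Ref. |
|---|---|---|
| D’ | $\frac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{\bullet 1}n_{0\bullet},\ n_{\bullet 0}n_{1\bullet})}$ (for D > 0) | Lewontin (1964) |
| Δ² = r² | $\frac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\bullet}n_{0\bullet}n_{\bullet 1}n_{\bullet 0}}$ | Hill and Weir (1994) |
| δ* | $\frac{n_{11}n_{00} - n_{10}n_{01}}{n_{11}n_{\bullet 0}}$ | Levin (1953) |
| OR | $\frac{n_{11}n_{00}}{n_{10}n_{01}}$ | Edwards (1963) |
| Q | $\frac{n_{11}n_{00} - n_{10}n_{01}}{n_{11}n_{00} + n_{10}n_{01}}$ | Yule (1900) |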
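These formulas translate directly into code. A sketch that computes D’, r², OR, and Q from the four haplotype counts (δ* is omitted; the function name and dictionary keys are illustrative). The D’ formula above is the D > 0 case; the sketch also handles D < 0 using Lewontin’s normalizer, an addition for completeness.

```python
import math


def ld_measures(n11: int, n01: int, n10: int, n00: int) -> dict:
    """Two-locus LD measures from haplotype counts.

    Cell layout: n11 = AG, n01 = Ag, n10 = aG, n00 = ag
    """
    n_A, n_a = n11 + n01, n10 + n00   # locus-1 allele totals
    n_G, n_g = n11 + n10, n01 + n00   # locus-2 allele totals
    cross = n11 * n00 - n10 * n01     # equals N^2 * D

    # |D'|: divide by the largest attainable cross-product given the margins
    d_max = min(n_A * n_g, n_a * n_G) if cross >= 0 else min(n_A * n_G, n_a * n_g)
    d_prime = abs(cross) / d_max

    r2 = cross ** 2 / (n_A * n_a * n_G * n_g)

    # The odds ratio is unbounded when an off-diagonal cell is empty
    off_diag = n10 * n01
    odds_ratio = math.inf if off_diag == 0 else (n11 * n00) / off_diag

    yules_q = cross / (n11 * n00 + n10 * n01)
    return {"D'": d_prime, "r2": r2, "OR": odds_ratio, "Q": yules_q}


print(ld_measures(n11=8, n01=2, n10=0, n00=6))
```

Note how the measures disagree on the example: D’ and Q equal 1 because one haplotype (aG) is absent, while r² is only 0.6 because the allele frequencies at the two loci differ.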
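Worked example, using the 16 chromosomes: n11 = 8 (AG), n01 = 2 (Ag), n10 = 0 (aG), n00 = 6 (ag), so n•1 = 10 (A, 62.5%), n•0 = 6 (a, 37.5%), n1• = 8 (G, 50%), n0• = 8 (g, 50%).

$$D' = \frac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{\bullet 1}n_{0\bullet},\ n_{\bullet 0}n_{1\bullet})} = \frac{8 \times 6 - 0 \times 2}{\min(10 \times 8,\ 6 \times 8)} = \frac{48}{48} = 1$$

$$r^2 = \frac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\bullet}n_{0\bullet}n_{\bullet 1}n_{\bullet 0}} = \frac{(8 \times 6 - 0 \times 2)^2}{10 \times 6 \times 8 \times 8} = \frac{2304}{3840} = 0.6$$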
r² and power
• r² is directly related to study power
• The lower the r² between a marker and the causal variant, the larger the sample size required to detect the association through that marker
• r² × N is the “effective sample size”
• If a marker M and causal gene G are in LD, then a study with N cases and controls that measures M (but not G) will have approximately the same power to detect an association as a study with r² × N cases and controls that directly measured G
r² and power
Example:
• N = 1000 (500 cases and 500 controls)
• r² = 0.4
• Effective sample size = 0.4 × 1000 = 400: had you genotyped the causal gene directly, you would need only a total N = 400 (200 cases and 200 controls) for the same power
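The example applies the effective-sample-size rule, r² × N. A sketch of that rule and its inverse; the inverse, N_marker = N_direct / r², is a rearrangement added for illustration, and the function names are hypothetical. Both rest on the same approximation as the rule itself, not on an exact power calculation.

```python
def effective_sample_size(n: float, r2: float) -> float:
    """Size of a direct study of the causal variant with (approximately) the same
    power as a study of n subjects that genotypes only a marker in LD with it."""
    return r2 * n


def required_marker_sample_size(n_direct: float, r2: float) -> float:
    """Inverse view: marker-study size needed to match a direct study of n_direct."""
    return n_direct / r2


print(effective_sample_size(1000, 0.4))       # total N for the equivalent direct study
print(required_marker_sample_size(400, 0.4))  # total N needed when genotyping the marker
```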