Understanding GWAS Chip Design – Linkage Disequilibrium and HapMap
Peter Castaldi
January 29, 2013

Objectives
• Introduce the concept of linkage disequilibrium (LD)
• Describe how the HapMap project provides publicly available information on genetic variation and LD structure
• Review how LD enables genome-wide screens with only a subset of genome-wide SNP markers
• Describe the design of chip-based genotype assays

Human Genome
• 3 billion base pairs, 23 pairs of chromosomes
• 99.9% sequence similarity between individuals
• ~12 million variant sites

What are the Different Types of Genetic Variation?
• Single base pair change (ACGT → ATGT), aka Single Nucleotide Polymorphism (SNP)
  • ~12 million across the genome
• Insertions/Deletions (TGGTTTCTA → TGGT---TA)
  • Can be of variable size
• Trinucleotide repeats (microsatellites)
  • Highly polymorphic, less common than SNPs
  • Responsible for certain clinical disorders (Huntington’s, Fragile X, myotonic dystrophy)

SNPs in detail
• SNPs can have up to four possible alleles (A, C, G, T), but most have only two alleles present in human populations
• Each person has two alleles at each SNP (one for each copy of the chromosome)
  • When both copies are the same, you’re homozygous (e.g. AA, CC, GG, TT). When they’re different (e.g. AT), you’re heterozygous.
• Each allele has a frequency with which it appears in a given population
  • Major allele (more common), minor allele (less common)
  • The allele frequencies at a SNP sum to 1 (or 100%)

SNPs are Used as Genetic Markers for GWAS Chips
• Properties of SNPs that make them good markers for GWAS
  • Densely spaced across the genome
  • Usually bi-allelic (only 2 alleles in the population, which simplifies statistical tests)
• GWAS chips can effectively represent most common variation with just a subset of SNPs
  • With ~500,000 SNPs, most common variation can be captured
  • This is because there is significant correlation between neighboring SNPs

Linkage Disequilibrium Causes Correlation Between Neighboring SNPs
• Mendel’s law of independent assortment states that alleles at different loci are transmitted independently across generations (random assortment, i.e. linkage equilibrium).
• This is not the case when two genetic loci are physically close to each other.
• When alleles at two physically close genetic loci are not randomly assorted, this is called linkage disequilibrium.

Linkage Equilibrium Arises Because of Meiotic Recombination
[Figure source: http://kenpitts.net/hbio/8cell_repro/meiosis_pics.htm]

Linkage and Recombination
[Figure: paternal and maternal DNA carrying marker alleles X/x, Y/y and Z/z; during gametogenesis, segments from the paternal grandfather and the paternal grandmother are recombined.]

Recombination Breaks Up Chromosomal Segments Over Generations
• Recombination is not uniform across the genome (recombination hotspots).
• SNPs within the yellow region are correlated with each other and form haplotypes.
• Because of this correlation, one can often use a single SNP from a haplotype to represent all the SNP variation within that haplotype.
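
Worked example: quantifying the correlation between two SNPs
The correlation between neighboring SNPs can be made concrete with a small calculation. The sketch below is illustrative only: it assumes phased haplotypes are already available as pairs of alleles, and the function name `ld_stats`, the input format and the toy data are hypothetical rather than taken from the lecture. It computes the standard quantities D = p(AB) − p(A)·p(B), D′ (|D| scaled by its maximum possible value given the allele frequencies) and r² (the squared correlation between the two SNPs).

```python
from collections import Counter


def ld_stats(haplotypes, allele_a, allele_b):
    """Return D, D' and r^2 for two bi-allelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, one per
                phased chromosome (hypothetical input format).
    allele_a, allele_b: the allele tracked at SNP 1 and at SNP 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    counts = Counter(haplotypes)
    # Allele frequencies at each SNP and frequency of the joint haplotype.
    p_a = sum(c for (a, _), c in counts.items() if a == allele_a) / n
    p_b = sum(c for (_, b), c in counts.items() if b == allele_b) / n
    p_ab = counts[(allele_a, allele_b)] / n

    var_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var_product == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    # D is the excess of the observed haplotype frequency over the
    # frequency expected under independence (linkage equilibrium).
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return {"D": d, "D_prime": abs(d) / d_max, "r2": d * d / var_product}


if __name__ == "__main__":
    # Toy data: allele A at SNP 1 always travels with allele G at SNP 2.
    toy = [("A", "G")] * 6 + [("T", "C")] * 4
    print(ld_stats(toy, "A", "G"))
```

When the two SNPs are inherited independently, D and r² are close to 0. When one SNP perfectly predicts the other, r² is close to 1, which is the situation in which a single SNP can stand in for its neighbors.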

Haplotype Structure Reflects Evolutionary History
• The structure of haplotype blocks varies across ancestral groups
• African populations have shorter LD blocks, reflecting the longer evolutionary history of those populations

~500,000 SNP Markers Can Reasonably Represent Most of the Common Genetic Variation in European Genomes
• GWAS relies upon linkage disequilibrium and the ubiquitous nature of SNP markers to enable genome-wide surveys of the impact of common variation on disease susceptibility
Pe’er et al. Nat Genet. 2006

The HapMap Project is a Catalog of Human Variation Across Populations
• The Human Genome Project provided the complete human sequence for a small number of individuals
• To get an accurate sense of variable sites, data from many individuals are needed
• HapMap has three iterations (http://hapmap.ncbi.nlm.nih.gov/)
• Dense genotype data from multiple population groups
  • CEU: individuals of Northern and Western European ancestry from Utah
  • YRI: Yoruba from Ibadan, Nigeria
  • JPT: Japanese from Tokyo
  • CHB: Han Chinese from Beijing

Data from the HapMap Project Enabled GWAS Chip Design
• Information from HapMap used in chip design
  • Panel of potential SNPs to use in a genotype chip
  • Population-specific LD structure, which allows the identification of tag SNPs that effectively tag haplotypes

Using Linkage Disequilibrium to Find Genes
• Linkage disequilibrium (LD) means that sites of genetic variation can serve as “markers” for larger chromosomal segments.
• Correlation between markers is quantified with r² and D′.

GWAS Identify Novel Disease Loci, but Additional Localization is Often Necessary

Genotype Chip Technology
http://scienceeducation.nih.gov/newsnapshots/TOC_Chips/Chips_RITN/How_Chips_Work_1/how_chips_work_1.html
Kang et al. The American Journal of Human Genetics, Volume 74, Issue 3, 2004, 495-510

Summary
• Genetic material is transmitted across generations in blocks called haplotypes.
• Linkage disequilibrium and haplotype blocks allow for SNP tagging approaches that enable GWAS chips to capture common genetic variation with a subset of genetic markers.
• Haplotype structure varies across ancestral groups.
• The HapMap project catalogs human genetic variation and LD structure across populations.
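
Worked example: choosing tag SNPs
The SNP tagging idea in the summary can be sketched as a simple greedy procedure. This is one illustrative heuristic, not necessarily the method used to design any particular chip. It assumes pairwise r² values between SNPs are already known (for example, estimated from a reference panel). The function name `greedy_tag_snps`, the SNP identifiers, the r² values and the 0.8 threshold are all hypothetical choices made for the example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is tagged at r^2 >= threshold.

    snps: list of SNP identifiers (hypothetical names).
    r2:   dict mapping (snp_x, snp_y) pairs to pairwise r^2; pairs that
          are missing are treated as uncorrelated (r^2 = 0).
    Returns a dict {tag_snp: sorted list of SNPs it represents}.
    """
    def pair_r2(x, y):
        if x == y:
            return 1.0  # a SNP always tags itself
        return r2.get((x, y), r2.get((y, x), 0.0))

    untagged = set(snps)
    tags = {}
    while untagged:
        # Candidates are the SNPs not yet represented by any tag.
        candidates = [s for s in snps if s in untagged]

        def covered_by(s):
            return {t for t in untagged if pair_r2(s, t) >= threshold}

        # Choose the candidate that represents the most untagged SNPs;
        # ties go to the SNP that appears first in the input list.
        best = max(candidates, key=lambda s: len(covered_by(s)))
        covered = covered_by(best)
        tags[best] = sorted(covered)
        untagged -= covered
    return tags


if __name__ == "__main__":
    # Toy example: snp1-snp3 sit in one block of strong LD,
    # snp4 and snp5 are only weakly correlated with everything else.
    snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
    r2 = {
        ("snp1", "snp2"): 0.95,
        ("snp1", "snp3"): 0.90,
        ("snp2", "snp3"): 0.85,
        ("snp3", "snp4"): 0.10,
        ("snp4", "snp5"): 0.30,
    }
    for tag, represented in greedy_tag_snps(snps, r2).items():
        print(tag, "tags", represented)
```

In this toy case three genotyped SNPs are enough to represent five, because the first three SNPs fall in a single block of strong LD. Raising the threshold selects more tag SNPs, and lowering it selects fewer.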