Study Designs in GWAS

Jess Paulus, ScD
January 30, 2013
Today’s topics
Case-control studies
Population based
Hospital based
Nested studies
Selection bias
Introduction to population stratification
Genetic Association Study Design
Case-Control: Dichotomous endpoints (e.g., diabetes: yes versus no)
Continuous or Quantitative traits (e.g., HbA1c)
Family Studies
[Figure: association study versus family study, positioned by sample size, heritability, and genetic complexity (each axis running from low to high)]
Hierarchy of Study Designs
(listed from the top of the hierarchy, "Mr. Happy," to the bottom, "Mr. Worry")
Systematic Reviews & Meta-Analysis
Randomized Controlled Trials
Cohort studies
Case-control studies
Cross-sectional studies
Ecologic studies
Case reports
Cohort Study: Selection into study on basis of exposure status
EXPOSURE (basis on which groups are selected at beginning of study): PRESENT or ABSENT
OUTCOME: ? (for both the exposure-present and exposure-absent groups)
Cohort studies in genetic epidemiology
Allows study of multiple disease endpoints, which extends the efficiency of the effort to genotype
Selection bias is generally limited
Cohort study limitations for genetic epidemiology
Loss-to-follow-up bias
Need for repeated questionnaire assessments to keep covariate information up to date
Very costly and logistically challenging to genotype the entire cohort and survey for disease endpoints
  For this reason, genetic epidemiologic studies of full cohorts are rare
Case-Control: Selection based on disease status
DISEASE STATUS (basis on which groups are selected at beginning of study): Case or Control
Exposure: ? (for both cases and controls)
Case-control designs for genetic exposures
Appropriate for rare diseases, like cancer
Can be retrospective or prospective (nested case-control design)
Efficient sampling of an underlying cohort
Control selection
Poor control selection is the biggest threat to most case-control studies
Controls must be drawn from the source population that gave rise to the cases
The ideal controls should:
  Represent the exposure distribution in the source population that gave rise to the cases
  Be those who, had they developed the case disease, would have been included in your study as a case
Failure to select appropriate controls generates selection bias
  Selection bias: selection of participants based on the joint probability of exposure and outcome
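To make the "joint probability of exposure and outcome" point concrete, here is a minimal numeric sketch. All counts and selection probabilities are hypothetical and chosen only for illustration; they do not come from the lecture. The sketch compares the odds ratio in a made-up source population with the odds ratio seen after selection, first when selection depends on disease status alone and then when exposed controls are over-selected.

```python
def odds_ratio(table):
    """Odds ratio from a 2x2 table keyed by (disease status, exposure)."""
    return (table["case_exp"] * table["ctrl_unexp"]) / (
        table["case_unexp"] * table["ctrl_exp"]
    )


def apply_selection(population, probs):
    """Expected counts in the study after each cell is sampled with its own probability."""
    return {cell: population[cell] * probs[cell] for cell in population}


# Hypothetical source population (illustrative numbers only).
source = {"case_exp": 200, "case_unexp": 800, "ctrl_exp": 1000, "ctrl_unexp": 8000}

# Scenario A: selection depends on disease status only (cases are sampled more
# heavily than controls, but equally for exposed and unexposed).
outcome_only = {"case_exp": 0.9, "case_unexp": 0.9, "ctrl_exp": 0.1, "ctrl_unexp": 0.1}

# Scenario B: selection depends on exposure AND outcome jointly
# (exposed controls are twice as likely to be enrolled as unexposed controls).
joint = {"case_exp": 0.9, "case_unexp": 0.9, "ctrl_exp": 0.2, "ctrl_unexp": 0.1}

true_or = odds_ratio(source)
or_outcome_only = odds_ratio(apply_selection(source, outcome_only))
or_joint = odds_ratio(apply_selection(source, joint))

print(f"source population OR:        {true_or:.2f}")
print(f"selection on outcome only:   {or_outcome_only:.2f}")
print(f"selection on joint exposure/outcome: {or_joint:.2f}")
```

In this made-up example, sampling cases and controls at different rates leaves the odds ratio unchanged, while over-selecting exposed controls pulls it toward the null. The direction and size of the distortion depend entirely on the assumed selection probabilities.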
Population case-control study
Cases arise from a given population, and controls are randomly sampled from that population (assuming the population is enumerated)
Example: cases from the CT state tumor registry, controls drawn from state census tract listings
Reduces potential for selection bias since the source of controls is well-defined
Limitations of the population-based case-control study for genetic epidemiology
Lower participation rates than hospital-based studies, especially given the need for biological samples
Implementation of specimen collection and processing protocols can be challenging outside a clinical setting
If there is interest in following participants for survival outcomes, tracing can be difficult
Hospital-based case-control study
Appropriate for genetic epidemiology studies:
  Hospital setting facilitates subject enrollment and biological specimen collection and analysis
  Recruitment by medical staff can aid enrollment
  Smaller geographic area to cover than a population-based study, which reduces processing/shipping time
  Aids in collection of specimens in a timely fashion after disease diagnosis, limiting possibility for reverse causation
When cases are hospital-recruited, the source population is the catchment population of the clinic
  The collection of all the people who would have been notified as a case, had they developed disease
Hospital-based case-control study limitations
Retrospective nature opens door to:
  Recall bias
  Reverse causation
  Selection bias
Selection bias in particular is a risk because it is difficult to identify the source population that gave rise to the cases
  Ideal control: Who would have presented as a case to Hospital X had they in fact become ill?
  Attempt to identify catchment population can be challenging
Sometimes, a control disease (sick controls) is chosen to limit potential for selection bias and differential recall of past exposure
  Control illness must not be associated with the gene of interest
Nested case-control study
A type of population-based control sampling
Any case-control study can be conceived as resting within a cohort of exposed and unexposed
When the cohort is very well defined, this is called a nested case-control study
Sampling from within the cohort (rather than doing full cohort analysis) is usually motivated by efficiency concerns
Important applications for genetic epidemiology, where it would be too costly to genotype the full cohort
Nested case-control study design advantages
Limited potential for selection bias because the full cohort is enumerated and controls can be randomly sampled from the roster
Often prospective, which limits potential for the gene/biomarker to be affected by the disease process
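The idea of randomly sampling controls from an enumerated cohort roster can be sketched as follows. This is a minimal illustration, not the procedure of any particular cohort: the roster, participant IDs, number of cases, and the 2:1 control-to-case ratio are all hypothetical, and the helper name `sample_nested_controls` is invented here. Controls are drawn by simple random sampling from non-cases; time-matched sampling schemes are not shown.

```python
import random


def sample_nested_controls(roster, case_ids, controls_per_case=2, seed=0):
    """Draw a simple random sample of controls from the non-cases on a cohort roster.

    roster: list of all participant IDs in the enumerated cohort.
    case_ids: IDs of participants who developed the disease.
    controls_per_case: assumed sampling ratio (hypothetical default of 2).
    """
    cases = set(case_ids)
    eligible = [pid for pid in roster if pid not in cases]
    n_controls = controls_per_case * len(cases)
    if n_controls > len(eligible):
        raise ValueError("not enough non-cases on the roster for this ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(eligible, n_controls)  # sampling without replacement


# Hypothetical cohort of 1,000 participants with 25 incident cases.
roster = [f"P{i:04d}" for i in range(1, 1001)]
case_ids = roster[:25]

controls = sample_nested_controls(roster, case_ids)
genotyped = len(case_ids) + len(controls)

print(f"cohort size: {len(roster)}, samples to genotype: {genotyped}")
```

Under these assumed numbers, only the cases and their sampled controls need genotyping rather than the full roster, which is the efficiency argument made above.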
Cohort sources of nested case-control studies
EPIC cohort: http://epic.iarc.fr/
Nurses' Health Study: http://www.channing.harvard.edu/nhs/
NCI Breast and Prostate Cancer Cohort Consortium (BPC3): http://epi.grants.cancer.gov/BPC3/
Multiethnic Cohort (MEC) study: http://www.uscnorris.com/mecgenetics/
Alpha-Tocopherol, Beta-Carotene Cancer Prevention cohort: http://atbcstudy.cancer.gov/study_details.html
Framingham Heart Study: www.framinghamheartstudy.org
Analysis of case-control GWA studies
Univariate analysis: Pearson χ² or Fisher exact test, Armitage trend test
Multivariate analysis: Logistic regression (if unmatched) or conditional logistic regression (if matched)
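As a rough illustration of the univariate tests named above, the sketch below computes the Pearson χ² statistic (2 degrees of freedom) and the Cochran-Armitage trend statistic (1 degree of freedom) for a single SNP, using only the Python standard library. The genotype counts are hypothetical, the additive weights (0, 1, 2) are an assumption, and the function names are invented for this example. Fisher's exact test and logistic regression are not shown.

```python
import math


def pearson_chi2(cases, controls):
    """Pearson chi-square for a 2x3 genotype table (assumes no empty genotype column)."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for r, s in zip(cases, controls):
        col = r + s
        exp_case = col * n_cases / total
        exp_ctrl = col * n_controls / total
        stat += (r - exp_case) ** 2 / exp_case + (s - exp_ctrl) ** 2 / exp_ctrl
    p_value = math.exp(-stat / 2)  # chi-square survival function, 2 df
    return stat, p_value


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test with per-genotype weights (additive by default)."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    cols = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, cols))
    sum_t2n = sum(t * t * n for t, n in zip(weights, cols))
    numerator = total * (total * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (total * sum_t2n - sum_tn ** 2)
    stat = numerator / denominator
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value


# Hypothetical genotype counts (AA, Aa, aa) for one SNP.
cases = [30, 50, 20]
controls = [50, 40, 10]

pearson_stat, pearson_p = pearson_chi2(cases, controls)
trend_stat, trend_p = armitage_trend(cases, controls)

print(f"Pearson chi-square (2 df): {pearson_stat:.3f}, p = {pearson_p:.4f}")
print(f"Armitage trend (1 df):     {trend_stat:.3f}, p = {trend_p:.4f}")
```

In a real GWA analysis these tests would be run per SNP with an established statistics package and a correction for multiple testing; the closed-form p-values here work only because the tests have 2 and 1 degrees of freedom.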
