Practical - UF Computational Biology

Practical Session: Bayesian
evolutionary analysis by sampling trees
Rebecca R. Gray, Ph.D.
Department of Pathology
University of Florida
– BEAST is a cross-platform program for Bayesian MCMC analysis
of molecular sequences
– entirely orientated towards rooted, time-measured
phylogenies inferred using strict or relaxed molecular
clock models
– can be used as a method of reconstructing phylogenies,
but is also a framework for testing evolutionary
hypotheses without conditioning on a single tree
– uses MCMC to average over tree space, so that each
tree is weighted proportional to its posterior probability
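Because the MCMC sample visits trees in proportion to their posterior probability, averaging over tree space amounts to simple counting over the sampled trees. The following Python sketch illustrates this for clade support. The taxa A to D and the four-tree sample are invented for illustration, and each tree is represented only as the set of clades it contains.

```python
from collections import Counter

def clade_support(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}

# Toy posterior sample (hypothetical): each tree is the set of its clades.
AB, CD, ABC = frozenset("AB"), frozenset("CD"), frozenset("ABC")
sample = [{AB, CD}, {AB, CD}, {AB, ABC}, {AB, CD}]

support = clade_support(sample)
print(support[AB], support[CD], support[ABC])
```

No explicit weights are needed: a tree with higher posterior probability simply appears more often in the sample.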
• The recommended citation for this program is:
– Drummond AJ, Rambaut A (2007) "BEAST: Bayesian
evolutionary analysis by sampling trees." BMC Evolutionary
Biology 7:214
• To cite the relaxed clock model in BEAST:
– Drummond AJ, Ho SYW, Phillips MJ & Rambaut A (2006) PLoS
Biology 4, e88
• To cite the Bayesian Skyline model in BEAST:
– Drummond AJ, Rambaut A, Shapiro B & Pybus OG (2005)
Mol Biol Evol 22, 1185-1192
• The original MCMC paper was:
– Drummond AJ, Nicholls GK, Rodrigo AG & Solomon W (2002)
Genetics 161, 1307-1320
Basic Pipeline
• 1) Setting up the XML file (BEAUti)
• 2) Running the XML file (BEAST)
• 3) Evaluating the performance of the run
• 4) Comparing models, obtaining estimates of
parameters (Tracer)
• 5) Summarizing the tree distribution (TreeAnnotator)
• 6) Viewing the MCC tree (FigTree)
Downloading programs
– Download contains beauti, BEAST,
Epidemiology of Rift Valley fever (RVF)
• The virus was first identified in 1931 in the Rift Valley
of Kenya
• Mosquito vector, primarily infects livestock
• In 1997–1998, a major outbreak occurred in Kenya,
Somalia and the United Republic of Tanzania
• In September 2000, cases were confirmed in Saudi
Arabia and Yemen (the first reported occurrence of the
disease outside the African continent)
Setting up the XML file in BEAUti
• Requires a NEXUS file
– Helpful to have dates with the sample name
– Use the finest resolution available
• GUI interface allows basic selection of
• The XML file can be edited manually to test
specific hypotheses or tweak the run
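Keeping the sampling date in the sample name makes it easy to extract tip dates automatically. Here is a minimal Python sketch, assuming the date is the last underscore-separated field of each name. The sample names are hypothetical and are not taken from the practical's alignment.

```python
def tip_date(name, sep="_"):
    """Return the sampling date encoded as the last field of a sequence name."""
    field = name.rsplit(sep, 1)[-1]
    return float(field)  # decimal years allow finer-than-yearly resolution

def heights(names):
    """Convert tip dates to heights (years before the most recent sample)."""
    dates = {n: tip_date(n) for n in names}
    newest = max(dates.values())
    return {n: newest - d for n, d in dates.items()}

# Hypothetical sample names of the form label_date.
names = ["sampleA_1998", "sampleB_2000.75", "sampleC_1983"]
tip_heights = heights(names)
for n in names:
    print(n, tip_date(n), tip_heights[n])
```

Using decimal years where the collection month or day is known follows the advice to use the finest resolution available.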
BEAUti practical
• Import alignment (g_63.nex)
• Tip dates – use tipdates, guess dates (years
since some time in the past)
• Site models – use GTR + G, empirical base
• Test hypothesis of strict vs. relaxed molecular clock
• Trees – coalescent tree prior – constant size
• 5 x 10^7 generations
• Open xml file with text editor
• Run in BEAST
• Check mixing of the MCMC chain
• Open S log files in Tracer
• Open L and G2 log files
• What can we do about the trace?
Proper mixing
• First step – run chain longer
– Open L200 files
• Other steps to try:
– Overparameterization – reduce model complexity
– Temporal/phylogenetic signal
– Priors are inappropriate
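Tracer summarizes mixing with the effective sample size (ESS) of each logged parameter, and a commonly quoted rule of thumb is to aim for ESS values above 200. To make the idea concrete, here is a rough Python sketch of an ESS estimate based on the autocorrelation of a trace. The truncation rule (stop at the first non-positive autocorrelation) is a simplifying assumption and is not claimed to match Tracer's own calculation. Both traces are simulated.

```python
import random

def effective_sample_size(trace):
    """Rough ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # assumption: truncate at the first non-positive value
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

random.seed(1)
# Independent draws mimic a well-mixed chain.
well_mixed = [random.gauss(0, 1) for _ in range(500)]
# A slowly drifting random walk mimics a poorly mixed chain.
sticky = [0.0]
for _ in range(499):
    sticky.append(sticky[-1] + random.gauss(0, 0.1))

print(round(effective_sample_size(well_mixed)), round(effective_sample_size(sticky)))
```

Running the chain longer adds more samples, but the ESS only rises substantially if successive samples are not too strongly correlated, which is why the other remedies are listed as well.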
Model testing
• Bayes factors:
– Compare estimates of the marginal likelihoods
of the models of interest
– 2*(ln marginal likelihood model 1 – ln marginal
likelihood model 2)
– A value >10 indicates strong support for the alternative
(more complex) model
• Strict clock vs. relaxed clock
– Also consider the coefficient of variation
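Once the marginal likelihoods have been estimated, the Bayes factor comparison is simple arithmetic. A minimal Python sketch follows. The log marginal likelihood values are invented for illustration and are not results from the practical.

```python
def two_ln_bayes_factor(ln_ml_model1, ln_ml_model2):
    """2 * (ln marginal likelihood model 1 - ln marginal likelihood model 2)."""
    return 2.0 * (ln_ml_model1 - ln_ml_model2)

def strong_support(two_ln_bf, threshold=10.0):
    """Apply the >10 threshold for strong support of model 1."""
    return two_ln_bf > threshold

# Hypothetical values, not results from the practical.
ln_ml_relaxed = -5230.4
ln_ml_strict = -5241.9

stat = two_ln_bayes_factor(ln_ml_relaxed, ln_ml_strict)
print(round(stat, 1), strong_support(stat))
```

Here model 1 is the more complex (relaxed clock) model, so a value above the threshold favors it over the strict clock.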
Summarizing tree
• TreeAnnotator
– Burn-in 10% (501 samples)
– Keep median heights
– MCC tree
• Visualizing tree: FigTree
– Posterior probabilities for branches
– Median heights for clades of interest
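The burn-in count follows from the chain length and the logging interval. Below is a small Python sketch of the arithmetic for a chain of 5 x 10^7 generations. It assumes that trees are logged every 10,000 states and that the initial state is also logged. Both are assumptions, since the slides do not state the sampling frequency.

```python
import math

chain_length = 5 * 10**7   # generations in the run
log_every = 10**4          # assumed logging interval (not stated in the slides)

# States 0, log_every, 2 * log_every, ..., chain_length are logged.
n_samples = chain_length // log_every + 1
burnin_samples = math.ceil(0.10 * n_samples)

print(n_samples, burnin_samples, n_samples - burnin_samples)
```

Under these assumptions, rounding 10% of the logged trees up gives the 501 samples quoted for the burn-in.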
Advanced analyses
• Different coalescent priors
– Parametric models (exponential, logistic)
– Bayesian skyline plots
• Phylogeography
– Lemey et al., 2009, PLoS Computational Biology
• Site-specific rates of variation
[Figure: Change in effective population size over time (y-axis: Log10 Ne)]
[Figure: Bayesian genealogy of the G gene; date annotation on the tree: 1916 (1868-1942)]
Additional resources
• Tutorials on the BEAST website, Google group
• 16th International BioInformatics Workshop
on Virus Evolution and Molecular
– Johns Hopkins University, Baltimore
– 29 August - 03 September 2010, Bethesda, USA
