“One gene is many hypotheses” Anonymous

Analysis of environmental genomes
using Pathway Tools
Steven Hallam | University of British Columbia
SRI International, 2013
Overview
• Through the looking glass…
• Environmental Pathway/Genome Databases
• MetaPathways Pipeline Development
2
Metabolism
Vertex = chemical [substrate, product]
Edge = enzyme
• Metabolism, or the synthesis and decomposition of chemicals in a
cell can be organized into pathways represented by graphs.
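As a minimal sketch of this graph view, the snippet below stores a toy pathway as an adjacency list, with chemicals as vertices and enzyme-labelled edges. The three glycolysis steps and the helper name `enzymes_on_path` are illustrative assumptions, not part of the original slides.

```python
from collections import deque

# Vertex = chemical; edge = enzyme converting a substrate into a product.
# Toy data: three consecutive glycolysis steps (illustrative only).
PATHWAY = {
    "glucose": [("glucose-6-phosphate", "hexokinase")],
    "glucose-6-phosphate": [("fructose-6-phosphate", "glucose-6-phosphate isomerase")],
    "fructose-6-phosphate": [("fructose-1,6-bisphosphate", "6-phosphofructokinase")],
    "fructose-1,6-bisphosphate": [],
}


def enzymes_on_path(graph, source, target):
    """Breadth-first search; return the enzymes along one shortest route, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        chemical, enzymes = queue.popleft()
        if chemical == target:
            return enzymes
        for product, enzyme in graph.get(chemical, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None


route = enzymes_on_path(PATHWAY, "glucose", "fructose-1,6-bisphosphate")
print(route)
```

Because the edges are directed from substrate to product, asking for a route in the reverse direction returns `None` in this toy graph.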
3
Cellular Pathways
Genome Management Information
System, Oak Ridge National Laboratory
• Our genetic and biochemical understanding of metabolism is based
largely on the study of complete pathways within cells.
4
Distributed Pathways
• However, microbial communities form distributed metabolic
pathways directing matter and energy exchange.
5
Community Metabolism
• The goal is to predict and compare distributed pathways to better
understand biogeochemical cycling and community metabolism in
the environment.
6
Predicting Community Metabolism
Plurality Sequencing
Single-Cell Sequencing
Fragment Recruitment, SOM, PCA
Environmental PGDB (ePGDB)
with Taxonomic Binning
Simulated
ePGDB
7
From Genomes to Biomes
Falkowski et al., (2008) Science 320, 1034-1038
Metagenome
Distributed Pathways
Biogeochemical Cycles
• “The regulation of the pools and fluxes in biogeochemical cycles have their origins in the
genetic inventory of individual microbes, and the regulation of these genes within the
organism is determined by the environment. As such, one can look at the microbial food
web as a collection of genomes whose expression and replication is coordinated through
complex feedback loops at the organismal, population, and ecosystem level.” Chisholm
8
Foundational Questions
• What is the taxonomic and functional structure of the
ecosystem?
• How does this structure change in response to environmental
perturbation?
• What are the ecological consequences of this change?
• What are relevant units of selection, conservation or utilization
for ecological genomic resources?
9
Inference of Metabolic Pathways
[Figure: PathoLogic* builds a PGDB from an annotated genome. Diagram labels: Organisms, PGDB Navigator, Genomic Map, Genes/ORFs, Gene Products, Reactions, Compounds, Pathways.]
* Integrates genome and pathway data to identify putative metabolic networks
Pathway/Genome Navigator
Pathway Viewer
Homepage
Evidence Glyph
Metabolite
Enzyme Found
Unique Enzyme
PGDB*
Pathway Information
Gene Information
*http://ecocyc.org/META/new-image?type=PATHWAY&object=GLYCOLYSIS
12
Environmental PGDB
[Figure: PathoLogic* builds an ePGDB from environmental sequence information. Diagram labels: Cellular Overview, Metabolic Pathway, Reaction, Open Reading Frame, Genomic Map, Genes/ORFs, Gene Products, Reactions, Compounds, Pathways.]
* Integrates genome and pathway data to identify putative distributed metabolic networks
13
ePGDB Navigation
ePGDB
Cellular Overview
Metabolic Pathway
Reaction
Open Reading Frame
14
http://engcyc.org/
(a) BioCyc PGDBs
Tier-1
Highly Curated
EcoCyc
Tier-2
Moderately Curated
Tier-3
Automatically Curated
EngCyc
15
MetaPathways
• A modular pipeline for constructing Pathway/Genome Databases from
environmental sequence information
• MetaPathways currently supports four “data products”: i) GenBank
submission, ii) LCA, iii) MLTreeMap, and iv) ePGDBs with associated
feature summary tables and GFF files
• MetaPathways externalizes compute-intensive processes onto a
user-defined cluster using Sun Grid Engine or the Amazon elastic cloud
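The LCA data product is taken here to mean a lowest common ancestor rule for taxonomic assignment. The sketch below assumes each database hit for an ORF comes with a root-to-leaf lineage; the function name and the example lineages are hypothetical, and this is not the MetaPathways implementation.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages, or None."""
    if not lineages:
        return None
    shared = None
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


# Hypothetical lineages for three hits to the same ORF.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobiales"],
]
print(lowest_common_ancestor(hits))
```

The more the hits disagree, the shallower the assignment becomes, which is the conservative behaviour one usually wants for short environmental reads.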
17
MetaPathways
• ePGDBs facilitate pathway-centric exploration of environmental
sequence information using Pathway Tools and the MetaCyc web
interface
• Provides an inference-based approach to metabolic reconstruction
based on explicit computational rules to predict presence or absence
of distributed metabolic networks
• MetaPathways can be used with multi-molecular data sets (DNA, RNA
or protein) sourced from cultured isolates, single cells and natural
or human-engineered ecosystems
http://www.github.com/hallamlab/MetaPathways
http://hallam.microbiology.ubc.ca/MetaPathways
18
ePGDB Navigation
ePGDB
Cellular Overview
Metabolic Pathway
Reaction
Open Reading Frame
19
ePGDB Validation
20
EcoCyc Pathways
• The number of E. coli pathways identified using the MetaCyc blast
database decreases with increasing blast score ratio (BSR) cut-off
while the others stay relatively constant. From this an optimal BSR
between 0.4-0.6 can be inferred.
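A blast score ratio is commonly computed as the bit score of a query-to-hit alignment divided by the bit score of the query aligned to itself, so a value of 1.0 means the hit scores as well as the query itself. The sketch below assumes that definition; the ORF names, scores and function names are hypothetical, and the default cut-off of 0.4 is simply the lower end of the range given above.

```python
def blast_score_ratio(hit_bitscore, self_bitscore):
    """BSR = bit score against the hit / bit score of the query against itself."""
    if self_bitscore <= 0:
        raise ValueError("self_bitscore must be positive")
    return hit_bitscore / self_bitscore


def filter_hits(hits, self_scores, cutoff=0.4):
    """Keep (query, subject, bsr) for hits whose BSR is at least the cut-off."""
    kept = []
    for query, subject, bitscore in hits:
        bsr = blast_score_ratio(bitscore, self_scores[query])
        if bsr >= cutoff:
            kept.append((query, subject, round(bsr, 3)))
    return kept


# Hypothetical self-alignment scores and database hits.
self_scores = {"orf_1": 500.0, "orf_2": 320.0}
hits = [
    ("orf_1", "refA", 450.0),
    ("orf_1", "refB", 150.0),
    ("orf_2", "refC", 160.0),
]
kept = filter_hits(hits, self_scores)
print(kept)
```

Raising `cutoff` towards 0.6 discards weaker hits, which is the trade-off the slide describes: fewer predicted pathways in exchange for more stringent annotation.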
21
MetaSim Pathways
[Figure: panels (a) to (d) for simulated metagenomes Sim1 and Sim2. Panel and axis labels: Predicted Pathways (0 to 700), Taxa, Copy Number, Sensitivity and Precision (0.0 to 1.0), Sequencing (% Unique-Gm, 0 to 100). Legend: Sequential, KEGG, MetaCyc+RefSeq, MetaCyc.]
Taxa
Vibrio cholerae str. N16961
Synechococcus elongatus PCC 7942
Mycobacterium tuberculosis H37Rv
Mycobacterium tuberculosis CDC1551
Helicobacter pylori 26695
Caulobacter crescentus NA1000
Caulobacter crescentus CB15
Bacillus subtilis 168
Aurantimonas manganoxydans SI85-9A1
Agrobacterium tumefaciens C58
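The MetaSim comparison reports sensitivity and precision of pathway prediction. The sketch below assumes the usual definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), with the pathways known from the source genomes as the reference set. The function name and the two pathway sets are toy assumptions for illustration.

```python
def sensitivity_precision(predicted, expected):
    """Compare predicted pathways against a reference set of expected pathways."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    sensitivity = true_positives / len(expected) if expected else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy reference and prediction sets (not results from the slides).
expected = {
    "glycolysis I",
    "TCA cycle I (prokaryotic)",
    "lysine biosynthesis I",
    "tRNA charging",
}
predicted = {
    "glycolysis I",
    "tRNA charging",
    "lysine biosynthesis I",
    "mycolate biosynthesis",
    "sucrose biosynthesis",
}
sensitivity, precision = sensitivity_precision(predicted, expected)
print(sensitivity, precision)
```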
22
Synthetic Ecology
[Figure: S-adenosyl-L-methionine cycle II as drawn by Pathway Tools. Labels a, b and a+b mark the genome(s) contributing each step.]
Enzymes shown (EC number: annotation and ORF identifiers):
2.5.1.6: S-adenosylmethionine synthase (AAB_7188, AAB_3549)
2.1.1.-: converts a demethylated methyl acceptor to a methylated methyl acceptor
3.3.1.1: putative adenosylhomocysteinase 3 (AAB_3597)
2.1.1.14: methionine synthase II (AAB_8041); 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (AAB_5400)
Compounds shown: L-methionine, ATP, H2O, phosphate, diphosphate, S-adenosyl-L-methionine, S-adenosyl-L-homocysteine, adenosine, L-homocysteine, 5-methyltetrahydropteroyltri-L-glutamate, tetrahydropteroyl tri-L-glutamate
• The pathway (S-adenosyl-L-methionine cycle II) was identified by
Pathway Tools in the simulated metagenome based on the
combined contribution of two genomes (a + b).
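The idea of a pathway that is complete only across genomes can be sketched as set coverage. The EC numbers below are the ones shown for S-adenosyl-L-methionine cycle II, but their split between genomes a and b is a hypothetical assumption, as are the function names.

```python
# Reaction steps of the pathway, identified by EC number.
SAM_CYCLE_II = ["2.5.1.6", "2.1.1.-", "3.3.1.1", "2.1.1.14"]

# Hypothetical split of the annotations between genomes a and b.
genome_ecs = {
    "a": {"2.5.1.6", "2.1.1.-", "3.3.1.1"},
    "b": {"2.5.1.6", "2.1.1.14"},
}


def coverage(pathway, ecs):
    """Fraction of pathway steps with at least one annotated enzyme."""
    return sum(1 for ec in pathway if ec in ecs) / len(pathway)


def missing_steps(pathway, ecs):
    """Pathway steps that have no annotated enzyme."""
    return [ec for ec in pathway if ec not in ecs]


# Pool the annotations, as a metagenome would.
combined = set().union(*genome_ecs.values())

for name, ecs in genome_ecs.items():
    print(name, coverage(SAM_CYCLE_II, ecs), missing_steps(SAM_CYCLE_II, ecs))
print("a+b", coverage(SAM_CYCLE_II, combined), missing_steps(SAM_CYCLE_II, combined))
```

Neither genome covers every step on its own in this toy split, while the pooled annotations do, which is the situation the slide calls a distributed pathway.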
23
Inferring Trophic Interactions
[Figure: Pathway Tools diagrams of chorismate biosynthesis I, lysine biosynthesis I, tryptophan biosynthesis, and arginine biosynthesis IV with uridine-5'-phosphate biosynthesis. Legend: Tremblaya, Moranella, Both, Neither.]
• An ePGDB constructed for the mealybug symbionts Tremblaya
princeps and Moranella endobia predicted inter-pathway
complementarity in essential amino acid biosynthetic pathways.
McCutcheon, J.P. and von Dohlen, C.D. “An interdependent metabolic patchwork in the nested symbiosis of mealybugs.” Current Biology, 2011, DOI: 10.1016/j.cub.2011.06.051
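The Tremblaya / Moranella / Both / Neither legend can be read as a per-step classification. The sketch below labels each step of a pathway by which partner encodes it. The EC numbers are four tryptophan biosynthesis steps shown in the figure, but the annotation sets and the function name are hypothetical and do not reproduce the published assignment.

```python
# Four tryptophan biosynthesis steps, identified by EC number.
TRYPTOPHAN_STEPS = ["4.1.3.27", "2.4.2.18", "5.3.1.24", "4.1.1.48"]

# Hypothetical annotations; the real assignment is in the figure, not here.
ecs_by_symbiont = {
    "Tremblaya": {"4.1.3.27"},
    "Moranella": {"2.4.2.18", "5.3.1.24"},
}


def classify_steps(steps, ecs_by_partner):
    """Label each step with the partner encoding it, 'Both', or 'Neither'."""
    labels = {}
    for ec in steps:
        owners = [name for name, ecs in ecs_by_partner.items() if ec in ecs]
        if not owners:
            labels[ec] = "Neither"
        elif len(owners) == len(ecs_by_partner):
            labels[ec] = "Both"
        else:
            labels[ec] = owners[0]
    return labels


labels = classify_steps(TRYPTOPHAN_STEPS, ecs_by_symbiont)
for ec, owner in labels.items():
    print(ec, owner)
```

Steps labelled "Neither" are the interesting ones in practice: they point either to a missing annotation or to a function supplied by the host.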
24
Hawaii Ocean Time Series (HOT)
DeLong et al. Community Genomics Among Stratified Assemblages in the Ocean’s Interior. (2006) Science 311
T. Danhorn, C. R. Young, E. F. Delong, Comparison of large-insert, small-insert and pyrosequencing libraries for metagenomic analysis, ISME J (2012), doi:10.1038/ismej.2012.35.
c1988-2012
25
Environmental Sequence Information
HOT Sample Depth (m) | Description     | Molecule | Sequencing Platform | Number of Sequences | Average Sequence Length | Protein Coding Sequences | Annotated Coding Sequences | MetaCyc Reactions | MetaCyc Pathways
25                   | upper euphotic  | DNA      | Roche 454           | 623559              | 257                     | 405613                   | 214149                     | 4138              | 864
75                   | upper euphotic  | DNA      | Roche 454           | 673674              | 244                     | 430689                   | 222572                     | 4052              | 854
110                  | chlorophyll max | DNA      | Roche 454           | 473166              | 270                     | 336035                   | 165775                     | 4133              | 860
500                  | mesopelagic     | DNA      | Roche 454           | 995747              | 276                     | 714743                   | 361193                     | 4464              | 949
25                   | upper euphotic  | RNA      | Roche 454           | 561821              | 248                     | 234404                   | 85781                      | 3433              | 723
75                   | upper euphotic  | RNA      | Roche 454           | 557718              | 239                     | 203359                   | 66855                      | 3208              | 669
110                  | chlorophyll max | RNA      | Roche 454           | 398436              | 228                     | 135107                   | 36912                      | 2549              | 532
500                  | mesopelagic     | RNA      | Roche 454           | 479661              | 266                     | 207465                   | 71400                      | 3034              | 641
• ePGDBs were generated for environmental sequence information
(DNA and RNA) sourced from the HOT water column.
26
Core Pathways
[Figure: Top 50 MetaCyc pathways by normalized ORF count (axis -2000 to 10000) for HOT 25m, 75m, 110m and 500m (RNA/DNA), one panel per data set (DNA and RNA at each depth). Pathways are grouped as Biosynthesis, Degradation, Energy Metabolism and Transport.]
Pathway labels:
ammonium transport
aerobic respiration (cytochrome c)
TCA cycle VI (obligate autotrophs)
TCA cycle V (2-oxoglutarate:ferredoxin oxidoreductase)
TCA cycle IV (2-oxoglutarate decarboxylase)
mixed acid fermentation
heterolactic fermentation
NADH to cytochrome bd oxidase electron transfer
NADH to cytochrome bo oxidase electron transfer
respiration (anaerobic)
glycolysis I
TCA cycle I (prokaryotic)
glycolysis III (glucokinase)
Rubisco shunt
glycolysis IV (plant cytosol)
TCA cycle II (eukaryotic)
pyruvate fermentation to butanol I
TCA cycle III (helicobacter)
pyruvate fermentation to butanoate
pentose phosphate pathway (non-oxidative branch)
methylaspartate cycle
succinate fermentation to butyrate
formate oxidation to CO2
photosynthesis light reactions
reductive TCA cycle II
3-hydroxypropionate/4-hydroxybutyrate cycle
Calvin-Benson-Bassham cycle
fatty acid β-oxidation I
incomplete reductive TCA cycle
nitrate reduction VI (assimilatory)
formaldehyde assimilation I (serine pathway)
glycine cleavage complex
fatty acid beta-oxidation II (core pathway)
reductive TCA cycle I
purine nucleotides degradation IV (anaerobic)
formaldehyde assimilation II (RuMP Cycle)
glycine betaine degradation
glutaryl-CoA degradation
isoleucine degradation I
gallate degradation III (anaerobic)
creatinine degradation II
purine nucleotides degradation III (anaerobic)
phenylacetate degradation I (aerobic)
ammonia assimilation cycle II
octane oxidation
ammonia assimilation cycle I
lysine fermentation to acetate and butyrate
nitrate reduction II (assimilatory)
4-aminobutyrate degradation V
glutamate degradation V (via hydroxyglutarate)
formaldehyde assimilation III (dihydroxyacetone cycle)
4-hydroxyphenylacetate degradation
nitrate reduction I (denitrification)
alkylnitronates degradation
tRNA charging
adenosine nucleotides de novo biosynthesis
NAD/NADH phosphorylation and dephosphorylation
gluconeogenesis I
glutamine biosynthesis III
arginine biosynthesis II (acetyl cycle)
guanosine nucleotides de novo biosynthesis
pyrimidine deoxyribonucleotides de novo biosynthesis II
uridine-5-phosphate biosynthesis
pyrimidine deoxyribonucleotides de novo biosynthesis I
isoleucine biosynthesis I (from threonine)
citrulline biosynthesis
isoleucine biosynthesis II
valine biosynthesis
arginine biosynthesis III
5-aminoimidazole ribonucleotide biosynthesis I
sucrose biosynthesis
formylTHF biosynthesis I
leucine biosynthesis
lysine biosynthesis I
methylerythritol phosphate pathway
folate transformations II
folate transformations I
UDP-N-acetylmuramoyl-pentapeptide biosynthesis III
lysine biosynthesis VI
mycolate biosynthesis
4-hydroxybenzoate biosynthesis V
tetrapyrrole biosynthesis I
5-aminoimidazole ribonucleotide biosynthesis II
cis-vaccenate biosynthesis
jasmonic acid biosynthesis
isoleucine biosynthesis IV
seleno-amino acid biosynthesis
cysteine biosynthesis I
isoleucine biosynthesis III
27
Cellular Overview
• Comparison of DNA (blue) and RNA + DNA (red) pathway predictions
28
Pathway Partitioning
• Comparison of genetic potential and gene expression data in photic
and dark ocean waters
29
Diagnostic Pathways
[Figure: logged normalized ORF counts for pathways unique to 25m, 75m, and 110m (DNA/RNA) and pathways unique to 500m (DNA/RNA), one panel per data set (DNA and RNA at 25m, 75m, 110m and 500m). Pathways are grouped as Biosynthesis, Degradation and Energy Metabolism.]
Pathway labels:
photosynthesis light reactions
hydrogen production VIII
(S)-acetoin biosynthesis
ribitol degradation
sorbitol degradation I
ammonia oxidation I (aerobic)
intra-aerobic nitrite reduction
nitrate reduction IV (dissimilatory)
guanosine nucleotides degradation II
L-rhamnose degradation II
D-mannose degradation
2-methylcitrate cycle II
acetate formation from acetyl-CoA II
citrate degradation
reductive monocarboxylic acid cycle
methane oxidation to methanol I
threonine degradation II
threonine degradation III (to methylglyoxal)
methionine degradation II
flavonoid biosynthesis
salidroside biosynthesis
diploterol and cycloartenol biosynthesis
heme biosynthesis from uroporphyrinogen-III I
adenosylcobalamin biosynthesis from cobyrinate I
adenosylcobalamin biosynthesis from cobyrinate II
lipoate biosynthesis and incorporation I
thiamin diphosphate biosynthesis II (Bacillus)
thiamin diphosphate biosynthesis I (E. coli)
glutathione biosynthesis
phosphopantothenate biosynthesis III
biotin biosynthesis from 7-keto-8-aminopelargonate
thiamin diphosphate biosynthesis IV (eukaryotes)
trans, trans-farnesyl diphosphate biosynthesis
menaquinol-8 biosynthesis
5,6-dimethylbenzimidazole biosynthesis
coenzyme M biosynthesis I
mycothiol biosynthesis
coenzyme B/coenzyme M regeneration
pyridoxal 5-phosphate biosynthesis II
UDP-N-acetyl-D-galactosamine biosynthesis II
glycogen biosynthesis I (from ADP-D-Glucose)
CMP-N-acetylneuraminate biosynthesis I (eukaryotes)
ADP-L-glycero-beta-D-manno-heptose biosynthesis
homocysteine and cysteine interconversion
selenocysteine biosynthesis II (archaea and eukaryotes)
glycine biosynthesis IV
30
Cryptic Pathways
• For each depth interval, a small number of cryptic pathways were
predicted in RNA that were not predicted in DNA data sets
• These pathways showed depth distributions consistent with niche
partitioning between sunlit and dark ocean waters
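Finding cryptic pathways amounts to a per-depth set difference between RNA and DNA predictions. In the sketch below the pathway names are taken from the slides, but their assignment to DNA or RNA at each depth is made up for illustration, and the function name is an assumption.

```python
# Toy predictions per depth (m); the DNA/RNA assignment is hypothetical.
predictions = {
    25: {
        "DNA": {"glycolysis I", "photosynthesis light reactions"},
        "RNA": {
            "glycolysis I",
            "photosynthesis light reactions",
            "hydrogen production VIII",
        },
    },
    500: {
        "DNA": {"glycolysis I", "tRNA charging"},
        "RNA": {"glycolysis I", "ammonia oxidation I (aerobic)"},
    },
}


def cryptic_pathways(by_depth):
    """Pathways predicted from RNA but not from DNA, for each depth."""
    return {
        depth: sorted(sets["RNA"] - sets["DNA"])
        for depth, sets in by_depth.items()
    }


cryptic = cryptic_pathways(predictions)
for depth, pathways in cryptic.items():
    print(depth, pathways)
```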
31
Known Hazards
• The absence of ATP citrate lyase indicates that the predicted
reductive TCA (rTCA) cycle is a false positive
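One way to screen for this kind of hazard is to require a diagnostic enzyme before accepting a predicted pathway. The sketch below flags predicted pathways whose key enzymes are absent from the observed annotations. The `KEY_ENZYMES` table, the observed EC set and the function name are hypothetical; this is a post hoc check under stated assumptions, not a description of how Pathway Tools scores pathways.

```python
# Hypothetical table of diagnostic enzymes per pathway (EC numbers).
# EC 2.3.3.8 is ATP citrate lyase (ATP citrate synthase).
KEY_ENZYMES = {"reductive TCA cycle I": {"2.3.3.8"}}


def flag_false_positives(predicted_pathways, observed_ecs, key_enzymes=KEY_ENZYMES):
    """Return predicted pathways whose key enzymes are not all observed."""
    flagged = []
    for pathway in predicted_pathways:
        required = key_enzymes.get(pathway, set())
        if not required <= observed_ecs:
            flagged.append(pathway)
    return flagged


# Toy annotations: TCA-related enzymes are present, ATP citrate lyase is not.
observed = {"1.1.1.37", "4.2.1.2", "1.2.7.3"}
predicted = ["reductive TCA cycle I", "glycolysis I"]
flagged = flag_false_positives(predicted, observed)
print(flagged)
```

Pathways without an entry in the table pass unchanged, so the check only removes predictions for which a diagnostic enzyme has been defined.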
32
Things to Keep in Mind…
• PathoLogic cannot predict pathways not present in MetaCyc
• Evidence for short pathways is hard to interpret
• False positives due to shared enzymes in multiple pathways or
incorrect annotations create hazards
• Currently no taxonomic assignment or coverage information is
mapped onto identified pathways
• Limited functional validation for pathways in metagenomes
33
“One gene is many hypotheses” Anonymous
34
University of British Columbia
Maya Bhatia
Monica Torres Beltran
Annie Cox
Evan Durno
Diane Fairly
Esther Geis
Alyse Hawley
Aria Hahn
Niels Hansen
Sam Kheirandish
Kishori Konwar
Keith Mewis
Antoine Page
Melanie Scofield
Young Song
Nicole Sukdeo
Jody Wright
Elena Zaikova
SRI
Peter Karp
Tomer Altman
Institute for Ocean Sciences
Marie Robert
Robin Brown
Joint Genome Institute
Susannah Tringe
Tijana Glavina del Rio
Pacific Northwest National Laboratory
Angela Norbeck
Ljiljana Pasa-Tolic
Heather Brewer
35
