RNA-Seq analysis of Mycobacterium smegmatis and its phage pathogen during infection

Nicholas Edgington, Ph.D.
Southern Connecticut State University
Dept. of Biology, New Haven CT, USA
Phages are viruses that attack bacteria. They are everywhere
and represent an amazing amount of biomass on the planet
Earth. In fact, it is estimated that there are 10^30 viruses in the
world’s oceans alone.
Mycobacteriophages specifically attack mycobacteria, which
include the important human pathogens that cause leprosy
(Mycobacterium leprae) and tuberculosis (Mycobacterium
tuberculosis), as well as the harmless Mycobacterium
smegmatis (M. smeg).
According to the WHO:
tuberculosis is the second-largest killer (after HIV) among diseases caused by a
single infectious agent, and
one third of the Earth’s human population is infected with latent tuberculosis.
To date, over 500 mycobacteriophage genomes have been
sequenced, mostly through the HHMI Science Education
Alliance’s Phage Hunters Advancing Genomics & Evolutionary
Science program (HHMI SEA-PHAGES) in conjunction with Dr.
Graham Hatfull’s University of Pittsburgh laboratory (phagesdb.org)
(Pope et al., 2011).
About 90% of highly conserved mycobacteriophage gene
“phamilies” have no known function!
Let’s use ‘Dual RNA-Seq’ to validate mycobacteriophage gene
annotations, and determine the temporal pattern of gene
expression during infection.
Module Research Goals
Use ‘Dual RNA-Seq’ to validate
mycobacteriophage gene annotations.
Determine the temporal pattern of
gene expression of the
mycobacteriophage during infection.
Croucher, N.J., and Thomson, N.R. (2010).
Determine the temporal pattern of
gene expression of the host M. smegmatis.
Identify the ‘repressor’ gene of
temperate phage by analyzing a
Student Learning Goals
Be able to explain host-pathogen interactions &
mechanisms in a ‘simple’ bacteria-phage system.
Understand the advantages of using NGS
technologies (including RNA-Seq) to elucidate
gene expression patterns.
Be able to run an analysis pipeline in a Galaxy
environment in order to discover gene expression
patterns in a ‘dual RNA-Seq’ experiment.
Be able to perform the statistical analysis of RNA-Seq
experiments and understand its implications.
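One statistical point behind these goals is that transcripts with few reads, as phage transcripts may have at some time points, show proportionally more sampling noise than highly expressed ones. The sketch below illustrates this under the simplifying assumption that read counts follow a Poisson distribution; the expected counts (5 and 500), the number of replicates, and the function names are hypothetical and not taken from the module. Real RNA-Seq data are usually more variable than Poisson, so this is closer to a best case.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed count (Knuth's method; suitable for mean < ~700)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def coefficient_of_variation(mean, replicates, rng):
    """Simulate replicate counts and return standard deviation / mean."""
    counts = [poisson_sample(mean, rng) for _ in range(replicates)]
    return statistics.pstdev(counts) / statistics.mean(counts)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical expected read counts for one transcript at two expression levels.
low_cv = coefficient_of_variation(5, 2000, rng)
high_cv = coefficient_of_variation(500, 2000, rng)

print(f"CV at ~5 reads:   {low_cv:.3f}")
print(f"CV at ~500 reads: {high_cv:.3f}")
```

Under the Poisson assumption the relative noise shrinks roughly as one over the square root of the expected count, which is one reason sequencing depth matters for low-abundance transcripts.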
Vision and Change Core Competencies
Ability to apply the process of science:
Perform the analysis of a dual RNA-Seq dataset.
Ability to use quantitative reasoning:
Perform quantitative analysis and apply mathematical reasoning to the analysis of an RNA-Seq dataset.
Ability to use modeling and simulation:
Be able to explain the complex systems that regulate host-phage interactions.
Be able to run simulations of RNA-Seq datasets and observe the effects of modifying
program parameters.
Ability to tap into the interdisciplinary nature of science:
NGS technologies represent an interdisciplinary science that intersects with physics,
computer science, engineering, statistical inference, and information science.
Ability to communicate and collaborate with other disciplines:
Collaborate to identify the gene expression patterns of a phage and its host, and present the
data to peers.
Ability to understand the relationship between science and society:
Understand that bacteriophages profoundly affect global ecosystems, can be used to treat
GCAT-SEEK sequencing requirements
The organism is Mycobacterium smegmatis mc²155, with and without phage infection.
The Mycobacterium smegmatis mc²155 genome is a single circular
chromosome of 6.99 Mb with 6,742 genes, and
the ABCat phage genome is 76.131 kb with 145 predicted genes.
Samples would be pelleted, resuspended, and processed with the RNeasy Mini Kit
(Qiagen). The suggested kit for rRNA depletion is Ribo-Zero for Gram-positive
bacteria (Epicentre).
Need around 200 million reads per sample,
with around 160 million reads coming from the host (M. smeg) and,
depending on the time point, between 0.3 and 3 million reads from the
bacteriophage genome (so the phage transcripts would represent
~0.2-2% of the reads).
Generate single-end libraries from the TruSeq Illumina kit, without
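The read budget above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section; the function and variable names are illustrative. It computes the phage share of reads against both the host reads and the total reads (both round to roughly 0.2-2%), and also expresses each read count per kilobase of genome, since the phage genome is about 90 times smaller than the host chromosome.

```python
HOST_GENOME_BP = 6_990_000    # M. smegmatis chromosome, 6.99 Mb
PHAGE_GENOME_BP = 76_131      # phage genome, 76.131 kb
TOTAL_READS = 200_000_000     # target reads per sample
HOST_READS = 160_000_000      # reads expected from the host
PHAGE_READS_LOW = 300_000     # phage reads, lower estimate
PHAGE_READS_HIGH = 3_000_000  # phage reads, upper estimate


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def reads_per_kb(reads, genome_bp):
    """Average number of reads per kilobase of genome."""
    return reads / (genome_bp / 1000.0)


for label, phage_reads in (("low", PHAGE_READS_LOW), ("high", PHAGE_READS_HIGH)):
    print(f"{label} estimate:")
    print(f"  phage reads as % of host reads:  {percent_of(phage_reads, HOST_READS):.2f}")
    print(f"  phage reads as % of total reads: {percent_of(phage_reads, TOTAL_READS):.2f}")
    print(f"  phage reads per kb of genome:    {reads_per_kb(phage_reads, PHAGE_GENOME_BP):.0f}")

print(f"host reads per kb of genome: {reads_per_kb(HOST_READS, HOST_GENOME_BP):.0f}")
```

Reads per kilobase is only a rough density measure here; converting it to fold coverage would require a read length, which this section does not specify.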
Computer/program requirements
for data analysis
Internet connection, Mac OS, Linux (Ubuntu is nice)
Web browser (excluding IE)
Computer programs:
Galaxy (and pre-compiled RNA-Seq/NGS tools)
public, local, or Amazon EC2 instance
url: ‘usegalaxy.org’
R stats (if using BioConductor NGS packages)
Python 2.7 (if using ‘bcbio-nextgen’ or
‘biopython’ modules)
Student Assessments
Pre- and post-tests for
understanding of statistical methods, calculations,
comprehension of RNA-Seq methodology (wet-lab
techniques and in silico analysis)
ability to explain host-pathogen interactions &
mechanisms in a ‘simple’ bacteria-phage system.
Assessment of student confidence in using NGS
computational tools and in navigating in a Linux
Learn to navigate and use “usegalaxy.org” (i.e., Galaxy) to create a workflow for
the analysis of dual RNA-Seq data from Mycobacterium smegmatis mc²155 and
a mycobacteriophage.
Import into Galaxy the RNA-Seq datasets that will be received from the GCAT-SEEK
sequencing facility in late summer.
Convert GenBank files for Mycobacterium smegmatis mc²155 and the selected
mycobacteriophage to GFF format using the Rätsch lab’s Galaxy instance
or the “bcbio-nextgen” Python module.
Import “GTF” or “GFF3” formatted reference genomes for Mycobacterium
smegmatis mc²155.
Import a FASTA file of the genomic sequence of the mycobacteriophage ABCat.
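For orientation, a GFF3 record is a single tab-separated line with nine columns (seqid, source, type, start, end, score, strand, phase, attributes), using 1-based inclusive coordinates as GenBank feature tables do. The sketch below formats one such line by hand to show what a converted annotation might look like. It is not the conversion tool itself, and the sequence ID, coordinates, and gene name are hypothetical placeholders.

```python
def gff3_line(seqid, source, feature_type, start, end, strand,
              attributes, score=".", phase="."):
    """Format one GFF3 feature line (coordinates are 1-based and inclusive)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Column 9 is a semicolon-separated list of key=value pairs.
    attribute_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               score, strand, phase, attribute_text]
    return "\t".join(columns)


# Hypothetical CDS on a hypothetical phage sequence record.
example = gff3_line(
    "phage_genome", "GenBank", "CDS", 1, 300, "+",
    {"ID": "cds1", "Name": "gp1"}, phase="0",
)

print("##gff-version 3")
print(example)
```

This sketch does not escape reserved characters in attribute values, which a real converter would need to handle.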
WEEKS 2-4:
Use the Galaxy RNA-Seq tools to map reads to the two reference genomes.
bowtie, cufflinks, rsem
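Quantification tools such as cufflinks and rsem report expression in units normalized for gene length and sequencing depth (FPKM and TPM). The sketch below shows how those units are computed from raw counts, as a way to see what the numbers mean. The gene names, lengths, and counts are made-up illustration values, and the functions are simplified versions of the definitions rather than what either tool does internally.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [count / length for count, length in zip(read_counts, gene_lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical three-gene example.
genes = ["geneA", "geneB", "geneC"]
lengths = [900, 1500, 300]  # gene lengths in bp
counts = [450, 1500, 30]    # mapped reads per gene
total = sum(counts)

for name, length, count, gene_tpm in zip(genes, lengths, counts, tpm(counts, lengths)):
    print(f"{name}: FPKM={fpkm(count, length, total):.1f}  TPM={gene_tpm:.1f}")
```

TPM values always sum to one million within a sample, which makes them easier to compare across time points than raw counts when the phage share of reads changes during infection.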
Discussion & Lecture Topics
ELSI of NGS technologies
Bacterial Host-Pathogen interactions
Bacteriophage replication mechanisms
Lytic versus Lysogenic lifestyles
The connection between mycobacteriophage
genome architecture and temporal gene
expression patterns
Determining gene phylogeny through sequence
comparisons using bioinformatic tools
References
Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr Opin
Microbiol 13, 619–624.
Dedrick, R.M., Marinelli, L.J., Newton, G.L., Pogliano, K., Pogliano, J., and Hatfull, G.F. (2013). Functional
requirements for bacteriophage growth: gene essentiality and expression in mycobacteriophage Giles. Mol
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I.,
Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15, 1451–
Goecks, J., Nekrutenko, A., Taylor, J., Galaxy Team (2010). Galaxy: a comprehensive approach for supporting
accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11, R86.
Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq
profiling of bacterial transcriptomes? BMC Genomics 13, 734.
Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat. Biotechnol. 28, 421–423.
Hatfull, G.F. (2012). The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288.
Henry, M., and Debarbieux, L. (2012). Tools from viruses: bacteriophage successes and beyond. Virology 434,
Jacobs-Sera, D., Marinelli, L.J., Bowman, C., Broussard, G.W., Guerrero Bustamante, C., Boyle, M.M., Petrova,
Z.O., Dedrick, R.M., Pope, W.H., Science Education Alliance Phage Hunters Advancing Genomics And
Evolutionary Science Sea-Phages Program, et al. (2012). On the nature of mycobacteriophage diversity and host
preference. Virology.
Pope, W.H., Jacobs-Sera, D., Russell, D.A., Peebles, C.L., Al-Atrache, Z., Alcoser, T.A., Alexander, L.M., Alfano,
M.B., Alford, S.T., Amy, N.E., et al. (2011). Expanding the diversity of mycobacteriophages: insights into genome
architecture and evolution. PLoS ONE 6, e16329.
Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq
data. BMC Bioinformatics 14, 91.
Westermann, A.J., Gorski, S.A., and Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nat. Rev. Microbiol.
10, 618–630.
