RNA-Seq analysis of Mycobacterium smegmatis and its phage pathogen during infection

Nicholas Edgington, Ph.D.
Southern Connecticut State University, Dept. of Biology, New Haven, CT, USA

Phages are viruses that attack bacteria. They are found everywhere and account for a substantial amount of the biomass on Earth; it is estimated that there are 10^30 viruses in the world’s oceans alone. Mycobacteriophages specifically attack mycobacteria, a group that includes the important human pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis), as well as the harmless Mycobacterium smegmatis (M. smeg). According to the WHO, tuberculosis is the second-largest killer (after HIV) among diseases caused by a single infectious agent, and one third of the world’s human population is infected with latent tuberculosis.

Background

To date, over 500 mycobacteriophage genomes have been sequenced, mostly through the HHMI Science Education Alliance’s Phage Hunters Advancing Genomics & Evolutionary Science program (HHMI SEA-PHAGES) in conjunction with Dr. Graham Hatfull’s laboratory at the University of Pittsburgh (phagesdb.org) (Pope et al., 2011).
About 90% of highly conserved mycobacteriophage gene “phamilies” have no known function!
Let’s use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations and determine the temporal pattern of gene expression during infection.

Module Research Goals

Use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations.
Determine the temporal pattern of gene expression of the mycobacteriophage during infection (Croucher and Thomson, 2010).
Determine the temporal pattern of gene expression of the host M. smegmatis.
Identify the ‘repressor’ gene of a temperate phage by analyzing a lysogen.

Student Learning Goals

Be able to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.
Understand the advantages of using NGS technologies (including RNA-Seq) to elucidate gene expression patterns.
Be able to run an analytic pipeline in a Galaxy environment in order to discover gene expression patterns in a ‘dual RNA-Seq’ experiment.
Be able to perform RNA-Seq analyses and understand their statistical implications.

Vision and Change Core Competencies

Ability to apply the process of science: Perform the analysis of a dual RNA-Seq dataset.
Ability to use quantitative reasoning: Perform quantitative analysis and apply mathematical reasoning to the analysis of an RNA-Seq dataset.
Ability to use modeling and simulation: Be able to explain the complex systems that regulate host-phage interactions; be able to run simulations of RNA-Seq datasets and observe the effects of modifying program parameters.
Ability to tap into the interdisciplinary nature of science: NGS technologies represent an interdisciplinary science that intersects with physics, computer science, engineering, statistical inference, and information science.
Ability to communicate and collaborate with other disciplines: Collaborate to identify the gene expression patterns of a phage and its host, and present the data to peers.
Ability to understand the relationship between science and society: Understand that bacteriophage profoundly affect global ecosystems, can be used to treat

GCAT-SEEK sequencing requirements

The organism is Mycobacterium smegmatis mc²155, with and without phage infection.
The Mycobacterium smegmatis mc²155 genome is a single circular chromosome of 6.99 Mb with 6,742 genes, and the ABCat phage genome is 76.131 kb with 145 predicted genes.
Samples would be pelleted and resuspended for RNA extraction with the RNeasy Mini Kit (Qiagen).
The suggested kit for rRNA depletion is Ribo-Zero for Gram-positive bacteria (Epicentre).
Around 200 million reads per sample are needed, with around 160 million reads coming from the host (M. smeg) and, depending on the time point, between 0.3 and 3 million reads from the bacteriophage genome (the phage transcripts would therefore represent ~0.2-2% of the reads).
Generate single-end libraries with the TruSeq Illumina kit, without multiplexing.

Computer/program requirements for data analysis

Internet connection; Mac OS or Linux (Ubuntu is nice)
Web browser (excluding IE)
Computer programs:
Galaxy (and pre-compiled RNA-Seq/NGS tools) on a public, local, or Amazon EC2 instance; url: ‘usegalaxy.org’
R stats (if using BioConductor NGS packages)
Python 2.7 (if using the ‘bcbio-nextgen’ or ‘biopython’ modules)

Student Assessments

Pre- and post-tests for:
understanding of statistical methods, calculations, and considerations
comprehension of RNA-Seq methodology (wet-lab techniques and in silico analysis)
ability to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system
Assessment of student confidence in using NGS computational tools and in navigating a Linux environment.

Timeline

WEEK 1:
Learn to navigate and use “usegalaxy.org” (i.e., Galaxy) to create a workflow for the analysis of dual RNA-Seq data from Mycobacterium smegmatis mc²155 and a mycobacteriophage.
Import into Galaxy the RNA-Seq datasets that will be received from the GCAT-SEEK sequencing facility in late summer.
Convert the GenBank files for Mycobacterium smegmatis mc²155 and the selected mycobacteriophage to GFF format using the Rätsch lab’s Galaxy instance or the “bcbio-nextgen” Python module.
Import “GTF” or “GFF3” formatted reference genomes for Mycobacterium smegmatis mc²155.
Import a FASTA file of the genomic sequence of mycobacteriophage ABCat.

WEEKS 2-4:
Use the Galaxy RNA-Seq tools to map reads to the two reference genomes.
Tools: Bowtie, Cufflinks, RSEM

Discussion & Lecture Topics

ELSI of NGS technologies
Bacterial host-pathogen interactions
Bacteriophage replication mechanisms
Lytic versus lysogenic lifestyles
The connection between mycobacteriophage genome architecture and temporal gene expression patterns
Determining gene phylogeny through sequence comparisons using bioinformatic tools

References

Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr. Opin. Microbiol. 13, 619–624.
Dedrick, R.M., Marinelli, L.J., Newton, G.L., Pogliano, K., Pogliano, J., and Hatfull, G.F. (2013). Functional requirements for bacteriophage growth: gene essentiality and expression in mycobacteriophage Giles. Mol. Microbiol.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15, 1451–1455.
Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86.
Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734.
Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat. Biotechnol. 28, 421–423.
Hatfull, G.F. (2012). The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288.
Henry, M., and Debarbieux, L. (2012). Tools from viruses: bacteriophage successes and beyond. Virology 434, 151–161.
Jacobs-Sera, D., Marinelli, L.J., Bowman, C., Broussard, G.W., Guerrero Bustamante, C., Boyle, M.M., Petrova, Z.O., Dedrick, R.M., Pope, W.H., Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Program, et al. (2012). On the nature of mycobacteriophage diversity and host preference. Virology.
Pope, W.H., Jacobs-Sera, D., Russell, D.A., Peebles, C.L., Al-Atrache, Z., Alcoser, T.A., Alexander, L.M., Alfano, M.B., Alford, S.T., Amy, N.E., et al. (2011). Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS ONE 6, e16329.
Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 14, 91.
Westermann, A.J., Gorski, S.A., and Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nat. Rev. Microbiol. 10, 618–630.
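
Worked example: read-depth budget

The read-depth targets listed under the GCAT-SEEK sequencing requirements can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the function name read_budget and its dictionary keys are hypothetical, and the read counts are the planning figures quoted in the requirements, not measured data. It reports phage reads as a percentage of all reads and as a percentage of host reads, along with the reads that the planning figures do not attribute to either genome.

```python
def read_budget(total_reads, host_reads, phage_reads):
    """Summarize how a per-sample read budget splits between host and phage.

    All arguments are read counts. The function name and the returned keys
    are hypothetical; they are not part of Galaxy or any other tool.
    """
    if total_reads <= 0 or host_reads <= 0:
        raise ValueError("total_reads and host_reads must be positive")
    return {
        # phage reads as a percentage of every read in the sample
        "pct_of_total": 100.0 * phage_reads / total_reads,
        # phage reads as a percentage of the host-derived reads
        "pct_of_host": 100.0 * phage_reads / host_reads,
        # reads not attributed to either genome in the planning figures
        "unassigned_reads": total_reads - host_reads - phage_reads,
    }


# Planning figures from the sequencing requirements (assumed, not measured).
TOTAL_READS = 200e6
HOST_READS = 160e6
PHAGE_READ_RANGE = (0.3e6, 3e6)  # varies with the infection time point

for phage_reads in PHAGE_READ_RANGE:
    budget = read_budget(TOTAL_READS, HOST_READS, phage_reads)
    print("%.1f M phage reads: %.2f%% of total, %.2f%% of host reads" % (
        phage_reads / 1e6, budget["pct_of_total"], budget["pct_of_host"]))
```

With these figures, phage reads come to 0.15-1.5% of all reads, or roughly 0.19-1.9% relative to host reads, which is consistent with the ~0.2-2% quoted in the requirements.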