Single Molecule Sequencing

Next generation sequencing: an overview
A I Bhat
Indian Institute of Spices Research
Calicut
DNA sequencing
• Chain termination method (Sanger et al., 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains, these chains terminating at specific nucleotide positions.
• Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.
Chain termination method
Dye-terminator sequencing
• Uses labelled chain-terminator ddNTPs, which permits sequencing in a single reaction instead of four.
• Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye (ddA green, ddT red, ddG yellow and ddC blue), each emitting fluorescence at a different wavelength.
• Fragments terminating at each base position are separated by size and detected on the gel by laser-induced fluorescence.
• Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay of automated sequencing.
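The single-reaction readout described above can be sketched as a toy Python model. It is not instrument software: it assumes an error-free template written so that synthesis proceeds left to right, uses the dye colours listed on this slide, and treats electrophoresis as a perfect sort by fragment length. The function names are hypothetical.

```python
import random

# Dye colour carried by each ddNTP, as listed on the slide
DYE_OF = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}
BASE_OF_DYE = {dye: base for base, dye in DYE_OF.items()}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, dye) for every chain stopped by a labelled ddNTP.

    Synthesis is assumed to run left to right along the template, so a
    fragment of length n ends with the ddNTP complementary to template[n - 1].
    """
    return [(n, DYE_OF[COMPLEMENT[template[n - 1]]])
            for n in range(1, len(template) + 1)]


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by length (as electrophoresis does) and translate each dye."""
    return "".join(BASE_OF_DYE[dye] for _, dye in sorted(fragments))


if __name__ == "__main__":
    fragments = terminated_fragments("TACGGT")
    random.shuffle(fragments)  # all four terminators react in one tube
    print(read_electropherogram(fragments))  # ATGCCA
```

Because each dye identifies its own terminator, a single lane or capillary carries all four termination reactions.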
[Figure: Capillary electrophoresis; view of a dye-terminator read]
The Sanger method can sequence only about 1000–1200 bp in one reaction
Genome sequencing
1970s: Bacteriophage
1992: The first eukaryotic chromosome sequence (yeast)
1995: The bacterium Haemophilus influenzae, followed by several other bacteria and archaea
Later: Many eukaryotes, including several plants and their pathogens
2001–2006: Human genome (draft published in 2001; the last finished chromosome sequences published in 2006)
Until around 2006, genome sequencing relied on Sanger chemistry
Shotgun sequencing
Human Genome Project
Genomic DNA is broken into fragments enzymatically or mechanically
Fragments are cloned into sequencing vectors
Each clone is sequenced individually
Sequencing of numerous DNA fragments: BIRTH OF GENOME INFORMATICS AND NEXT GENERATION SEQUENCING
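The shotgun idea (break the genome into pieces, sequence each piece, then reassemble by overlaps) can be illustrated with a toy greedy overlap assembler. This is a simplified sketch, not the assembly method used by the Human Genome Project: it assumes error-free reads from one strand and no long repeats, and all function names and the minimum-overlap parameter are hypothetical.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Break a genome into reads that start at random positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads that are wholly contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs stay separate (coverage gaps)
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGTG", "GCGTGCAAT", "CAATCCGT", "TCCGTAG"]
    print(greedy_assemble(reads))  # ['ATGGCGTGCAATCCGTAG']
    genome = "ATGGCGTGCAATCCGTAGGTACTTGACCGA"
    print(greedy_assemble(shotgun_reads(genome, 10, 40)))
```

With random reads, too few reads leave gaps (several contigs), which is why coverage matters.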
Whole genome sequencing
The core philosophy of the massively parallel sequencing used in next-generation sequencing (NGS) is adapted from shotgun sequencing
NGS: breaking the entire genome into small pieces
Ligating the DNA fragments to designated adapters
DNA synthesis (sequencing-by-synthesis)
Massively parallel sequencing
Coverage: the average number of short reads that overlap (cover) each position within a specific genomic region
Sufficient coverage is critical for accurate assembly of the genomic sequence.
To ensure correct identification of genetic variants, short-read coverage of at least 30× is recommended in whole-genome scans
(Zhang et al., 2011. J Genet Genomics, 38:95-109)
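The coverage arithmetic can be made concrete with a short sketch. It assumes the idealized Lander–Waterman model, in which mean coverage is C = N·L/G (N reads of length L over a genome of size G) and the depth at each base is Poisson-distributed with mean C. Real coverage is less uniform than this model, so observed gaps are larger. The 3 Gbp genome size is a round assumption and the function names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average depth C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Smallest number of reads N with N * L / G >= target."""
    return math.ceil(target_coverage * genome_size / read_len)


def fraction_below(coverage: float, min_depth: int) -> float:
    """P(depth < min_depth) if per-base depth is Poisson(coverage)."""
    return sum(math.exp(-coverage) * coverage ** k / math.factorial(k)
               for k in range(min_depth))


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed round human genome size, in bp
    n = reads_needed(30, 100, genome)
    print(n)                              # 900000000
    print(mean_coverage(n, 100, genome))  # 30.0
    print(fraction_below(30, 1))          # expected share of bases with no reads
    print(fraction_below(30, 10))         # expected share with fewer than 10 reads
```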
Next generation sequencing
• Enables a genome to be sequenced within hours to days.
• The 454 FLX Pyrosequencer from Roche Applied Sciences was the first next-generation sequencer to become commercially available, in 2004.
• The Solexa 1G Genetic Analyzer from Illumina was commercialized in 2006.
• The SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems was launched in 2007.
Next-next generation or third generation sequencing
• Single molecule sequencing
Platforms on NGS technologies
Platform | Amplification | Sequencing method | Read length | Throughput
Currently available
Roche/GS-FLX Titanium | Emulsion PCR | Pyrosequencing | 400–600 bp | 500 Mbp/run
Illumina/HiSeq 2000, HiScan | Bridge PCR (cluster PCR) | Reversible terminators | 2 × 100 bp | 200 Gbp/run
ABI/SOLiD 5500xl | Emulsion PCR | Sequencing-by-ligation (octamers) | 50–100 bp | >100 Gbp/run
Polonator/G.007 | Emulsion PCR | Sequencing-by-ligation (monomers) | 26 bp | 8–10 Gbp/run
Helicos/HeliScope | No | True single-molecule sequencing (tSMS) | 35 (25–55) bp | 21–37 Gbp/run
Pacific BioSciences/RS | No | Single-molecule real time (SMRT) | 1000 bp | N/A
In development
Visigen Biotechnologies | No | Single-molecule sequencing by synthesis | N/A | N/A
U.S. Genomics | No | Single-molecule mapping | >100 kbp | N/A
Genovoxx | No | N/A | N/A | N/A
Oxford Nanopore Technologies | No | Nanopores/exonuclease-coupled | 35 bp | N/A
NABsys | No | Nanopores | N/A | N/A
Electronic BioSciences | No | Nanopores | N/A | N/A
BioNanomatrix/nanoAnalyzer | No | Nanochannel arrays | 400 kbp | N/A
GE Global Research | No | Closed complex/nanoparticle | N/A | N/A
IBM | No | Nanopores | N/A | N/A
LingVitae | No | Nanopores | N/A | N/A
Complete Genomics | No | DNA nanoball arrays | 70 bp | N/A
base4innovation | No | Nanostructure arrays | N/A | N/A
CrackerBio | No | Nanowells | N/A | N/A
Reveo | No | Nano-knife edge | N/A | N/A
Intelligent BioSystems | No | Electronics | N/A | N/A
LightSpeed Genomics | No | Direct-read sequencing by EM | N/A | N/A
Next (2nd) generation platforms
Platform | Manufacturer | Read length × reads | Main use
3130xl | Applied Biosystems | 700 bp × 96 | Specific targets (PCR products, clones)
GS-FLX Titanium | Roche | 400 bp × 1 million | De novo sequencing
Genome Analyzer | Illumina | 100 bp × 2 billion | Re-sequencing (can also do de novo sequencing)
SOLiD | Applied Biosystems | 50 bp × 2.4 billion | Re-sequencing (can also do de novo sequencing)
Roche GS-FLX 454 Genome Sequencer
Produces the longest reads (up to 600 bp) among all the NGS platforms
Generates ~400–600 Mb of sequence reads per run
Used for de novo assembly of microbial genomes and in metagenomics
Reported raw base accuracy is very good (over 99%)
Chemistry
• Nucleotide incorporation releases pyrophosphate (PPi).
• ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5′-phosphosulfate (APS).
• This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP.
• The light produced in the luciferase-catalyzed reaction is detected by a camera and analyzed by software.
• Apyrase degrades unincorporated nucleotides and ATP, and the reaction can restart with another nucleotide.
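The chain above means each nucleotide flow yields light in proportion to how many bases were incorporated. A minimal sketch of the resulting signal pattern (a flowgram) follows; it assumes ideal chemistry where a whole homopolymer run is incorporated in one flow, and the flow order TACG is an assumption used for illustration. Function and parameter names are hypothetical.

```python
from typing import Optional


def flowgram(synthesized: str, flow_order: str = "TACG",
             n_flows: Optional[int] = None) -> list[tuple[str, int]]:
    """Simulate pyrosequencing signals for the strand being synthesized.

    In each flow one nucleotide is offered; the whole matching homopolymer
    run is incorporated, releasing one PPi per base, so the light signal is
    proportional to the run length (0 if nothing matches).
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains bases missing from flow_order")
    signals, pos, flow = [], 0, 0
    while pos < len(synthesized) and (n_flows is None or flow < n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1  # apyrase clears leftovers before the next flow
    return signals


if __name__ == "__main__":
    print(flowgram("TTAGC"))
    # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
```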
Illumina/Solexa Genome Analyzer
Superior data quality and appropriate read lengths have made it the system of choice for many genome sequencing projects.
The majority of published NGS papers used the Genome Analyzer.
It uses a proprietary reversible terminator-based method that enables detection of single bases as they are incorporated into growing DNA strands.
A fluorescently labeled terminator is imaged as each dNTP is added and is then cleaved to allow incorporation of the next base.
Because all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias.
The end result is true base-by-base sequencing, which the manufacturer describes as giving the industry's most accurate data for a broad range of applications.
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk
Solexa-based Whole Genome Sequencing
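Since one terminator is added and imaged per cycle, base calling reduces to reading four dye intensities per cluster per cycle. The sketch below calls the brightest channel and reports a purity score (brightest divided by brightest plus second brightest), writing "N" when two dyes are too close. It is an illustrative simplification, not Illumina's base-calling software; the channel order and the 0.6 threshold are assumptions.

```python
CHANNELS = "ACGT"  # hypothetical order of the four dye channels


def call_cycle(intensities: list[float]) -> tuple[str, float]:
    """Call the brightest channel; purity is 0.5 to 1 for non-negative intensities."""
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return CHANNELS[ranked[0]], purity


def call_read(cycles: list[list[float]], min_purity: float = 0.6) -> str:
    """One image per cycle: call a base, or 'N' when two dyes look alike."""
    read = []
    for intensities in cycles:
        base, purity = call_cycle(intensities)
        read.append(base if purity >= min_purity else "N")
    return "".join(read)


if __name__ == "__main__":
    cycles = [
        [900, 50, 40, 30],   # cycle 1: A channel dominates
        [20, 30, 800, 60],   # cycle 2: G
        [500, 450, 10, 10],  # cycle 3: ambiguous A/C signal
        [5, 10, 20, 700],    # cycle 4: T
    ]
    print(call_read(cycles))  # AGNT
```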
ABI SOLiD platform
The latest model is the 5500xl SOLiD system (previously known as SOLiD 4hq)
It can generate over 2.4 billion reads per run with a raw base accuracy of 99.94%
The SOLiD 4 platform probably provides the best data quality as a result of its sequencing-by-ligation approach, but the DNA library preparation procedures prior to sequencing can be tedious and time-consuming.
Preferred for re-sequencing rather than de novo sequencing.
(Zhang et al., 2011)
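One feature usually credited for SOLiD accuracy is its two-base (colour-space) encoding, in which each base is interrogated in two ligation reactions and each colour reports a dinucleotide. The sketch assumes the common representation A=0, C=1, G=2, T=3 with colour = XOR of the two codes; function names are hypothetical. It shows why a true SNP changes two adjacent colours while a single colour error changes every downstream decoded base, so alignment is typically done in colour space.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Two-base encoding: one colour (0-3) per overlapping dinucleotide."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colours back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    ref, snp = "ATGGCA", "ATGCCA"
    print(to_colors(ref))  # [3, 1, 0, 3, 1]
    print(to_colors(snp))  # [3, 1, 3, 0, 1]: a true SNP changes two adjacent colours
    print(from_colors("A", [3, 1, 0, 3, 1]))  # ATGGCA
    print(from_colors("A", [3, 1, 1, 3, 1]))  # one colour error alters every later base
```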
Next generation sequencing using Roche 454
Sample Preparation
Nucleic acid isolation
Double-stranded cDNA synthesis
Rapid library preparation
Fragmentation (nebulization/shearing) into smaller fragments of 400 to 1000 bp
Addition of adapters
Removal of small fragments (<300 bp)
Library quality assessment
Emulsion-based clonal amplification (emPCR)
• Preparation of reagents and of emulsion oil
• Preparation of the amplification mix (addition of additive, amplification mix, primers, enzyme mix and PPiase)
• DNA library capture (one molecule of DNA per bead and one bead per aqueous microreactor, insulated from other beads by the surrounding oil)
• Emulsification (shaking the captured library to form a water-in-oil mixture)
• Amplification (emulsified beads are clonally amplified)
• Bead recovery and enrichment
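The "one molecule of DNA per bead" goal is statistical: if templates are captured at random, the number per bead follows a Poisson distribution whose mean is the template-to-bead ratio. This limiting-dilution model is an assumption used for illustration, not a protocol value, and the function name is hypothetical.

```python
import math


def bead_loading(templates_per_bead: float) -> dict[str, float]:
    """Poisson model of random template capture in emPCR.

    Returns the fraction of beads that receive no template, exactly one
    (clonal, usable), or two or more (mixed, unreadable).
    """
    lam = templates_per_bead
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    return {"empty": empty, "clonal": single, "mixed": 1.0 - empty - single}


if __name__ == "__main__":
    for ratio in (0.1, 0.5, 1.0, 2.0):
        fractions = bead_loading(ratio)
        print(ratio, {k: round(v, 3) for k, v in fractions.items()})
```

Low ratios keep mixed beads rare at the cost of many empty beads; the enrichment step then keeps beads that carry amplified DNA.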
Sequencing
Clonally amplified fragments are loaded onto a PicoTiterPlate device for sequencing (the diameter of the plate wells allows only one bead per well)
After addition of sequencing enzymes, the fluidics subsystem of the sequencing instrument flows individual nucleotides in a fixed order across all wells
Addition of one (or more) nucleotide(s) complementary to the template strand produces a chemiluminescent signal recorded by the CCD camera within the instrument
During the nucleotide flows, around a million beads, each carrying millions of copies of a single-stranded DNA molecule, are sequenced in parallel
Each 10-h sequencing run typically produces over 1,000,000 flowgrams (one flowgram per bead)
Base calling (and quality assessment of each read)
Trimming of primer sequences
Assembly of reads into contigs
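Base calling and trimming can be sketched as rounding each flow signal to the nearest homopolymer length and then removing the known sequence at the start of every read. This is a simplified sketch, not Roche's pipeline: it assumes the flow order TACG and the 4-base key TCAG as defaults, and both parameters and the function names are illustrative assumptions.

```python
import math


def call_bases(flow_values: list[float], flow_order: str = "TACG") -> str:
    """Round each flow signal to the nearest homopolymer length."""
    read = []
    for i, value in enumerate(flow_values):
        n = max(0, math.floor(value + 0.5))  # round half up; negative noise -> 0
        read.append(flow_order[i % len(flow_order)] * n)
    return "".join(read)


def trim_key(read: str, key: str = "TCAG") -> str:
    """Remove the known key sequence that starts every read."""
    if not read.startswith(key):
        raise ValueError("read does not start with the expected key")
    return read[len(key):]


if __name__ == "__main__":
    flows = [1.02, 0.05, 0.97, 0.10, 0.08, 1.10, 0.03, 0.95,  # key TCAG
             0.02, 0.10, 0.04, 1.90, 0.10, 1.05]              # insert GGA
    read = call_bases(flows)
    print(read)            # TCAGGGA
    print(trim_key(read))  # GGA
```

How far each signal lies from a whole number gives one natural basis for per-base quality values.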
NGS platforms under development (third-generation sequencers)
Aim: sequencing of single DNA molecules (without amplification)
Goal: accurate data with long reads
i) Fluorescence-based single-molecule sequencing (Pacific Biosciences; US Genomics)
ii) Nanotechnologies for single-molecule sequencing (Oxford Nanopore Technologies, NABsys, BioNanomatrix, Electronic BioSciences, CrackerBio)
iii) Electronic detection for single-molecule sequencing (Reveo, Intelligent BioSystems)
iv) Electron microscopy for single-molecule sequencing (LightSpeed Genomics, Halcyon Molecular, ZS Genetics)
Single Molecule Sequencing
(Helicos Biosciences, USA)
Billions of single molecules of sample DNA are captured on an application-specific proprietary surface, where they serve as templates for sequencing-by-synthesis.
Polymerase and one fluorescently labeled nucleotide (C, G, A or T) are added.
The polymerase catalyzes the sequence-specific incorporation of fluorescent
nucleotides into nascent complementary strands on all the templates.
After a wash step, which removes all free nucleotides, the incorporated
nucleotides are imaged and their positions recorded.
The fluorescent group is removed in a highly efficient cleavage process, leaving
behind the incorporated nucleotide.
The process continues through each of the other three bases.
Multiple four-base cycles result in complementary strands greater than 25 bases
in length synthesized on billions of templates—providing a greater than 25-base
read from each of those individual templates.
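The cyclic scheme above can be sketched as one nucleotide type offered per step across all templates at once. Under the simplifying assumption that at most one labelled nucleotide is added per template per flow (and every imaging and cleavage step succeeds), each four-base cycle extends every unfinished template by at least one base, so N cycles yield reads of at least N bases. The flow order C, G, A, T follows the order named in the text; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def single_molecule_sbs(templates: list[str], n_cycles: int,
                        flow_order: str = "CGAT") -> list[str]:
    """Cyclic one-nucleotide-at-a-time synthesis on many single templates.

    Simplification: at most one labelled nucleotide is added per template per
    flow, and every step (wash, imaging, dye cleavage) succeeds.
    """
    reads = [""] * len(templates)
    for _ in range(n_cycles):
        for nucleotide in flow_order:          # one four-base cycle
            for i, template in enumerate(templates):
                pos = len(reads[i])
                if pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
                    reads[i] += nucleotide     # this template lights up in the image
    return reads


if __name__ == "__main__":
    print(single_molecule_sbs(["GCTA", "AAAA"], n_cycles=2))  # ['CGAT', 'TT']
```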
Ion Sequencing
(Rothberg et al., Life Technologies; Nature, July 2011)
A non-optical method of DNA sequencing of genomes
Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on a massively parallel semiconductor sensing device, the ion chip
The ion chip contains 1.2 million ion-sensitive wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions.
The system's performance was demonstrated by sequencing three bacterial genomes and one human genome
The world's smallest solid-state pH meter
DNA is fragmented, ligated to adapters and clonally amplified onto beads.
Sequencing primers and DNA polymerase are then bound to the templates, and the beads are pipetted into the chip's loading port. Individual beads are loaded into individual sensor wells by spinning; the well depth allows only a single bead to occupy a well.
All four nucleotides are provided in a stepwise fashion during an automated run. When the nucleotide in the flow is complementary to the template base directly downstream of the sequencing primer, the bound polymerase incorporates it into the nascent strand.
This extends the sequencing primer by one base (or more, if a homopolymer stretch lies directly downstream of the primer) and results in hydrolysis of the incoming nucleotide triphosphate, which causes the net liberation of a single proton for each nucleotide incorporated during that flow.
The released protons shift the pH of the surrounding solution in proportion to the number of nucleotides incorporated in the flow (0.02 pH units per single-base incorporation). This shift is detected by the sensor at the bottom of each well, converted to a voltage and digitized by off-chip electronics. Signal generation and detection occur over 4 s.
After the flow of each nucleotide, a wash ensures that no nucleotides remain in the well.
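Because the pH shift is proportional to the number of bases incorporated (0.02 pH units per base, as stated above), one well's readings can be turned into bases by dividing and rounding. This sketch ignores drift and noise modelling; the flow order and the readings in the example are hypothetical, and so are the function names.

```python
import math
from itertools import cycle

PH_PER_BASE = 0.02  # pH shift per incorporated base, as stated in the text


def bases_from_ph(delta_ph: float) -> int:
    """Convert one well's pH shift during a flow into a homopolymer length."""
    return max(0, math.floor(delta_ph / PH_PER_BASE + 0.5))


def call_ion_read(flow_order: str, delta_phs: list[float]) -> str:
    """Pair each reading with the nucleotide flowed (order repeats) and expand."""
    return "".join(nt * bases_from_ph(d)
                   for nt, d in zip(cycle(flow_order), delta_phs))


if __name__ == "__main__":
    # Hypothetical readings for one well over four flows
    print(call_ion_read("TACG", [0.021, 0.001, 0.039, 0.0]))  # TCC
```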
Sequencing methods
Mining NGS data to obtain meaningful information
An average NGS experiment generates gigabytes to terabytes of raw data
Existing bioinformatics tools fall into several general categories: (1) alignment of reads to a reference sequence, (2) de novo assembly, (3) reference-based assembly, (4) detection of genetic variation (such as SNVs and indels), (5) genome annotation and (6) utilities for data analysis.
The most important step in NGS data analysis is successful assembly of the reads or their alignment to a reference genome.
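Read alignment can be illustrated with a seed-and-extend sketch: index every k-mer of the reference, look up exact seeds from the read, then count mismatches over the whole read. This is a teaching simplification (no indels, forward strand only); production aligners such as BWA and Bowtie use compressed Burrows-Wheeler indexes instead. Function names and the k and mismatch parameters are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Seed-and-extend alignment; returns sorted (start, mismatches) pairs.

    The read is cut into non-overlapping k-mer seeds. If the read has at most
    m mismatches and at least m + 1 seeds, one seed must match exactly
    (pigeonhole), so exact seed lookups find every such placement.
    """
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    ref = "GATTACAGGCTTACCGATCGA"
    idx = build_index(ref, 4)
    print(align_read("GGCATACC", ref, idx, 4, max_mismatches=1))  # [(7, 1)]
```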
After successful alignment and assembly, the next step is to interpret the large number of putative novel genetic variants (or mutations), many of which are present by chance
Recognition of functional variants is at the center of NGS data analysis and bioinformatics
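Separating real variants from chance observations starts with counting bases at each reference position (a pileup) and requiring enough depth and a large enough non-reference fraction. The thresholds below (10 reads, 20%) are hypothetical, the sketch ignores indels and base or mapping qualities, and dedicated callers such as GATK or bcftools use probabilistic models instead. Function names are hypothetical.

```python
from collections import Counter


def pileup(reference: str, alignments: list[tuple[int, str]]) -> list[Counter]:
    """Count the bases observed at each reference position (no indels)."""
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    return columns


def call_snvs(reference: str, alignments: list[tuple[int, str]],
              min_depth: int = 10, min_alt_fraction: float = 0.2) -> list[tuple]:
    """Report positions whose most common non-reference base is frequent enough."""
    calls = []
    for pos, counts in enumerate(pileup(reference, alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        alt, alt_count = max(
            ((b, c) for b, c in counts.items() if b != reference[pos]),
            key=lambda bc: bc[1], default=(None, 0))
        if alt is not None and alt_count / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], alt, depth, round(alt_count / depth, 3)))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    variant = ref[:4] + "G" + ref[5:]  # true A>G change at position 4
    error = ref[:7] + "C" + ref[8:]    # one read with a T>C error at position 7
    reads = [(0, ref)] * 7 + [(0, variant)] * 4 + [(0, error)]
    print(call_snvs(ref, reads))  # [(4, 'A', 'G', 12, 0.333)]
```

The isolated error at position 7 (1 of 12 reads) falls below the fraction threshold, while the variant seen in 4 of 12 reads is reported.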
Thanks
