Bioinformatics Lectures at Rice

Lecture 2: High throughput
technologies in genomics
By Li Zhang
•Biology: The biological problems
•Technology: Microarray mechanism;
experimental procedures
•Statistical methods: data analysis, checking
quality, exploration, discovery.
Microarray technology
• Microarray technology measures the copy
number of molecules in a mixture on a small
• Thousands or millions of different kinds of
molecules can be measured simultaneously,
thus creating large volumes of data per
biological sample.
• The molecules can be DNA, RNA or protein.
Major types of microarrays
• Two color short oligo arrays
• Single color short oligo arrays
Synthesized by photolithography: (Eric Lander)
• Bead arrays
The experimental procedure to
produce microarray data
Affymetrix Gene expression Analysis Sample
preparation protocol:
RNA isolation
cDNA synthesis
cRNA synthesis
Targets of Microarray measurements
• mRNA gene expression
• SNP genotyping
• DNA copy number (aneuploidy, chromosomal
• DNA methylation
• ChIP-chip. Protein-DNA binding site
• Nucleosome binding site
Some key aspects of microarray technology
•Parallel. The technology is designed to measure a large number of different
•Almost comprehensive. It can work for some or most of the molecules,
but not for all, which will result in some missing data.
•Noise and bias. The signals can be affected by unwanted sources, e.g.,
cross-hybridization, which creates biases. Contamination may also have an
asymmetrical distribution.
•Nonlinear response. Saturation causes non-linear behavior.
•Evolving annotation. Identity of the molecules may change, reflecting new
knowledge through time.
•No units. The numbers are often on a relative scale, which means the data
have not been calibrated.
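
The nonlinear response can be illustrated with a small sketch. It assumes a Langmuir-type saturation curve, which is a common simplification and not a model stated in these slides. The function names and the parameters s_max and k_half are hypothetical, chosen only for illustration.

```python
def saturating_signal(concentration, s_max=1000.0, k_half=100.0):
    """Hypothetical probe response that levels off at s_max.

    Assumed form: signal = s_max * c / (c + k_half).
    k_half is the concentration giving half of the maximum signal.
    """
    return s_max * concentration / (concentration + k_half)


def observed_fold_change(low, high):
    """Ratio of measured signals for two true concentrations."""
    return saturating_signal(high) / saturating_signal(low)


# The same true 2-fold difference, far below and far above k_half.
far_below_saturation = observed_fold_change(1.0, 2.0)
near_saturation = observed_fold_change(1000.0, 2000.0)

print(f"true 2x at low concentration  -> observed {far_below_saturation:.2f}x")
print(f"true 2x at high concentration -> observed {near_saturation:.2f}x")
```

Under this assumed curve, a true 2-fold change is almost fully recovered at low concentration but is compressed toward 1 near saturation, so signal ratios on the array are not a uniform proxy for concentration ratios.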
Next generation sequencing
Sequence by synthesis on an array
• Illumina/SOLiD/454 Life Sciences (1.5 hr video,
from a meeting in 2010)
Illumina’s animation.
SOLiD’s animation.
Complete Genomics (nanoball sequencing).
Nano-ball of Complete Genomics
Some key aspects of next generation
sequencing technology
• Compared with microarrays, NGS has less noise,
no cross hybridization, and no saturation.
• Bias remains a problem. Some sequences simply
cannot be dealt with properly. These include high
GC sequences, repeats, etc.
• Mapping to the genome can be challenging. But
paired-ends help a lot.
• Biases partly come from PCR amplification,
whose efficiency differs depending on the
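
The high-GC bias mentioned above can be screened for by computing the GC fraction of each read. This is a minimal sketch: the function names and the 0.65 threshold are assumptions made for illustration, not values from the lecture.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C.

    Ambiguous bases such as N are ignored. Returns 0.0 if no base is called.
    """
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def flag_high_gc(reads, threshold=0.65):
    """Return the reads whose GC fraction is at or above the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return [read for read in reads if gc_fraction(read) >= threshold]


example_reads = ["GCGCGCAT", "ATATATGC", "ggccNNat"]
for read in example_reads:
    print(read, round(gc_fraction(read), 2))
```

Comparing the GC distribution of the reads against that of the reference genome is one simple way to see whether GC-rich regions are under-represented in a run.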
3rd Generation sequencing
• Single molecule, with no PCR amplification.
• No fluorescent dyes, hence lower reagent cost.
• Longer sequences
• Remaining problem: erratic base calling.
Ion Torrent
Pacific Biosciences
Challenges ahead
• Complexity of human diseases
• Heterogeneity
• Biological samples are fragile, subject to
degradation, contamination.
• Biases, batch effects, standards.
