Cells, DNA, RNA and Proteins
• The fundamental unit of life is the cell
• A cell consists of a protective membrane surrounding a collection of
organelles (subcellular structures) and large and complex molecules that
provide cellular structure, energy, and the means for the cell to reproduce
• In plants and animals, individual cells cooperate to form multicellular
tissues and organ systems that meet the biological needs of the organism
• We are interested in biological sequences that regulate all biological
processes in cells and organisms
• Our primary concern is the instructions for the organization of cells
during the development of an organism
• The instruction sequences are stored in very long chemical strings called
DNA (deoxyribonucleic acid)
• DNA is the main information carrier molecule in a cell
• DNA may be single or double stranded.
• A single stranded DNA molecule, also called a polynucleotide, is a chain of
small molecules, called nucleotides.
• There are four different nucleotides grouped into two types,
– purines: adenine and guanine and
– pyrimidines: cytosine and thymine.
• They are usually referred to as bases and denoted by their initial letters,
A, C, G and T
• Different nucleotides can be linked together in any order to form a
polynucleotide
• Polynucleotides can be of any length and can have any sequence
• The two ends of a polynucleotide are chemically different, i.e., the
sequence has a directionality
• The two ends of a polynucleotide are labelled 5' and 3'.
• By convention, DNA is written with the 5' end on the left and the 3' end
on the right, with the coding strand on top.
• Two strands are said to be complementary if one can be obtained from the
other by
– mutually exchanging A with T and C with G, and
– changing the direction of the molecule to the opposite.
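The complementarity rule above can be sketched in Python, the language these slides use later for the codon table. This is a minimal illustration, not a library routine. The function name `reverse_complement` is our own, and the sketch assumes an upper-case string over A, C, G, T written 5' to 3'.

```python
# Base-pairing rule: A <-> T, C <-> G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' -> 3'.

    Exchanging A<->T and C<->G gives the complement; reversing the
    result changes the direction so that it reads 5' -> 3' again.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))
```

For example, `reverse_complement("CGATTGCAACGATGC")` returns `"GCATCGTTGCAATCG"`. Applying the function twice gives back the original strand, because complementarity is a symmetric relation.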
• Specific pairs of nucleotides can form weak bonds between them
• A binds to T, C binds to G.
• Although such interactions are individually weak, when two longer
complementary polynucleotide chains meet, they tend to stick together
5' C-G-A-T-T-G-C-A-A-C-G-A-T-G-C 3'
| | | | | | | | | | | | | | |
3' G-C-T-A-A-C-G-T-T-G-C-T-A-C-G 5'
• The vertical lines between the two strands represent the weak bonds
between paired bases.
• The A-T and G-C pairs are called base-pairs (bp).
• The length of a DNA molecule is usually measured in base-pairs or
nucleotides (nt), which in this context is the same thing.
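The pairing rule and the bp length can be checked mechanically on a duplex drawn as in the diagram above. This is a hypothetical helper for illustration only. It assumes the top strand is given 5'->3' and the bottom strand 3'->5', both left to right, so that aligned positions face each other. Dashes are stripped, so the diagram rows can be pasted in directly.

```python
# Watson-Crick pairs allowed between aligned positions
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def count_base_pairs(top: str, bottom: str) -> int:
    """Return the length in bp of a duplex drawn as two aligned rows.

    Raises ValueError if the strands differ in length or if any
    aligned position is not an A-T or C-G pair.
    """
    top, bottom = top.replace("-", ""), bottom.replace("-", "")
    if len(top) != len(bottom):
        raise ValueError("strands have different lengths")
    for i, pair in enumerate(zip(top, bottom)):
        if pair not in PAIRS:
            raise ValueError(f"no base pair at position {i}: {pair}")
    return len(top)
```

Called on the two rows of the example duplex, the function returns 15. That length counts both base pairs and nucleotides of one strand.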
DNA Double Helix
Two complementary polynucleotide chains
form a stable structure, which resembles a helix
known as the DNA double helix.
The helix makes one full turn about every 10 bp,
which corresponds to about 3.4 nm.
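The two figures above (about 10 bp per turn, about 3.4 nm per turn) give simple conversions from length in bp to helix geometry. This is a minimal sketch that treats both approximate values as exact constants. The function name is our own.

```python
BP_PER_TURN = 10        # approximate: one full turn per ~10 bp
RISE_NM_PER_BP = 0.34   # 3.4 nm per turn / 10 bp per turn


def helix_geometry(n_bp: int) -> tuple[float, float]:
    """Return (number of helical turns, length in nm) for n_bp base pairs."""
    return n_bp / BP_PER_TURN, n_bp * RISE_NM_PER_BP
```

Take the 15 bp molecule shown earlier. `helix_geometry(15)` gives about 1.5 turns and about 5.1 nm.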
• It is remarkable that two complementary DNA polynucleotides form a stable
double helix almost regardless of the sequence of the nucleotides
• This makes the DNA molecule a perfect medium for information storage
• Note that, as the strands are complementary, either one of the strands of
the molecule contains all the information
• Thus, for many information-related purposes, the molecule used in the
example above can be represented as CGATTGCAACGATGC
• Since each position holds one of four bases, it carries log2 4 = 2 bits, so
the maximal amount of information that can be encoded in such a molecule
is 2 bits times the length of the sequence
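The 2-bits-per-base bound can be made concrete by packing a sequence into an integer. This is a minimal sketch; the particular code assignment (A=00, C=01, G=10, T=11) is an arbitrary choice. The sequence length has to be stored separately, because leading A's encode as zero bits.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # DECODE[code] inverts ENCODE


def pack(seq: str) -> int:
    """Pack a DNA sequence into an integer using 2 bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a sequence of known length from its 2-bit packed form."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])  # lowest 2 bits = last base
        value >>= 2
    return "".join(reversed(bases))
```

A 15-nt sequence such as CGATTGCAACGATGC packs into at most 30 bits, and `unpack(pack(s), len(s))` returns the original string.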
• Noting that the distance between adjacent nucleotide pairs in DNA is about
0.34 nm, we can calculate that the linear information storage density of
DNA is about 2 bits / 0.34 nm ≈ 6 × 10^7 bits/cm
• This is roughly 7.4 MB (7.4 × 10^6 bytes) per cm of DNA
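The density figure can be checked with a few lines of arithmetic. This sketch assumes exactly 0.34 nm per base pair, 2 bits per base pair, and 1 MB = 10^6 bytes.

```python
RISE_CM_PER_BP = 0.34e-7  # 0.34 nm = 0.34e-7 cm between adjacent base pairs
BITS_PER_BP = 2           # one of four bases per position: log2(4) = 2 bits

bp_per_cm = 1 / RISE_CM_PER_BP              # about 2.9e7 base pairs per cm
bits_per_cm = BITS_PER_BP * bp_per_cm       # about 5.9e7 bits per cm
megabytes_per_cm = bits_per_cm / 8 / 1e6    # bits -> bytes -> MB

print(f"{bits_per_cm:.2e} bits/cm, {megabytes_per_cm:.1f} MB/cm")
# prints: 5.88e+07 bits/cm, 7.4 MB/cm
```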
• Regions in the DNA sequence encode instructions for the manufacture of
proteins in the cell
• Proteins are linear chains whose elements come from a set of 20
chemically active building blocks known as amino acids.
• Each protein has a unique sequence of amino acids that is determined by a
DNA sequence on the chromosomes.
• The proteins enable an organism to build needed structures and to carry
out its biological functions.
• Using a specific biological mechanism – transcription – the DNA is “read”
and searched for specific patterns that mark the beginning and end of
hereditary information
• Such a unit of hereditary information is called a gene
• Transcription produces another long string called messenger RNA (mRNA)
• The mRNA is what actually specifies the amino acid sequence of the protein
• mRNA molecules are very similar structurally and chemically to DNA
• Exceptions: they are single-stranded and contain a different base, uracil
(U), instead of thymine (T). They also have a different backbone sugar
(ribose instead of deoxyribose).
• mRNA also has specific regions indicating the start of the code for a protein
• Ribosomes, large RNA-protein complexes in the cytoplasm, bind to the start
sites
• They then move along the mRNA in a defined chemical direction (5' to 3'),
reading length-three base sequences (codons) one at a time
• Each codon specifies an amino acid
• The corresponding amino acid is then added to a growing chain that will
form the protein
• This continues until one of several stop codons is reached
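The steps above can be modelled as a toy pipeline. This is a simplified sketch, not a model of real cellular machinery, and the function names are our own. It assumes the input is the DNA coding strand. Real transcription copies the template strand, but the resulting mRNA has the same sequence as the coding strand with U in place of T. The sketch also assumes the "ribosome" starts at the first AUG and stops at UAA, UAG or UGA, and it returns codons rather than amino acids.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (5'->3'): T becomes U."""
    return coding_strand.upper().replace("T", "U")


def read_codons(mrna: str) -> list[str]:
    """Toy ribosome: start at the first AUG and read one codon at a time.

    Reading stops at the first stop codon (not included in the result)
    or when fewer than three bases remain. Returns [] if there is no AUG.
    """
    start = mrna.find(START_CODON)
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons
```

For example, `transcribe("CCATGGCTTGGTAACC")` gives `"CCAUGGCUUGGUAACC"`. Reading that mRNA yields `["AUG", "GCU", "UGG"]` before the stop codon UAA is reached.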
Genetic Code
Dictionaries are the natural Python representation of tabular data.
Next time, we will illustrate this with a representation of the codon table used in protein synthesis
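As a preview, here is a minimal sketch of such a dictionary. Keys are mRNA codons and values are one-letter amino-acid codes, with `*` marking stop codons. The 64-letter string is the standard genetic code (NCBI translation table 1), listed with the first, second and third bases each running through U, C, A, G. The names `CODON_TABLE` and `translate` are our own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, ... GGG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE: dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA already in frame (e.g. starting at AUG).

    Stops at the first stop codon; an incomplete trailing codon is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `CODON_TABLE["AUG"]` is `"M"` (methionine), and `translate("AUGGCUUGGUAA")` returns `"MAW"`.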
Transcription and Translation
• Once formed, proteins rapidly fold from a linear string into simple helical
and stranded elements (α-helices and β-strands)
• These elements are then organized into a complex three-dimensional structure
• The resulting protein molecule may serve as a tissue building block or have
a very specific chemical activity
• The collection of proteins produced by an organism, the proteome, is
responsible for the organism’s structure and biological behavior.
