Cells, DNA, RNA and Proteins Simplified!

Cells
• The fundamental unit of life is the cell
• A cell consists of a protective membrane surrounding a collection of organelles (subcellular structures) and large, complex molecules that provide cellular structure, energy, and the means for the cell to reproduce
• In plants and animals, individual cells cooperate to form multicellular tissues and organ systems that meet the biological needs of the organism
• We are interested in the biological sequences that regulate all biological processes in cells and organisms
• Our primary concern is the instructions for the organization of cells during the development of an organism

DNA
• The instruction sequences are stored in very long chemical strings called DNA
• DNA is the main information carrier molecule in a cell
• DNA may be single or double stranded
• A single-stranded DNA molecule, also called a polynucleotide, is a chain of small molecules called nucleotides
• There are four different nucleotides grouped into two types:
  – purines: adenine and guanine
  – pyrimidines: cytosine and thymine
• They are usually referred to as bases and denoted by their initial letters: A, C, G and T

DNA
• Different nucleotides can be linked together in any order to form a polynucleotide, for instance:
  A-G-T-C-C-A-A-G-C-T-T
• Polynucleotides can be of any length and can have any sequence
• The two ends of this molecule are chemically different, i.e., the sequence has a directionality:
  A->G->T->C->C->A->A->G->C->T->T->
• The ends of a polynucleotide are marked 5' and 3'
• By convention, DNA is usually written with the 5' end on the left and the 3' end on the right, with the coding strand on top

DNA
• Two strands are said to be complementary if one can be obtained from the other by
  – mutually exchanging A with T and C with G, and
  – reversing the direction of the molecule
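The two-step definition of complementarity above (swap A with T and C with G, then reverse the direction) can be sketched in Python. This is a minimal illustration, not part of the original slides: the names COMPLEMENT, reverse_complement and are_complementary are hypothetical, and both strands are assumed to be plain strings written 5' to 3' using only the letters A, C, G and T.

```python
# Base-pairing rule from the definition: A <-> T, C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    Reversing the input handles the change of direction; the
    dictionary lookup handles the base exchange.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True if strand_b is the reverse complement of strand_a."""
    return reverse_complement(strand_a) == strand_b.upper()


if __name__ == "__main__":
    top = "AGTCCAAGCTT"
    bottom = reverse_complement(top)
    print(bottom)                          # AAGCTTGGACT
    print(are_complementary(top, bottom))  # True
```

Read right to left, the result AAGCTTGGACT is TCAGGTTCGAA, which is how a lower strand appears when it is drawn underneath the upper strand in the opposite direction.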
For example, the following two strands are complementary:
  A->G->T->C->C->A->A->G->C->T->T->
  <-T<-C<-A<-G<-G<-T<-T<-C<-G<-A<-A

DNA
• Specific pairs of nucleotides can form weak bonds between them
• A binds to T, C binds to G
• Although such interactions are individually weak, when two longer complementary polynucleotide chains meet, they tend to stick together:
  5' C-G-A-T-T-G-C-A-A-C-G-A-T-G-C 3'
     | | | | | | | | | | | | | | |
  3' G-C-T-A-A-C-G-T-T-G-C-T-A-C-G 5'
• The vertical lines between the two strands represent the forces between them
• The A-T and G-C pairs are called base pairs (bp)
• The length of a DNA molecule is usually measured in base pairs or nucleotides (nt), which in this context are the same thing

DNA Double Helix
Two complementary polynucleotide chains form a stable structure that resembles a helix, known as the DNA double helix. One full turn of the helix spans about 10 bp and is about 3.4 nm long.

DNA
• It is remarkable that two complementary DNA polynucleotides form a stable double helix almost regardless of the sequence of the nucleotides
• This makes the DNA molecule a perfect medium for information storage
• Because the strands are complementary, either strand of the molecule contains all the information
• Thus, for many information-related purposes, the molecule in the example above can be represented as CGATTGCAACGATGC
• Each position holds one of four bases, so the maximal amount of information that can be encoded in such a molecule is 2 bits times the length of the sequence
• The distance between nucleotide pairs in DNA is about 0.34 nm, so 1 cm holds about 10^7 / 0.34 ≈ 2.9 x 10^7 base pairs, giving a linear information storage density of about 6 x 10^7 bits/cm
• That is approximately 7 MB per cm

DNA
• Regions in the DNA sequence encode instructions for the manufacture of proteins in the cell
• Proteins are linear chains whose elements come from a set of 20 chemically active building blocks known as amino acids
• Each protein has a unique sequence of amino acids that is determined by a DNA sequence on the chromosomes
• Proteins enable an organism to build needed structures and to carry out its biological functions
• Using a specific biological mechanism, transcription, the DNA is “read” and searched for specific patterns that mark the beginning and end of hereditary information
• That information is the gene

RNA
• Transcription produces another long string called messenger RNA (mRNA)
• The mRNA is what actually specifies the amino acid sequence
• mRNA molecules are very similar structurally and chemically to DNA
• Exceptions: they are single-stranded, have a different base, uracil (U), in place of thymine (T), and have a different backbone sugar (ribose rather than deoxyribose)

Translation
• mRNA also has specific regions indicating the start of the code for a protein
• Large organelles in the cytoplasm (ribosomes) bind to the start sites
• They then move in a defined chemical direction, reading the sequence three bases (one codon) at a time
• Each codon specifies an amino acid
• The corresponding amino acid is then added to a growing chain that forms the protein
• This continues until one of several stop codons is reached

Genetic Code
• Dictionaries are the natural Python representation of tabular data
• Next time, we will illustrate this with a representation of the codon table for protein synthesis

Transcription and Translation
• Once formed, proteins rapidly fold from a linear string into simple helical and stranded elements
• These elements are then organized into a complex three-dimensional structure
• The resulting protein molecule may serve as a tissue building block or have a very specific chemical activity
• The collection of proteins produced by an organism, the proteome, is responsible for the organism’s structure and biological behavior
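The transcription and translation steps described above can be sketched in Python, with the codon table held in a dictionary. This is a simplified illustration under stated assumptions, not a model of the real cellular machinery. The names BASES, AMINO_ACIDS, CODON_TABLE, transcribe and translate are hypothetical. The input is assumed to be the coding strand written 5' to 3', so the mRNA is the same sequence with U in place of T. Translation is assumed to start at the first AUG found, which ignores how ribosomes actually select start sites. Amino acids use the standard one-letter codes, and "*" marks a stop codon.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for each of the three codon positions (UUU, UUC, UUA, UUG, UCU, ...).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# Dictionary mapping each mRNA codon to a one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding strand: same sequence, U replaces T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Step three bases at a time, using only complete codons.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "CCATGGCTTTTGGATAAGC"
    mrna = transcribe(dna)
    print(mrna)             # CCAUGGCUUUUGGAUAAGC
    print(translate(mrna))  # MAFG
```

In the example, reading starts at AUG (M), continues through GCU (A), UUU (F) and GGA (G), and stops at UAA. Building the dictionary from two short strings keeps all 64 entries in a few lines. Writing the entries out one per line would work equally well and may be easier to read in a first course.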