DNA Sequencing Notes
DNA Sequencing
Basics
DNA sequencing is the process of determining the nucleic acid sequence, which is the order of nucleotides in DNA.
A nucleic acid sequence is a succession of bases signified by a series of letters (G, A, C, T for DNA; G, A, C, U for RNA) that indicate the order of nucleotides forming alleles within a DNA or RNA molecule.
Nucleotides are organic molecules composed of a nitrogenous base, a pentose sugar, and a phosphate group.
Allele: Alternative DNA sequences at a locus
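The letter conventions above (G, A, C, T for DNA; G, A, C, U for RNA) can be illustrated with a minimal sketch. The function name classify_sequence and its return labels are assumptions made for this example, not part of any standard library.

```python
DNA_BASES = set("GACT")
RNA_BASES = set("GACU")


def classify_sequence(seq: str) -> str:
    """Label a sequence by the letters it uses (hypothetical helper).

    Returns "DNA", "RNA", "ambiguous" (no T or U present, so either
    alphabet fits), or "invalid" (empty, or contains other letters).
    """
    letters = set(seq.upper())
    if not letters:
        return "invalid"
    fits_dna = letters <= DNA_BASES
    fits_rna = letters <= RNA_BASES
    if fits_dna and fits_rna:
        return "ambiguous"
    if fits_dna:
        return "DNA"
    if fits_rna:
        return "RNA"
    return "invalid"


print(classify_sequence("GATTACA"))
print(classify_sequence("GAUUACA"))
```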
History and Development
Frederick Sanger's Contribution: In 1955, Sanger completed the sequence of all amino acids in insulin, a small protein secreted by the pancreas.
This provided the first conclusive evidence that proteins were chemical entities with a specific molecular pattern rather than a random mixture.
Crick's Theory (1958): Building on Sanger's work, Crick theorized that the arrangement of nucleotides in DNA determines the sequence of amino acids in proteins, which in turn determines the function of a protein.
RNA Sequencing: One of the earliest forms of nucleotide sequencing.
Walter Fiers and his coworkers published the sequence of the first complete gene in 1972 and the complete genome of bacteriophage MS2 in 1976.
Traditional RNA sequencing requires first creating a complementary DNA (cDNA) molecule, which is then sequenced.
Ray Wu's Primer Extension Strategy (1970): Working at Cornell University, Wu established the first method for determining DNA sequences.
Utilized DNA polymerase catalysis and specific nucleotide labeling to sequence the cohesive ends of lambda phage DNA.
Between 1970 and 1973, Wu, Padmanabhan, and colleagues demonstrated that this method could be used to determine any DNA sequence using synthetic location-specific primers.
Sanger's Refinement (1977): Sanger adopted the primer-extension strategy to develop more rapid DNA sequencing methods and published a method for "DNA sequencing with chain-terminating inhibitors."
Gilbert and Maxam's Method: Walter Gilbert and Allan Maxam at Harvard developed sequencing methods, including one for "DNA sequencing by chemical degradation."
In 1973, Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis.
Recombinant DNA Technology: Advancements in sequencing were aided by the concurrent development of recombinant DNA technology, which allowed DNA samples to be isolated from sources other than viruses.
First Full DNA Genome Sequenced (1977): The first full DNA genome to be sequenced was that of bacteriophage φX174, consisting of 5,386 base pairs.
Epstein-Barr Virus (1984): Medical Research Council scientists deciphered the complete DNA sequence of the Epstein-Barr virus, finding it contained 172,282 nucleotides.
Completion of the sequence marked a significant turning point in DNA sequencing because it was achieved with no prior knowledge of the virus's genetic profile.
Significance and Applications
Every organism's DNA consists of a unique sequence of nucleotides.
Determining the sequence can help scientists compare DNA between organisms, showing how the organisms are related.
DNA sequencing includes any method or technology used to determine the order of the four bases: adenine, guanine, cytosine, and thymine.
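As a rough illustration of comparing sequences between organisms, the sketch below computes percent identity between two sequences. It assumes the sequences are already aligned and of equal length; real comparisons first need an alignment step that is not shown here. The function name percent_identity and the toy sequences are made up for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions where two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy sequences that differ at a single position out of seven.
print(round(percent_identity("GATTACA", "GACTACA"), 1))
```

A higher percent identity suggests a closer relationship, although this simple count ignores insertions and deletions.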
Applications:
Determine the sequence of individual genes, larger genetic regions (clusters of genes or operons), full chromosomes, and entire genomes of any organism.
Most efficient way to indirectly sequence RNA or proteins (via their open reading frames).
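To show how a protein sequence can be read indirectly from DNA through an open reading frame, here is a minimal sketch using the standard genetic code. It is a simplification: it only looks at the first ATG on the given strand, ignores the reverse strand and the other reading frames, and the name translate_first_orf is an assumption for this example.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the DNA."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        # Codons with letters outside A, C, G, T are reported as "X".
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_first_orf("CCATGGCCAAATAGGG"))  # prints MAK
```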
Key Applications
Biology and Other Sciences: DNA sequencing has become a key technology in many areas of biology, forensics, anthropology, and medicine.
Molecular Biology: Used to study genomes and the proteins they encode, identify changes in genes and noncoding DNA (including regulatory sequences), associations with diseases and phenotypes, and identify potential drug targets.
Evolutionary Biology: Used to study how different organisms are related and how they evolved, since DNA is an informative macromolecule in terms of transmission from one generation to another.
Historical Sequencing: In February 2021, scientists reported, for the first time, the sequencing of DNA from animal remains, specifically a mammoth, over a million years old, the oldest DNA sequenced to date.
Cost Reduction
The first full sequence of human DNA cost around . Now, certain companies sequence entire genomes for less than .
Main Types of DNA Sequencing
Sanger method (classical chain termination method).
High-Throughput Sequencing (HTS) techniques or Next-Generation Sequencing (NGS) methods.
Sanger Method (Chain Termination Method)
Relies on a primer that binds to a denatured DNA molecule and initiates the synthesis of a single-stranded polynucleotide in the presence of a DNA polymerase enzyme, using the denatured DNA as a template.
The enzyme catalyzes the addition of a nucleotide, forming a covalent bond between the 3' carbon atom of the deoxyribose sugar molecule in one nucleotide and the 5' carbon atom of the next.
The sequencing reaction mixture contains a small proportion of modified nucleotides that cannot form this covalent bond due to the absence of a reactive hydroxyl group (dideoxyribonucleotides).
These dideoxyribonucleotides (ddNTPs) lack both the 2' and 3' hydroxyl groups found on the corresponding ribonucleotide; without the 3' hydroxyl group the chain cannot be extended, causing premature termination of the DNA polymerization reaction.
Multiple rounds of polymerization create a mixture of molecules of varying lengths.
Early Attempts: The DNA molecule was amplified using a labeled primer and then split across four test tubes, each containing only one type of ddNTP.
Each reaction mixture would have only one type of modified nucleotide that could cause chain termination.
After the four reactions, the mixture of DNA molecules created by chain termination would undergo electrophoresis on a polyacrylamide gel and get separated according to their length.
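The four-tube logic above can be sketched as a small simulation. It makes several assumptions: the input is the newly synthesized strand (not the template), primer length is ignored, every possible termination product is present, and the gel resolves fragments perfectly by length. The function names are hypothetical.

```python
def chain_termination_fragments(new_strand: str) -> dict:
    """Return, per ddNTP tube, the lengths of the terminated fragments.

    In the tube containing ddATP, synthesis can stop wherever the new
    strand has an A, so that tube holds one fragment length per A, and
    likewise for the other three tubes. Only A, C, G, T are accepted.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_fragments("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Because each fragment length appears in exactly one lane, reading the lanes in order of increasing length recovers the sequence one base at a time.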
Dye-Terminator Sequencing: A modified method where each ddNTP has a different fluorescent label.
The primer is no longer the source of the radiolabel or fluorescent tag.
Uses four dyes with non-overlapping emission spectra, one for each ddNTP.
Process:
A single reaction mixture carries all elements needed for DNA elongation plus small concentrations of the four ddNTPs, each with a different fluorescent tag.
Completed reaction is run on a capillary gel.
Results are obtained through an analysis of the emission spectra from each DNA band on the gel.
A software program analyzes the spectra and presents the sequence of the DNA molecule.
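The spectral analysis step can be illustrated with a very simplified base caller. It assumes each band has already been reduced to four intensity values, one per dye, in the order A, C, G, T; the channel order, the function name call_bases, and the toy intensities are assumptions for this sketch. Real base-calling software is considerably more involved.

```python
CHANNELS = ("A", "C", "G", "T")


def call_bases(peaks) -> str:
    """Call one base per band by picking the dye channel with the strongest signal.

    peaks is a sequence of 4-tuples of intensities in CHANNELS order.
    """
    called = []
    for intensities in peaks:
        strongest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        called.append(CHANNELS[strongest])
    return "".join(called)


# Three bands, ordered from the shortest fragment to the longest.
toy_peaks = [
    (0.1, 0.0, 0.9, 0.1),
    (0.8, 0.1, 0.0, 0.1),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(toy_peaks))  # GAT
```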
Sanger Sequencing - Usefulness and Limitations
Sanger sequencing continues to be useful for determining the sequences of relatively long stretches of DNA, especially at low volumes.
It can become expensive and laborious when a large number of molecules need to be sequenced quickly.
High-throughput methods have become more widely used, especially when entire genomes need to be sequenced.
High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS)
Steps: Extraction, Library Preparation (including fragmentation), Sequencing, Analysis
Analysis starts from raw reads in FASTQ files and proceeds through read alignment (stored in BAM files) and variant identification (stored in VCF files).
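As an illustration of the first analysis step, the sketch below reads FASTQ records. It assumes the common four-line record layout (header, sequence, "+" separator, quality string) and Phred+33 quality encoding; files from older instruments may use a different offset. The function name parse_fastq and the example read are made up.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file (or a blank line) ends parsing
        sequence = handle.readline().strip()
        handle.readline()  # "+" separator line, ignored
        quality = handle.readline().strip()
        # Phred+33: each quality character encodes score = ord(char) - 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
for read_id, sequence, scores in parse_fastq(example):
    print(read_id, sequence, scores)
```

In this example the final base has a much lower quality score than the rest, which is the kind of signal later steps use when deciding how much to trust a base call.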
Major Changes Compared to Sanger Method
(1) Development of a cell-free system for cloning DNA fragments: Traditionally, the stretch of DNA to be sequenced was first cloned into a prokaryotic plasmid and amplified within bacteria before being extracted and purified. High-throughput (next-generation) sequencing technologies no longer rely on this labor-intensive and time-intensive procedure.
(2) Parallel processing: These methods made it possible to run millions of sequencing reactions in parallel. This was a huge step forward from the initial methods, where eight different reaction mixtures were needed to produce a single reliable nucleotide sequence.
(3) Bases identified as sequencing proceeds: There is no separation between the elongation and detection steps.
HTS decreased cost and time, but its reads are relatively short, so assembling an entire genome from them requires intensive computation.
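To give a feel for why assembling short reads is computationally demanding, here is a toy greedy overlap assembler. It is only a sketch under strong assumptions: reads are error-free, come from one strand, and overlap unambiguously. It is not how production assemblers work, and the function names and the min_length parameter are hypothetical.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if too short)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, None, None
        # Every ordered pair is compared, which is what makes this costly.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]
    return contigs


print(greedy_assemble(["GATTAC", "TTACAG", "ACAGGT"]))  # ['GATTACAGGT']
```

Even this toy version compares every pair of reads in each round, which hints at how the cost grows when millions of reads are involved.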
Uses of HTS
Molecular biology: HTS technologies are used to study variations in the genetic compositions of plasmids, bacteria, yeast, nematodes, or even mammals used in laboratory experiments.
Diagnostics: HTS is used to identify the causes of rare genetic disorders and plays an important role in developing a greater understanding of tumors and cancers.
Forensics