DNA Sequencing Notes

DNA Sequencing

Basics

  • DNA sequencing is the process of determining the nucleic acid sequence, which is the order of nucleotides in DNA.

  • A nucleic acid sequence is a succession of bases signified by a series of letters (G, A, C, T for DNA; G, A, C, U for RNA) that indicate the order of nucleotides forming alleles within a DNA or RNA molecule.

  • Nucleotides are organic molecules composed of a nitrogenous base, a pentose sugar, and a phosphate group.

  • Allele: Alternative DNA sequences at a locus
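
As a small illustration of the letter notation above, a sequence can be held as a plain string and checked against the DNA or RNA alphabet. This is only a sketch: the names DNA_BASES, RNA_BASES, is_valid_sequence, and dna_to_rna are made up for these notes and do not come from any library.

```python
# Alphabets as given in the notes: G, A, C, T for DNA; G, A, C, U for RNA.
DNA_BASES = set("GACT")
RNA_BASES = set("GACU")


def is_valid_sequence(seq, alphabet):
    """Return True if seq is non-empty and uses only letters from alphabet."""
    seq = seq.upper()
    return len(seq) > 0 and set(seq) <= alphabet


def dna_to_rna(seq):
    """Rewrite a DNA string in the RNA alphabet by replacing T with U."""
    seq = seq.upper()
    if not is_valid_sequence(seq, DNA_BASES):
        raise ValueError("not a DNA sequence")
    return seq.replace("T", "U")


print(dna_to_rna("GATTACA"))
```

Here the string "GATTACA" is a toy example, not a real gene.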

History and Development

  • Frederick Sanger's Contribution: In 1955, Sanger completed the sequence of all amino acids in insulin, a small protein secreted by the pancreas.

    • This provided the first conclusive evidence that proteins were chemical entities with a specific molecular pattern rather than a random mixture.

  • Crick's Theory (1958): Building on Sanger's work, Crick theorized that the arrangement of nucleotides in DNA determines the sequence of amino acids in proteins, which in turn determines the function of a protein.

  • RNA Sequencing: One of the earliest forms of nucleotide sequencing.

    • Walter Fiers and his coworkers identified and published the sequence of the first complete gene and the complete genome of Bacteriophage MS2 in 1972 and 1976, respectively.

    • Traditional RNA sequencing requires the creation of a cDNA molecule, which must be sequenced.

  • Ray Wu's Primer Extension Strategy (1970): Working at Cornell University, Wu established the first method for determining DNA sequences.

    • Utilized DNA polymerase catalysis and specific nucleotide labeling to sequence the cohesive ends of lambda phage DNA.

    • Between 1970 and 1973, Wu, Padmanabhan, and colleagues demonstrated that this method could be used to determine any DNA sequence using synthetic location-specific primers.

  • Sanger's Refinement (1977): Sanger adopted the primer-extension strategy to develop more rapid DNA sequencing methods and published a method for "DNA sequencing with chain-terminating inhibitors."

  • Gilbert and Maxam's Method: Walter Gilbert and Allan Maxam at Harvard developed sequencing methods, including one for "DNA sequencing by chemical degradation."

    • In 1973, Gilbert & Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis.

  • Recombinant DNA Technology: Advancements in sequencing were aided by the concurrent development of recombinant DNA technology, which allowed DNA samples to be isolated from sources other than viruses.

  • First Full DNA Genome Sequenced (1977): The first full DNA genome to be sequenced was that of bacteriophage φX174, consisting of 5,386 base pairs.

  • Epstein-Barr Virus (1984): Medical Research Council scientists deciphered the complete DNA sequence of the Epstein-Barr virus, finding it contained 172,282 nucleotides.

    • Completion of the sequence marked a significant turning point in DNA sequencing because it was achieved with no prior knowledge of the virus's genetic profile.

Significance and Applications

  • Every organism's DNA consists of a unique sequence of nucleotides.

  • Determining the sequence can help scientists compare DNA between organisms, showing how the organisms are related.

  • DNA sequencing includes any method or technology used to determine the order of the four bases: adenine, guanine, cytosine, and thymine.

  • Applications:

    • Determine the sequence of individual genes, larger genetic regions (clusters of genes or operons), full chromosomes, and entire genomes of any organism.

    • Most efficient way to indirectly sequence RNA or proteins (via their open reading frames).
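
The idea of reading a protein off an open reading frame can be sketched as follows. This is a simplified illustration, not a gene-finding method: it assumes the standard genetic code, looks only at the forward strand, starts at the first ATG, and stops at the first in-frame stop codon. The function name translate_first_orf is hypothetical, and input is assumed to contain only A, C, G, and T.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns an empty string if there is no ATG or no in-frame stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return ""  # ran off the end without a stop codon: not a complete ORF


print(translate_first_orf("CCATGGCTTGGTAAGG"))
```

The input string is a made-up example; its reading frame is ATG GCT TGG TAA.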

Key Applications

  • Biology and Other Sciences: DNA sequencing has become a key technology in many areas of biology, forensics, anthropology, and medicine.

  • Molecular Biology: Used to study genomes and the proteins they encode, identify changes in genes and noncoding DNA (including regulatory sequences), associations with diseases and phenotypes, and identify potential drug targets.

  • Evolutionary Biology: Used to study how different organisms are related and how they evolved, since DNA is an informative macromolecule in terms of transmission from one generation to another.

  • Historical Sequencing: In February 2021, scientists reported, for the first time, the sequencing of DNA from animal remains, specifically a mammoth, over a million years old, the oldest DNA sequenced to date.
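
One simple way to express how closely two sequences are related, as in the evolutionary biology use above, is percent identity between sequences that are already aligned and of equal length. The sketch below assumes exactly that; real comparisons first need an alignment step, which is not shown. The function name percent_identity and the three toy sequences are invented for illustration.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Toy sequences, not real data.
reference = "GATTACAGAT"
close_relative = "GATTACAGAC"
distant_relative = "GCTTGCAGTC"

print(percent_identity(reference, close_relative))
print(percent_identity(reference, distant_relative))
```

A higher percentage suggests a closer relationship, under the assumption that the aligned positions are truly comparable.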

Cost Reduction

  • The first full sequence of human DNA cost around $3 billion. Now, certain companies sequence entire genomes for less than $1,000.

Main Types of DNA Sequencing

  • Sanger method (classical chain termination method).

  • High-Throughput Sequencing (HTS) techniques or Next-Generation Sequencing (NGS) methods.

Sanger Method (Chain Termination Method)

  • Relies on a primer that binds to a denatured DNA molecule and initiates the synthesis of a single-stranded polynucleotide in the presence of a DNA polymerase enzyme, using the denatured DNA as a template.

  • The enzyme catalyzes the addition of a nucleotide, forming a covalent (phosphodiester) bond between the 3' carbon atom of the deoxyribose sugar molecule in one nucleotide and the 5' carbon atom of the next.

  • The sequencing reaction mixture contains a small proportion of modified nucleotides that cannot form this covalent bond due to the absence of a reactive hydroxyl group (dideoxyribonucleotides).

  • These dideoxyribonucleotides (ddNTPs) lack both the 2' and 3' hydroxyl groups of the corresponding ribonucleotide; the missing 3' hydroxyl causes premature termination of the DNA polymerization reaction.

  • Multiple rounds of polymerization create a mixture of molecules of varying lengths.

  • Early Attempts: The DNA molecule was amplified using a labeled primer and then split into four test tubes, each having only one type of ddNTP.

  • Each reaction mixture would have only one type of modified nucleotide that could cause chain termination.

  • After the four reactions, the mixture of DNA molecules created by chain termination would undergo electrophoresis on a polyacrylamide gel and get separated according to their length.

  • Dye-Terminator Sequencing: A modified method where each ddNTP has a different fluorescent label.

  • The primer is no longer the source of the radiolabel or fluorescent tag.

  • Uses four dyes with non-overlapping emission spectra, one for each ddNTP.

  • Process:

    • Single reaction mixture carries all elements needed for DNA elongation + small concentrations of 4 ddNTPs, each with a different fluorescent tag.

    • Completed reaction is run on a capillary gel.

    • Results are obtained through an analysis of the emission spectra from each DNA band on the gel.

    • A software program analyzes the spectra and presents the sequence of the DNA molecule.
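
The logic of reading a sequence from chain-terminated fragments can be sketched as a toy simulation. It assumes an idealized reaction in which exactly one fragment terminates at every position, ignores the primer, and represents each fragment only by its length and the base of its terminal ddNTP. All function names here are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_terminated_fragments(template):
    """Return one (length, terminal base) pair per position of the new strand.

    The template is written 5'->3'. The polymerase reads it 3'->5' and
    builds the complementary strand 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template.upper()))
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_gel(fragments):
    """Sort fragments by length (shortest first) and read the terminal labels."""
    return "".join(base for _, base in sorted(fragments))


def reverse_complement(seq):
    """Recover the template strand from the synthesized strand."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


template = "ATGCCGTA"  # toy template, not real data
fragments = chain_terminated_fragments(template)
random.Random(0).shuffle(fragments)  # the reaction mixture has no inherent order

synthesized = read_gel(fragments)
print(synthesized)
print(reverse_complement(synthesized))
```

Sorting by length stands in for electrophoresis, and reading the terminal base of each fragment stands in for detecting the fluorescent label on each band.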

Sanger Sequencing - Usefulness and Limitations

  • Sanger sequencing continues to be useful for determining the sequences of relatively long stretches of DNA, especially at low volumes.

  • It can become expensive and laborious when a large number of molecules need to be sequenced quickly.

  • High-throughput methods have become more widely used, especially when entire genomes need to be sequenced.

High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS)

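  • Steps: Extraction, Library Preparation (including fragmentation), Sequencing, Analysis

  • Analysis: Raw reads are stored in FASTQ files; the reads are aligned to a reference (alignments stored in BAM files) and variants are then identified (stored in VCF files).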

  • Major Changes Compared to Sanger Method

    • (1) Development of a cell-free system for cloning DNA fragments: Traditionally, the stretch of DNA that needed to be sequenced was first cloned into a prokaryotic plasmid & amplified within bacteria before being extracted & purified. High-throughput (next-generation) sequencing technologies no longer rely on this labor-intensive & time-intensive procedure.

    • (2) Parallel processing: These methods made it possible to run millions of sequencing reactions in parallel. This was a huge step forward from the initial methods, where eight different reaction mixtures were needed to produce a single reliable nucleotide sequence.

    • (3) Bases identified as sequencing proceeds: There is no separation between the elongation and detection steps.

  • HTS methods decreased cost and time, but their reads are relatively short, so assembling an entire genome from them requires intensive computation.
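
The FASTQ files mentioned in the analysis step store each read as four lines: a header starting with "@", the base sequence, a separator line starting with "+", and a quality string. A minimal reader might look like the sketch below. It assumes strict four-line records and the Phred+33 quality encoding (quality score = character code minus 33); files using other conventions would need different handling. The function name parse_fastq and the example read are invented for illustration.

```python
def parse_fastq(text):
    """Parse four-line FASTQ records into (name, sequence, quality scores)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        if len(seq) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding.
        scores = [ord(ch) - 33 for ch in quality]
        records.append((header[1:], seq, scores))
    return records


example = "@read1\nGATTACA\n+\nIIIIII#\n"  # toy record, not real data
for name, seq, scores in parse_fastq(example):
    print(name, seq, scores)
```

In this example the character "I" decodes to a score of 40 and "#" to a score of 2, so the last base of the read is the least reliable.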

Uses of HTS

  • Molecular biology: HTS technologies are used to study variations in the genetic compositions of plasmids, bacteria, yeast, nematodes, or even mammals used in laboratory experiments.

  • Diagnostics: Used to identify the causes of rare genetic disorders; also plays an important role in developing a greater understanding of tumors and cancers.

  • Forensics