DNA Cloning and Sequencing Notes

Cloning DNA Fragments/Genes & Recombinant DNA Technology

  • Recombinant DNA technology uses in vitro molecular techniques to isolate and manipulate DNA fragments.
  • This technology is used to find genes in genomes.

Cloning

  • Cloning is the generation or production of identical copies of molecules (e.g., DNA), cells, or organisms.
  • Gene cloning is the technique of isolating and making many copies of a gene.

Cloning of DNA

  • Involves:
    • Restriction endonucleases
    • Vectors

Restriction Endonucleases

  • Enzymes that recognize specific sequences in double-stranded DNA (dsDNA) and cleave the DNA.
  • In the 1960s, it was discovered that certain E. coli hosts cleave phage DNA, restricting phage growth.
  • Example: HindII from Haemophilus influenzae cleaves T7 phage DNA into 40 specific fragments.

Restriction Enzymes and Genome Fragmentation

  • Each restriction enzyme recognizes a specific sequence of bases anywhere within the genome.
  • They cut the sugar-phosphate backbones of both strands.
  • Restriction fragments are generated by digestion of DNA with restriction enzymes.
  • Hundreds of restriction enzymes are now available.
  • Recognition sites are usually 4-8 base pairs (bp) of double-stranded DNA.
  • Often palindromic: base sequences of each strand are identical when read 5'-to-3'.
    • Example: 5'-GAATTC-3' and 3'-CTTAAG-5'
  • Each enzyme cuts at the same place relative to its specific recognition sequence.
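A minimal Python sketch of the two ideas above, assuming plain uppercase A/C/G/T strings: a site is palindromic when it equals its own reverse complement, and sites can be located by scanning the top strand. The function names are illustrative only, not from any library.

```python
# Map each base to its Watson-Crick partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the other strand, written 5'-to-3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if both strands of the site read the same 5'-to-3'."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` on the top strand.
    For a palindromic site, bottom-strand matches fall at the same positions."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1)
            if dna[i:i + len(site)] == site]


print(is_palindromic("GAATTC"))                    # True (EcoRI site)
print(find_sites("TTGAATTCAAGGAATTCG", "GAATTC"))  # [2, 11]
```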

Commonly Used Restriction Enzymes

Enzyme  | Recognition Site | Microbial Origin
TaqI    | 5'-TCGA-3'       | Thermus aquaticus YT1
RsaI    | 5'-GTAC-3'       | Rhodopseudomonas sphaeroides
Sau3AI  | 5'-GATC-3'       | Staphylococcus aureus 3A
EcoRI   | 5'-GAATTC-3'     | Escherichia coli
BamHI   | 5'-GGATCC-3'     | Bacillus amyloliquefaciens H
HindIII | 5'-AAGCTT-3'     | Haemophilus influenzae
KpnI    | 5'-GGTACC-3'     | Klebsiella pneumoniae OK8
ClaI    | 5'-ATCGAT-3'     | Caryophanon latum
BssHII  | 5'-GCGCGC-3'     | Bacillus stearothermophilus
NotI    | 5'-GCGGCCGC-3'   | Nocardia otitidiscaviarum

Restriction Enzyme Cuts: Blunt vs. Sticky Ends

  • Blunt ends: Cuts are straight through both DNA strands at the line of symmetry.
  • Sticky ends: Cuts are displaced equally on either side of the line of symmetry.
    • Ends have either 5' overhangs or 3' overhangs.
  • Arber, Nathans, and Smith were awarded the 1978 Nobel Prize for their discovery of restriction enzymes.
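The blunt/sticky distinction follows from where each strand is cut. A minimal Python sketch, assuming a palindromic site and a known top-strand cut offset: because the bottom strand reads the same 5'-to-3', its cut mirrors the top one. The cut offsets used (EcoRI G^AATTC, KpnI GGTAC^C, RsaI GT^AC) are taken from standard enzyme data, not from these notes.

```python
def end_type(site: str, top_cut: int) -> tuple[str, str]:
    """Classify the ends left by cutting a palindromic site.

    top_cut: number of bases from the 5' end of the site to the top-strand cut.
    Returns (kind, overhang sequence read 5'-to-3')."""
    # For a palindrome, the bottom-strand cut sits at the mirror position
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        # Top-strand cut lies left of the bottom-strand cut: 5' tails stick out
        return ("5' overhang", site[top_cut:bottom_cut])
    # Top-strand cut lies right of the bottom-strand cut: 3' tails stick out
    return ("3' overhang", site[bottom_cut:top_cut])


# Assumed cut offsets from standard enzyme references (not from the notes)
ENZYMES = {"EcoRI": ("GAATTC", 1), "KpnI": ("GGTACC", 5), "RsaI": ("GTAC", 2)}

for name, (site, cut) in ENZYMES.items():
    print(name, end_type(site, cut))
# EcoRI ("5' overhang", 'AATT')
# KpnI ("3' overhang", 'GTAC')
# RsaI ('blunt', '')
```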

Restriction Fragment Length Variation

  • Different restriction enzymes produce fragments of different lengths.
  • Assuming a random sequence with equal frequencies of A, C, G, and T, the average fragment length is $4^n$ bp, where $n$ is the number of bases in the recognition site.
    • A 4-base recognition site occurs on average every $4^4 = 256$ bp.
      • With a 3 billion bp genome, this yields ~12 million fragments ($3 \times 10^9 / 256 \approx 12 \times 10^6$).
    • A 6-base recognition site occurs on average every $4^6 = 4096$ bp (~4.1 kb).
      • With a 3 billion bp genome, this yields ~700,000 fragments ($3 \times 10^9 / 4096 \approx 7 \times 10^5$).
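The arithmetic above generalizes to any site length. A short Python sketch, assuming random sequence with equal, independent base frequencies; real genomes deviate from this, so actual fragment counts differ.

```python
GENOME_SIZE_BP = 3_000_000_000  # genome size used in the notes


def expected_site_spacing(site_length: int) -> int:
    """Average distance (bp) between sites: each base matches with probability 1/4."""
    return 4 ** site_length


def expected_fragment_count(genome_size: int, site_length: int) -> float:
    """Approximate number of fragments from a complete digest."""
    return genome_size / expected_site_spacing(site_length)


for n in (4, 6, 8):
    spacing = expected_site_spacing(n)
    count = expected_fragment_count(GENOME_SIZE_BP, n)
    print(f"{n}-base site: every {spacing:,} bp -> ~{count:,.0f} fragments")
# 4-base site: every 256 bp -> ~11,718,750 fragments
# 6-base site: every 4,096 bp -> ~732,422 fragments
# 8-base site: every 65,536 bp -> ~45,776 fragments
```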

Cloning Vector

  • A plasmid or virus used to carry inserted foreign DNA and replicate it, producing more copies of the foreign DNA or its protein product.

Creating Recombinant DNA with Plasmid Vectors

  • Plasmid cloning vectors have three main features:
    • Origin of replication
    • Restriction site(s) for cloning insert DNA
    • A selectable marker (e.g., antibiotic resistance)
  • Note: Bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) are alternative cloning vectors that can carry large inserts.

Cloning DNA Fragments: Basic Steps

  • Two basic steps:
    1. Insert DNA fragments into cloning vectors to make a recombinant DNA molecule.
    2. Transport recombinant DNA into a living cell to be copied.

Creating Recombinant DNA Molecules with Plasmid Vectors (cont.)

  • Digesting the vector and the human genomic DNA with the same restriction enzyme produces complementary sticky ends.
  • DNA ligase seals the sugar-phosphate backbones between vector and insert by forming phosphodiester bonds.
  • Paul Berg was awarded the 1980 Nobel Prize for his work on gene splicing.
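Sticky ends can be joined only when their single-stranded overhangs base pair. A minimal Python sketch, assuming 5' overhangs written 5'-to-3': two overhangs anneal if one is the reverse complement of the other. The overhangs used (EcoRI AATT, BamHI and Sau3AI GATC) come from standard enzyme data rather than these notes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs are complementary (antiparallel), so the ends
    can base pair and then be sealed by DNA ligase."""
    return overhang_a == reverse_complement(overhang_b)


# Assumed overhangs: EcoRI G^AATTC -> AATT; BamHI G^GATCC and Sau3AI ^GATC -> GATC
print(can_anneal("AATT", "AATT"))  # True: vector and insert both cut with EcoRI
print(can_anneal("GATC", "GATC"))  # True: a BamHI end joined to a Sau3AI end
print(can_anneal("AATT", "GATC"))  # False: EcoRI end vs. BamHI end
```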

Molecular Cloning: Host Cells and Amplification of Recombinant DNA

  • In E. coli, typically only about 0.1% of cells are transformed with the plasmid.
  • Selection: If the plasmid carries an ampicillin-resistance gene, only cells with the plasmid will grow on media containing ampicillin.
  • Each cell multiplies to produce millions of genetically identical cells, each with a recombinant plasmid.
  • Transformation: The process by which a cell or organism takes up foreign DNA.

Plasmid Screening for Insert-Containing DNA

  • Plasmid carries the lacZ gene, which encodes β-galactosidase (β-gal).
  • Insertional inactivation: Restriction site for cloning insert DNA is located in the middle of lacZ.
  • Screen for β-gal activity:
    • Plasmid without insert will have an intact lacZ.
    • Plasmid with insert will have a disrupted lacZ.

Molecular Cloning: Distinguishing Recombinant Molecules

  • A screen distinguishes cells carrying recombinant molecules from cells carrying non-recombinant molecules.
  • X-gal is a substrate for β-gal and is converted to a blue pigment.
    • Intact lacZ → blue colony
    • Disrupted lacZ → white colony
  • IPTG (isopropyl-β-D-thiogalactopyranoside) is a lactose analog that can induce lacZ gene expression.
  • X-gal = 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
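The selection and screen above combine into a simple decision rule. A minimal Python sketch, assuming the plasmid carries an ampicillin-resistance gene plus the lacZ cloning site, and that plates contain ampicillin, X-gal, and IPTG; the `Cell` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    has_plasmid: bool  # took up a plasmid during transformation
    has_insert: bool   # plasmid carries insert DNA in the lacZ cloning site


def survives_ampicillin(cell: Cell) -> bool:
    # Selection: only cells carrying the (assumed) ampicillin-resistance plasmid grow
    return cell.has_plasmid


def colony_color(cell: Cell) -> str:
    # Screen on X-gal + IPTG: intact lacZ -> active beta-gal -> blue;
    # insertional inactivation of lacZ -> no beta-gal -> white
    return "white" if cell.has_insert else "blue"


cells = [Cell(False, False), Cell(True, False), Cell(True, True)]
colonies = [colony_color(c) for c in cells if survives_ampicillin(c)]
print(colonies)  # ['blue', 'white']: pick the white colonies
```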

Expression Vectors

  • Allow transcription and translation of introduced genes.

Polymerase Chain Reaction (PCR)

  • Developed in 1983 by Kary Mullis.
  • Allows making billions of copies of DNA from a few copies.
  • Mullis shared the 1993 Nobel Prize in Chemistry.

PCR for DNA Amplification

  • PCR generates copies of target DNA.
  • It is a faster, less expensive, and more flexible way to amplify specific DNA fragments than molecular cloning.
  • Extremely efficient: can amplify DNA from a single cell or from some archaeological samples.
  • Oligonucleotides are designed from previously known DNA sequences and serve as primers for DNA synthesis.
  • The target sequence located between primer sequences is exponentially amplified by 25-30 cycles of DNA synthesis.

PCR Primers

  • Two oligonucleotide primers (16-26 nucleotides) are needed for PCR reactions.
  • The region between the two primers will be synthesized.
  • One primer is complementary to one strand of DNA at one end of the target region.
  • The other primer is complementary to the other strand of DNA at the other end of the target region.
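A naive Python sketch of how the two primers relate to the target, assuming the target is given 5'-to-3' on one strand: the forward primer matches the start of that strand, and the reverse primer is the reverse complement of its end, so it anneals to that strand. Real primer design also balances melting temperature, GC content, and secondary structure, which this ignores; the 20-nt default is an assumption within the 16-26 nt range above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'-to-3'.

    Forward: same as the first `length` bases of the target strand
             (it anneals to the complementary strand).
    Reverse: reverse complement of the last `length` bases
             (it anneals to the target strand itself)."""
    if not 16 <= length <= 26:
        raise ValueError("primer length outside the 16-26 nt range")
    if len(target) < 2 * length:
        raise ValueError("target too short for two non-overlapping primers")
    return target[:length], reverse_complement(target[-length:])


target = "ATGCGTACGTTAGCCTAGGA" + "TTCCGGAACCTTGGCAATGC"
forward, reverse = design_primers(target)
print(forward)  # ATGCGTACGTTAGCCTAGGA
print(reverse)  # GCATTGCCAAGGTTCCGGAA
```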

PCR Cycle Steps

  1. Denaturation:
    • Initial: 94°C for 5 minutes (first round)
    • Subsequent: 94°C for 20 seconds
  2. Annealing:
    • 50-60°C for 2 minutes
    • Primers base pair at sites flanking target sequence of genomic DNA
  3. Extension/Polymerization:
    • 72°C for 2-5 minutes
    • Polymerization from primers along templates
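The step times above can be written as a thermocycler program to estimate run length. A Python sketch with stated assumptions not taken from the notes: the 5-minute initial denaturation runs once before cycling, annealing is set at 55°C, extension uses its 2-minute lower bound, and temperature ramp times are ignored.

```python
# Hypothetical program assembled from the step times listed above
INITIAL_DENATURATION_S = 5 * 60

CYCLE_STEPS = [
    # (step, temperature in °C, hold time in seconds)
    ("denature", 94, 20),
    ("anneal", 55, 2 * 60),
    ("extend", 72, 2 * 60),
]


def total_run_seconds(cycles: int) -> int:
    """Total hold time for the whole program, excluding temperature ramps."""
    per_cycle = sum(hold for _, _, hold in CYCLE_STEPS)
    return INITIAL_DENATURATION_S + cycles * per_cycle


for n in (25, 30):
    print(f"{n} cycles: about {total_run_seconds(n) / 60:.0f} min")
# 25 cycles: about 113 min
# 30 cycles: about 135 min
```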

PCR Cycle Steps (Described)

  • (1) Denature strands
  • (2) Base pairing of primers
  • (3) Polymerization from primers along templates

Exponential Amplification in PCR

  • Each cycle doubles the number of target molecules, so the target DNA increases exponentially: 1 → 2 → 4 → 8 → 16 → 32 copies, etc.
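The doubling series follows the idealized model N = N0 × (1 + E)^n, where n is the number of cycles and E the per-cycle efficiency (E = 1 means perfect doubling). A Python sketch; the efficiency parameter is an addition not in the notes, and real reactions run below E = 1 and eventually plateau, which this ignores.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected target copies after `cycles` rounds: N0 * (1 + E) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


print([pcr_copies(1, n) for n in range(6)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(f"{pcr_copies(1, 30):.2e}")            # 1.07e+09: about a billion copies
```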

Uses for PCR

  • Genotype detection
  • Analysis of traces of partially degraded DNA
  • Evolutionary studies:
    • Compare homologous sequences from related organisms.
    • Compare sequences from a variety of sources.
    • Studies of gene diversity
  • Diagnosis of infectious diseases (detect bacterial and viral infection)
  • Forensics - Amplify DNA for analysis
  • Sex determination

DNA Sequence Analysis

  • Two methods originally developed in 1977:
    • Maxam-Gilbert method: Chemical cleavage of DNA at specific nucleotides.
    • Sanger method: Enzymatic extension of DNA strands to a defined terminating base.
  • Both methods can determine the sequence of 500-700 bp per reaction with 99.9% accuracy.
  • The Sanger method is more amenable to automation.
  • Paul Berg, Walter Gilbert, and Frederick Sanger shared the 1980 Nobel Prize for recombinant DNA and DNA sequencing.

Sanger Sequencing: Nested Fragments

  • Sanger sequencing generates sets of nested fragments separated by size.
  • Two steps:
    1. Generate a complete series of complementary single-stranded subfragments from a template DNA.
      • Each subfragment differs in length by a single nucleotide.
      • Identify the terminal nucleotide in each subfragment.
    2. Polyacrylamide gel electrophoresis:
      • Separates DNA molecules that differ in length by one nucleotide.

Sanger Sequencing Requirements

  • DNA polymerase requires:
    • Template: A single strand of DNA to copy.
    • Deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, dTTP).
    • Primer: Short single-stranded DNA molecule that is complementary to the template.

Sanger Sequencing Template

  • A recombinant plasmid is a good template for Sanger sequencing.
  • The primer is designed to be complementary to the plasmid sequence adjacent to the unknown insert sequence.
  • Template and primer interact through hybridization.

Sanger Sequencing: Chain Termination

  • Incorporation of a dideoxynucleotide (ddNTP) terminates DNA synthesis.
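  • The reaction contains both dNTPs and ddNTPs, so each growing strand stops wherever a ddNTP happens to be incorporated in place of the corresponding dNTP.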
  • ddNTPs lack a 3' -OH group, preventing further extension.

Sanger Sequencing: Fragment Generation

  • Sanger sequencing generates a series of single-stranded DNA fragments.
  • DNA fragments include the primer and nucleotides complementary to the unknown DNA.
  • The DNA fragments are a nested array—they each differ in length by one nucleotide.

Sanger Sequencing: Electrophoretic Separation

  • Nested fragments are separated by size using electrophoresis.
  • A special gel can separate DNA fragments that differ in size by only one nucleotide.
  • Smaller DNA fragments migrate quickly and appear at the bottom of the gel.
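To see why reading terminal bases in size order recovers the sequence, here is a toy Python model. Assumptions: the template is written 5'-to-3', the primer pairs with its 3' end, every possible termination product is present, and the 2-nt "primer" is unrealistically short only to keep the output readable.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sanger_fragments(template: str, primer_len: int) -> list[str]:
    """All chain-terminated products for a template written 5'-to-3'.
    The new strand is the reverse complement of the template and starts with the
    primer; each product ends in one ddNTP, so lengths differ by one nucleotide."""
    new_strand = reverse_complement(template)
    return [new_strand[:end] for end in range(primer_len + 1, len(new_strand) + 1)]


def read_gel(fragments: list[str]) -> str:
    """Mimic electrophoresis: sort by length (shortest at the bottom), then read
    the terminal ddNTP base of each fragment from bottom to top."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


fragments = sanger_fragments("GATTACA", primer_len=2)
random.shuffle(fragments)   # products are an unordered mixture in the tube
print(read_gel(fragments))  # TAATC: complementary to the template, read 5'-to-3'
```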

Automated DNA Sequencing

  • As the separated fragments reach the end of the gel or capillary, they flow past a detector, and the fluorescent color of each fragment is digitally recorded.

Automated Sanger Sequencing: Detection

  • Each ddNTP is labeled with a different fluorescent dye for detection of the sequence.
  • Each lane displays the sequence obtained from a separate DNA sample and primer.
  • Each fragment has terminated with a specific ddNTP labeled with a specific fluorescent dye.

DNA Sequence Trace

  • Computer reads the sequence complementary to the template strand.
  • Sequence is read from left to right (5'-to-3' synthesis from primer).

Next-Generation Sequencing (NGS) Technologies

  • Illumina
  • Pacific Biosciences
  • Oxford Nanopore sequencing

Illumina Sequencing

  • Sequencing by synthesis (SBS).
  • Sequence billions of fragments simultaneously (massively parallel sequencing).
  • No need to clone DNA fragments into a vector.
  • Very small amounts of reagents (enzyme and nucleotides) are used.

Illumina Patterned Flow Cell

  • Advantages:
    • Higher data output
    • More sequencing reads
    • Faster run times

Human Genome Library Construction for Illumina

  1. Shear high molecular weight DNA with sonication.
  2. Repair the ends enzymatically to make them blunt.
  3. Phosphorylate the 5' ends.
  4. Add a single-base A overhang to the 3' ends.
  5. Ligate adapters (each with a DNA barcode).
  6. Amplify the library by PCR.
  7. Quantitate the library.
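Barcodes let libraries from several samples share one run and be separated afterward. A minimal Python sketch of that sorting step, assuming a hypothetical 6-nt barcode at the start of each read and exact matching only; on real Illumina runs the index is usually read separately, and demultiplexing software tolerates mismatches.

```python
from collections import defaultdict

# Hypothetical barcodes; real adapter/barcode sequences come from the kit used
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
BARCODE_LEN = 6


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading barcode and trim the barcode;
    reads with an unknown barcode go to 'undetermined' untrimmed."""
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        barcode = read[:BARCODE_LEN]
        if barcode in BARCODES:
            bins[BARCODES[barcode]].append(read[BARCODE_LEN:])
        else:
            bins["undetermined"].append(read)
    return dict(bins)


reads = ["ACGTACTTGACCA", "TGCATGGGATCCA", "NNNNNNACGT"]
print(demultiplex(reads))
# {'sample_1': ['TTGACCA'], 'sample_2': ['GGATCCA'], 'undetermined': ['NNNNNNACGT']}
```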

Illumina Sequencing Process

  1. First chemistry cycle: Add all four labeled reversible terminators, primers, and DNA polymerase enzyme to the flow cell to initiate the first sequencing cycle.
  2. After laser excitation, capture the image of emitted fluorescence from each cluster on the flow cell. Record the identity of the first base for each cluster.
  3. Second chemistry cycle: Add all four labeled reversible terminators and enzyme to the flow cell to initiate the next sequencing cycle.

Illumina Sequencing Chemistry

  • Incorporate labeled nucleotide.
  • Detect signal.
  • Deblock 3' end.
  • Cleave fluorophore.

Illumina Sequencing - Multiple Cycles

  1. After laser excitation, collect the image data as before. Record the identity of the second base for each cluster.
  2. Repeat cycles of sequencing to determine the sequence of bases in a given fragment, a single base at a time.
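A toy Python model of the cycle loop above, assuming each cluster is a clonal copy of one template and every cycle incorporates exactly one base per cluster. Image analysis, phasing errors, and the deblocking chemistry are not modeled, and the cluster names are hypothetical.

```python
PAIRING = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(clusters: dict[str, str], cycles: int) -> dict[str, str]:
    """Toy model of cyclic reversible-terminator sequencing.

    clusters: cluster name -> template bases in the order the polymerase reads
    them (3'-to-5'). Each cycle adds one blocked, dye-labeled base per cluster."""
    calls: dict[str, list[str]] = {name: [] for name in clusters}
    for cycle in range(cycles):
        # All clusters on the flow cell are imaged in the same cycle (massively parallel)
        for name, template in clusters.items():
            if cycle < len(template):
                # Incorporate + detect: the dye color identifies the complementary base
                calls[name].append(PAIRING[template[cycle]])
        # Deblocking the 3' end and cleaving the dye (not modeled) ready each
        # strand for the next cycle.
    return {name: "".join(bases) for name, bases in calls.items()}


print(sequence_by_synthesis({"cluster_1": "TACG", "cluster_2": "GGCA"}, cycles=3))
# {'cluster_1': 'ATG', 'cluster_2': 'CCG'}
```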

Ultrahigh-Throughput DNA Sequencing

  • 2008 - New generation DNA sequencing methods.
  • 1,800 gigabase pairs (Gb) of sequence can be determined in a single experiment.

Long-Read Sequencing

  • Long reads: over 2 Mb
  • Direct RNA sequencing
  • Relatively high error rate
  • MinION: 500 pores
  • PromethION: 3000 pores, up to 4 Tb in 2 days

Bioinformatics

  • Bioinformatics provides tools for visualizing functional features of genomes.
  • Bioinformatics is the science of using computational tools to decipher biological information.
  • 1988 – National Center for Biotechnology Information (NCBI) established.
    • Oversees GenBank
    • Created additional public databases of biological information
    • Developed bioinformatic tools for analyzing, systemizing, and disseminating the data

Tools for Global Analysis of Gene Expression

  • RNA-sequencing (RNA-seq)
    • Determines whether a gene is transcribed and how much RNA is present corresponding to each gene.
    • Used for basic research to understand gene regulation and applied biomedical research to identify genes that are misregulated in tissues/individuals with diseases and in specific mutants.
    • Transcriptome: The set of all RNA molecules that are transcribed in a cell, tissue, or organ.

RNA Sequencing (RNA-seq)

  • RNA-Seq is a newer method to identify expressed genes and their level of expression.
  • It has several important applications in comparing transcriptomes - the set of all RNA molecules, including mRNAs and non-coding RNAs, that are transcribed in one cell or a population of cells.
  • RNA-Seq is used to compare transcription in:
    • Different cell types
    • Healthy vs. diseased cells
    • Different stages of development
    • Response to different environmental agents such as hormones or toxic chemicals

RNA Sequencing (RNA-seq) Process

  1. Isolate RNA from a sample of cells. May focus on a subpopulation of RNA, such as mRNAs or short non-coding RNAs.
  2. Break the RNAs into small fragments.
  3. Attach short oligonucleotide linkers to the ends of the RNAs.
  4. Synthesize cDNAs with reverse transcriptase, using the RNAs as templates, and amplify them by PCR (RT-PCR). The PCR primers are complementary to the linkers.
  5. Sequence the cDNAs using a next-generation sequencing technology.
  6. Computationally align the cDNA sequence reads to the genomic sequence.
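Once reads are aligned, the level of expression is estimated by counting reads per gene. A minimal Python sketch of one simple normalization, counts per million (CPM), assuming each aligned read has already been assigned to a single gene; the gene names and counts are hypothetical, and length-aware measures such as TPM are not shown.

```python
from collections import Counter


def counts_per_million(read_genes: list[str]) -> dict[str, float]:
    """CPM: reads assigned to a gene divided by all assigned reads, scaled to
    one million. Corrects for sequencing depth, not for gene length."""
    counts = Counter(read_genes)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical alignment result: the gene each of 10 aligned reads falls in
assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
print(counts_per_million(assignments))
# {'geneA': 600000.0, 'geneB': 300000.0, 'geneC': 100000.0}
```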

RNA-Seq Advantages

  • More accurate at quantifying the amount of each RNA transcript.
  • Superior at detecting RNA transcripts that are in low abundance.
  • Identifies the exact boundaries between exons and introns; identifies new splice variants.
  • Identifies the 5' and 3' ends of RNA transcripts.