DNA Synthesis and PCR Techniques Vocabulary

DNA Synthesis and Analysis

Requirements for DNA Synthesis

  • Template: The DNA molecule to be copied.

  • Enzyme: DNA polymerase.

  • Primer: Provides a free 3' hydroxyl group for the polymerase to bind and initiate synthesis.

  • Nucleotides: Building blocks for synthesizing the new DNA strand.

  • Buffer solution: Provides a suitable chemical environment for the reaction.

  • Precise temperature control: Important for optimal enzyme activity.

Polymerase Chain Reaction (PCR)

  • A cyclic process that mimics DNA replication to produce millions of copies of a DNA sequence.

  • Developed by Kary Mullis in 1983; he won the Nobel Prize in 1993.

  • Refined by the discovery of Taq polymerase, a heat-stable enzyme from Thermus aquaticus.

  • Mullis was a controversial figure, questioning the link between HIV and AIDS and expressing skepticism about human-caused climate change.

PCR Steps
  1. Denaturation: High temperature separates the DNA double strand. Analogous to helicase in natural replication.

  2. Annealing: The forward and reverse primers bind to their complementary template strands at a lower temperature.

  3. Extension/Elongation: DNA polymerase synthesizes new strands at an intermediate temperature that suits the enzyme (about 72 °C for Taq polymerase).
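
The three steps above can be sketched in code as a simple "in silico PCR": find where the forward primer matches the template, find where the reverse primer's complement matches downstream, and return the stretch in between. This is a minimal sketch, not a real primer-design tool; the function names, the example template, and both primers are made up for illustration, and only exact matches are considered.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def amplicon(template, forward, reverse):
    """Return the region bounded by the two primers, or None.

    template is the top strand written 5'->3'. The forward primer matches
    the top strand directly; the reverse primer anneals to the top strand,
    so its reverse complement is what appears in the template string.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical template and primers (illustration only).
template = "TTGACCTGAAGGCATTCGGATCCAAGT"
product = amplicon(template, forward="GACCTG", reverse="GATCCG")
print(product, len(product))
```

Only the sequence between (and including) the two primer sites is returned, which mirrors why PCR amplifies a defined target rather than the whole template.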

Amplification
  • The amount of DNA doubles with each cycle, leading to exponential amplification.

    • Cycle 1: 2 copies

    • Cycle 2: 4 copies

    • Cycle 3: 8 copies

    • Cycle 4: 16 copies
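
The doubling pattern above follows N = N0 × 2^n for n cycles. A minimal sketch, assuming an idealized reaction; the function name `copies_after` and the optional `efficiency` parameter (1.0 = perfect doubling) are illustrative and not part of the notes.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Number of copies after a given number of PCR cycles.

    With efficiency=1.0 every molecule is copied in every cycle,
    so the amount doubles each time (idealized case).
    """
    return start_copies * (1 + efficiency) ** cycles


for n in range(1, 5):
    print(f"Cycle {n}: {copies_after(n):.0f} copies")

# After 30 ideal cycles a single template gives about a billion copies.
print(f"Cycle 30: {copies_after(30):.0f} copies")
```

Setting `efficiency` below 1.0 shows how quickly the yield falls behind the ideal curve when not every template is copied in each cycle.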

Variations of PCR

Reverse Transcription PCR (RT-PCR)
  • Template is cDNA (complementary DNA), which is reverse transcribed from RNA (typically mRNA).

  • Important for gene expression studies and cloning genes.

  • Because it is copied from mature (spliced) mRNA, cDNA contains only the exons of a gene, not the introns.

qPCR (Real-Time Quantitative PCR; called RT-qPCR when combined with reverse transcription)
  • Measures the number of copies synthesized in real time.

  • The amplified DNA is detected with DNA-binding dyes or sequence-specific probes that produce fluorescence.

    • The probe contains a quencher and a fluorophore.

  • As the polymerase synthesizes the new strand, it cleaves the bound probe and releases the fluorophore from the quencher, so a fluorescence signal is detected.

  • The signal is quantified as an RFU (relative fluorescence unit) value.

  • The more copies of the target sequence, the stronger the signal.

  • A dilution series of known concentrations is used to create a standard curve. The earlier (at a lower cycle number) the RFU curve rises above the background, the higher the starting concentration of the target.

  • PCR efficiency decreases in later cycles as nucleotides and primers are depleted and the polymerase loses activity, so the curve levels off into a plateau.
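
The standard-curve idea above can be illustrated with a short calculation. The sketch below assumes the usual approach of recording, for each dilution, the cycle at which fluorescence crosses a fixed threshold (commonly called the quantification cycle, Cq, a term not used in the notes) and fitting a straight line of Cq against log10 of the starting concentration. The dilution values and Cq numbers are invented for illustration, and all function names are hypothetical.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency_from_slope(slope):
    """Amplification efficiency; 1.0 corresponds to perfect doubling."""
    return 10 ** (-1 / slope) - 1


def estimate_concentration(cq, slope, intercept):
    """Read an unknown sample's starting concentration off the curve."""
    return 10 ** ((cq - intercept) / slope)


# Invented tenfold dilution series (copies per reaction) and Cq values.
standards = [1e5, 1e4, 1e3, 1e2]
cqs = [18.00, 21.32, 24.64, 27.96]

slope, intercept = fit_standard_curve(standards, cqs)
print(f"slope = {slope:.2f}, efficiency = {efficiency_from_slope(slope):.2%}")
print(f"unknown with Cq 23.0: {estimate_concentration(23.0, slope, intercept):.0f} copies")
```

A slope near -3.32 cycles per tenfold dilution corresponds to roughly 100% efficiency, which is one way the fall-off in efficiency mentioned above shows up in practice.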

Applications of PCR

  • Diagnosis of pathogens (bacteria, viruses, fungi).

  • DNA fingerprinting for individual identification.

Primer Design and Specificity
  • Primer specificity is crucial: the polymerase extends only from a bound primer, so the primers determine which sequence is amplified.

  • Allows differentiation between alleles (variations of a gene).

  • PCR is sensitive to primer design and to contamination, so both are important considerations.

Primer Length and Combinations
  • Shorter Primer (e.g., 3 bases):
    Only 4^3 = 64 possible sequences exist, so each one occurs many times throughout a genome and the likelihood of randomly encountering its complement is high.

  • Longer Primer (e.g., 25 bases):
    4^25 (about 1.1 × 10^15) possible sequences exist, so the likelihood of random binding is very low.

  • Longer primers generally increase specificity.
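
The effect of primer length can be made concrete with a small calculation. This sketch assumes a simplified model in which all four bases are equally frequent and positions are independent, which real genomes do not satisfy; the function names are illustrative, and the genome size is the approximate human figure.

```python
def possible_sequences(length):
    """Number of distinct DNA sequences of a given length (4 bases)."""
    return 4 ** length


def expected_random_sites(primer_length, genome_size):
    """Expected number of chance matches on one strand of a random genome."""
    return genome_size / possible_sequences(primer_length)


genome = 3_200_000_000  # approximate human genome size in base pairs
for length in (3, 10, 18, 25):
    combos = possible_sequences(length)
    sites = expected_random_sites(length, genome)
    print(f"{length:>2} bases: {combos:.3g} sequences, ~{sites:.3g} chance matches")
```

Under this model a 3-base primer would match tens of millions of sites, while a 25-base primer is expected to match by chance far less than once per genome.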

Applications
  • Pathogen Diagnosis: Identifying specific DNA or RNA sequences of pathogens. RNA samples require extra care because RNA degrades easily.

  • DNA Fingerprinting: Identifying individuals based on unique DNA sequence patterns, especially short tandem repeats (STRs).

    • Used in forensic science, paternity testing, and personal identification.

    • STRs are highly variable: their repetitive nature gives them a high mutation rate, and because they lie in non-coding regions, changes in repeat number are tolerated. As a result, the number of repeats differs widely between individuals.

    • DNA patterns can be visualized on agarose gels as fragments of varying lengths.

DNA Fingerprinting and STRs
  • Each individual has a unique combination of STRs.

  • Markers on the Y chromosome can be used to distinguish between male and female DNA.
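
An STR allele is described by how many times its motif is repeated in a row. The sketch below counts the longest uninterrupted run of a motif in a sequence; the motif `GATA`, the flanking bases, and the two example alleles are invented for illustration, and the function name is hypothetical. Real STR typing is based on fragment length or sequencing rather than a text search like this.

```python
import re


def longest_repeat_run(sequence, motif):
    """Largest number of back-to-back copies of motif found in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Two invented alleles at one hypothetical locus (one from each parent).
allele_1 = "CCTA" + "GATA" * 7 + "TTGC"
allele_2 = "CCTA" + "GATA" * 10 + "TTGC"

genotype = (longest_repeat_run(allele_1, "GATA"),
            longest_repeat_run(allele_2, "GATA"))
print(genotype)
```

The two alleles differ in length by three repeats (12 bases here), which is why STR alleles separate as fragments of different lengths on a gel.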

Case Study: The Green River Killer
  • Gary Ridgway was identified by STR analysis of semen collected from victims in the 1980s.

  • The DNA matched a saliva swab from Ridgway.

DNA Databases
  • CODIS (Combined DNA Index System) is a national DNA database in the U.S. used by law enforcement.

  • It stores DNA profiles from convicted offenders and crime scene evidence.

  • The system relies on STR markers.

  • Initially based on 13 core STR loci, later expanded to 20 loci for improved accuracy.

Paternity Testing
  • Based on multiple STRs.

  • Individuals inherit one allele from each parent at each STR locus.

  • A paternity index is calculated based on the probability of the alleged father sharing genetic markers with the child, compared to a random, unrelated individual.

  • A combined paternity index (CPI), the product of the paternity indices of the individual loci, is used to calculate the probability of paternity.

  • A probability of 99.99% is typically required for legal paternity confirmation.

  • Uses the equation: \text{Probability of Paternity} = \frac{\text{CPI}}{\text{CPI} + 1}
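
A short worked example of the paternity calculation. It assumes the loci are independent, so the per-locus paternity indices can be multiplied into the CPI, and it uses the equation from the notes, which corresponds to treating paternity and non-paternity as equally likely before testing. The per-locus values are invented and the function names are illustrative.

```python
from math import prod


def combined_paternity_index(locus_indices):
    """Multiply the per-locus paternity indices into a CPI."""
    return prod(locus_indices)


def probability_of_paternity(cpi):
    """CPI / (CPI + 1), the equation given in the notes."""
    return cpi / (cpi + 1)


# Invented paternity indices for five hypothetical STR loci.
locus_indices = [2.0, 5.0, 10.0, 4.0, 25.0]

cpi = combined_paternity_index(locus_indices)
probability = probability_of_paternity(cpi)
print(f"CPI = {cpi:.0f}, probability of paternity = {probability:.4%}")
```

Because the indices multiply, adding more loci raises the CPI quickly, which is how a threshold such as 99.99% becomes reachable.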

Controversial Incident: HIV Transmission by a Dentist
  • Between 1990 and 1992, six patients were found to be HIV positive without typical risk factors.

  • The only link between them was their dentist, who was diagnosed with AIDS in 1986 but continued practicing.

  • The CDC investigated the case using HIV genotyping.

  • HIV genetic sequences from the patients and the dentist were isolated, amplified, sequenced, and compared. Local controls were also analyzed.

Phylogenetic Tree
  • The sequences are arranged into clusters based on similarities.

  • Sequences of HIV isolates from the patients clustered with the dentist's sequence, suggesting they were infected by the dentist.

  • The exact mechanism of transmission could not be determined; possibilities include blood-contaminated instruments or needle-stick injuries.

  • It remains an exceptionally rare event in medical history.
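
The clustering idea behind the tree can be illustrated with a very crude stand-in: count the differences between aligned sequences and see which sequence each one is closest to. This is not how the CDC analysis or real phylogenetic software works (those use proper alignment and tree-building methods); the sequences below are short made-up strings, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(name, sequences):
    """Name of the sequence most similar to the named one."""
    others = (other for other in sequences if other != name)
    return min(others, key=lambda other: hamming(sequences[name], sequences[other]))


# Made-up aligned sequences (illustration only, not real HIV data).
sequences = {
    "dentist":   "ACGTACGTAC",
    "patient_A": "ACGTACGTAT",
    "patient_B": "ACGTACCTAC",
    "control_1": "TCGAACGGTC",
    "control_2": "AGGTTCGAAA",
}

for name in ("patient_A", "patient_B"):
    print(name, "is closest to", nearest_neighbor(name, sequences))
```

In this toy data the patient sequences sit closer to the dentist's sequence than to either local control, which is the pattern the clustering in the tree expresses.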

DNA Sequencing

  • Determining the exact order of nucleotides in a DNA molecule.

  • Fundamental technique in molecular biology, genetics, and biotechnology.

  • Used for studying genomes, identifying mutations, and diagnosing diseases.

Sanger Sequencing
  • Developed by Frederick Sanger in 1977; also known as chain-termination sequencing.

  • Mimics DNA replication but uses dideoxynucleotide triphosphates (ddNTPs).

  • ddNTPs lack a 3' hydroxyl group, terminating the DNA chain.

Ingredients for Sanger Sequencing
  • dNTPs (regular nucleotides)

  • DNA polymerase

  • Primer

  • ddNTPs (labeled with fluorescent dyes)

Process
  1. During synthesis, a ddNTP is occasionally incorporated into the growing DNA chain in place of the corresponding dNTP, terminating it.

  2. This generates a collection of DNA fragments ending at every possible position where a ddNTP could have been added.

  3. The fragments are separated by capillary electrophoresis.

  4. A laser detects fluorescent dyes corresponding to the base terminating the fragment.
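
The process above can be mimicked in a few lines: generate one fragment per possible termination point, sort the fragments by length as electrophoresis would, and read the last base of each. This is an idealized sketch that ignores the primer and assumes every fragment length is present and cleanly resolved; the function names and the example strand are invented.

```python
import random


def chain_terminated_fragments(new_strand):
    """Every prefix of the synthesized strand, one per termination point.

    Each fragment ends in the ddNTP that stopped extension, so its last
    base stands in for the fluorescent label.
    """
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Order fragments by length and read the final base of each."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = chain_terminated_fragments("GATTACA")
random.shuffle(fragments)  # fragments are mixed together in the tube
print(read_sequence(fragments))
```

Shuffling the fragments first makes the point that the order of synthesis does not matter: separation by size alone recovers the sequence.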

Key Difference: dNTPs vs. ddNTPs
  • dNTPs have a free 3' hydroxyl group, allowing phosphodiester linkages to continue the chain.

  • ddNTPs lack the 3' hydroxyl group, preventing further extension of the DNA strand and terminating the chain.

Characteristics of Sanger Sequencing
  • High quality (accurate base calling).

  • Low output (sequences a single template in each reaction).

  • Time-consuming.

  • Sequence length ranges from 800 to 1,200 nucleotides.

  • Suitable for sequencing single genes but not whole genomes.

Limitations
  • Impractical for whole genomes because of its low throughput and time-consuming process.

Next-Generation Sequencing (NGS)
  • Introduced around 2005.

  • Allows millions of sequencing reactions to be performed simultaneously (massively parallel sequencing).

  • Reads are much shorter than Sanger sequencing reads.

  • No need for constructing genomic libraries by conventional cloning.

  • Sequencing occurs in real time and continuously.

Read Length
  • A key parameter in sequencing.

  • Generally, the longer the read, the lower the quality.

Third-Generation Sequencing
  • Can sequence DNA templates that are kilobases in length.

  • However, the per-read quality is generally not as good as for short reads.

Genome Complexity
  • Wheat has a more complex genome than the human genome.

  • The human genome has approximately 3.2 billion base pairs.

  • The wheat genome has approximately 17 billion base pairs.

  • Human genome: 46 chromosomes (23 pairs), diploid (two sets of chromosomes).

  • Wheat genome: 42 chromosomes (six sets of seven chromosomes), hexaploid (a form of polyploidy, meaning more than two complete sets of chromosomes).

Genome Characteristics
  • Estimated gene number in the human genome: 20,000 - 25,000

  • Estimated gene number in the wheat genome: 80,000 - 100,000

  • Amount of repetitive DNA in the human genome: 50%

  • Amount of repetitive DNA in the wheat genome: 80%

  • Polyploidy in wheat introduces more homologous regions and redundancy, creating a more complex genomic architecture.

Human Genome Project
  • Launched in 1990.

  • A landmark research project involving international collaboration.

  • The first draft was presented in February 2001.

  • Aimed to identify all human genes, determine their location on the chromosomes, and understand genetic variation.

Genome Sequencing
  • Advances have significantly decreased the cost over the years.

  • Although the DNA of organisms can be sequenced, a complete genome is still difficult to obtain.

Challenges
  • Sequence reads are short in relation to a complete genome.

  • Millions to billions of sequencing reads need to be oriented and located within the genome.

  • Repetitive sequences make it difficult to determine the correct position.

  • The pieces must be put together in the right order without knowing in advance what the genome should look like.

Assembly Process
  • The process of putting DNA sequences together in the correct order.

  • Repetitive or homologous regions in the genome complicate this process, particularly with short reads.
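
Assembly can be illustrated with a toy greedy approach: repeatedly merge the two reads that share the longest suffix-to-prefix overlap. This is a minimal sketch for intuition only, not how production assemblers work; the reads are invented, the function names are hypothetical, and the `min_length` cutoff of 3 bases is an arbitrary assumption.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Invented overlapping reads from a short made-up sequence.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

When the same stretch of sequence occurs in several places, several reads share equally good overlaps and a greedy merge can join the wrong pair, which is the difficulty with repetitive regions and short reads described above.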