DNA Synthesis and PCR Techniques Vocabulary

DNA Synthesis and Analysis

Requirements for DNA Synthesis

Template: The DNA molecule to be copied.
Enzyme: DNA polymerase.
Primer: Provides a free 3' hydroxyl group for the polymerase to bind and initiate synthesis.
Nucleotides: Building blocks for synthesizing the new DNA strand.
Buffer solution: Provides a suitable chemical environment for the reaction.
Precise temperature control: Important for optimal enzyme activity.

Polymerase Chain Reaction (PCR)

A cyclic process that mimics DNA replication to produce millions of copies of a DNA sequence.
Developed by Kary Mullis in 1983; he won the Nobel Prize in 1993.
Refined by the use of Taq polymerase, a heat-stable enzyme from Thermus aquaticus that survives the denaturation step, so fresh enzyme does not have to be added in every cycle.
Mullis was a controversial figure, questioning the link between HIV and AIDS and expressing skepticism about human-caused climate change.

PCR Steps

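Denaturation: High temperature (typically about 95 °C) separates the two strands of the DNA double helix, doing the job that helicase performs in natural replication.
Annealing: The temperature is lowered so that the forward and reverse primers can bind to their complementary sites on the template strands.
Extension/Elongation: The temperature is raised to the polymerase's optimum (about 72 °C for Taq polymerase), and the polymerase synthesizes new strands starting from the primers.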

Amplification

The amount of DNA doubles with each cycle, leading to exponential amplification.
- Cycle 1: 2 copies
- Cycle 2: 4 copies
- Cycle 3: 8 copies
- Cycle 4: 16 copies
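
The doubling pattern above can be written as copies = start × 2^n. The sketch below is a minimal illustration, not a model of a real reaction: the function name `copies_after` and the `efficiency` parameter (1.0 means perfect doubling) are assumptions introduced here.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies present after `cycles` PCR cycles.

    efficiency=1.0 is ideal doubling; lower values model a reaction in
    which only a fraction of the templates is copied in each cycle.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# Reproduce the list above, then show why 30 cycles yields a huge number of copies.
for n in (1, 2, 3, 4, 30):
    print(f"Cycle {n}: {int(copies_after(n)):,} copies")
```

With ideal doubling, 30 cycles starting from a single molecule give 2^30, which is just over one billion copies.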

Variations of PCR

Reverse Transcription PCR (RT-PCR)

Template is cDNA (complementary DNA), which is reverse transcribed from RNA (typically mRNA).
Important for gene expression studies and cloning genes.
Because cDNA is made from processed mRNA, it contains only the exons of a gene; the introns have already been spliced out.

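qPCR (Real-Time Quantitative PCR)

Measures the number of copies synthesized in real time, as the reaction runs. When the template is cDNA reverse transcribed from RNA, the method is called RT-qPCR; there, RT stands for reverse transcription, not real time.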
The amplified DNA is detected with DNA-binding dyes or fluorescently labeled probes.
- A hydrolysis probe contains a quencher and a fluorophore; while the probe is intact, the quencher suppresses the fluorophore's signal.
As the polymerase synthesizes the new strand, it degrades the probe bound to the template. This separates the fluorophore from the quencher, and a fluorescence signal is detected.
The signal is quantified as an RFU (relative fluorescence unit) value.
The more copies of the target sequence, the stronger the signal.
A dilution series of known concentrations is used to create a standard curve. The earlier the fluorescence curve rises above the detection threshold, the higher the starting concentration of the target.
PCR efficiency decreases in later cycles as nucleotides and primers are depleted and the enzyme loses activity.
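
A standard curve is usually a straight-line fit of the threshold cycle against the logarithm of the starting copy number. The sketch below assumes that convention: "Cq" (quantification cycle) is the cycle at which the signal crosses the threshold, and the relation efficiency = 10^(-1/slope) - 1 is the commonly used formula. The function names and the dilution data are made up for illustration.

```python
import math


def fit_standard_curve(copies, cq_values):
    """Least-squares line: Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency_from_slope(slope):
    """Amplification efficiency; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def estimate_copies(cq, slope, intercept):
    """Read the starting copy number of an unknown sample off the curve."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical 10-fold dilution series (invented values, not measured data).
dilution_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
dilution_cq = [16.8, 20.1, 23.5, 26.8, 30.2]

slope, intercept = fit_standard_curve(dilution_copies, dilution_cq)
print(f"slope = {slope:.2f}, efficiency = {efficiency_from_slope(slope):.0%}")
print(f"sample at Cq 25.0: about {estimate_copies(25.0, slope, intercept):.0f} copies")
```

A slope near -3.32 corresponds to perfect doubling, because a 10-fold dilution then delays the signal by log2(10), about 3.32 cycles.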

Applications of PCR

Diagnosis of pathogens (bacteria, viruses, fungi).
DNA fingerprinting for individual identification.

Primer Design and Specificity

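Primer specificity determines which sequence is amplified: the polymerase extends only from a primer that has annealed to the template.
Specific primers allow differentiation between alleles (variants of a gene).
PCR is sensitive to primer design and to contamination, so both need careful attention.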

Primer Length and Combinations

Shorter Primer (e.g., 3 bases):
Only $4^3 = 64$ different sequences are possible, so any given one occurs many times throughout a genome by chance; the likelihood of randomly encountering its complement is high.
Longer Primer (e.g., 25 bases):
$4^{25} \approx 1.1 \times 10^{15}$ different sequences are possible, so the likelihood of random binding is very low.
Longer primers generally increase specificity.
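
The effect of primer length can be estimated with simple arithmetic. The sketch below assumes all four bases are equally frequent and independent, which real genomes are not, and it considers a single strand only. The function names are introduced here; the genome size is the approximate human value of 3.2 billion base pairs.

```python
def possible_sequences(length):
    """Number of distinct sequences of the given length (4 bases per position)."""
    return 4 ** length


def expected_random_sites(length, genome_bp):
    """Expected number of chance matches on one strand of a random genome."""
    return genome_bp / possible_sequences(length)


HUMAN_GENOME_BP = 3_200_000_000  # approximate haploid human genome size

for primer_length in (3, 10, 18, 25):
    sites = expected_random_sites(primer_length, HUMAN_GENOME_BP)
    print(f"{primer_length:>2} bases: {possible_sequences(primer_length):.3g} "
          f"sequences, about {sites:.3g} chance sites")
```

Under these assumptions a 3-base primer is expected to match tens of millions of sites, while a 25-base primer is expected to match far fewer than one site by chance.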

Applications

Pathogen Diagnosis: Identifying specific DNA or RNA sequences of pathogens. RNA samples require extra care because RNA degrades easily.
DNA Fingerprinting: Identifying individuals based on unique DNA sequence patterns, especially short tandem repeats (STRs).
- Used in forensic science, paternity testing, and personal identification.
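- The number of repeats at an STR locus is highly variable: the repetitive sequence is prone to copying errors, and most STRs lie in non-coding regions where such changes are tolerated. This leads to a high mutation rate and to patterns that differ between individuals.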
- DNA patterns can be visualized on agarose gels as fragments of varying lengths.
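
Fragments differ in length because alleles differ in the number of repeat units. The sketch below counts back-to-back copies of a repeat motif in a sequence; the function name, the motif, and both allele sequences are hypothetical and chosen only for illustration.

```python
def longest_tandem_run(sequence, motif):
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one motif at a time while the repeat continues.
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Two invented alleles of the same locus: identical flanks, different repeat counts.
allele_a = "CCTA" + "GATA" * 7 + "TTCG"
allele_b = "CCTA" + "GATA" * 10 + "TTCG"

repeats_a = longest_tandem_run(allele_a, "GATA")
repeats_b = longest_tandem_run(allele_b, "GATA")
print(repeats_a, repeats_b, len(allele_b) - len(allele_a))
```

The length difference between the two alleles equals the difference in repeat count times the motif length, which is what shows up as separate bands on a gel.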

DNA Fingerprinting and STRs

Each individual has a unique combination of STRs.
Markers on the Y chromosome can be used to distinguish male from female DNA.

Case Study: The Green River Killer

Gary Ridgway was identified by STR analysis of semen collected from victims in the 1980s.
The DNA profile matched a saliva sample taken from Ridgway, which led to his arrest in 2001.

DNA Databases

CODIS (Combined DNA Index System) is a national DNA database in the U.S. used by law enforcement.
It stores DNA profiles from convicted offenders and crime scene evidence.
The system relies on STR markers.
Initially used 13 core STR loci; the core set was expanded to 20 loci in 2017 for improved accuracy.

Paternity Testing

Based on multiple STRs.
Individuals inherit one allele from each parent at each STR locus.
A paternity index (PI) is calculated for each locus: the probability that the alleged father would pass on the child's paternal allele, compared with the probability that a random, unrelated man would.
The combined paternity index (CPI) is the product of the paternity indices across all tested loci and is used to calculate the probability of paternity.
A probability of 99.99% is typically required for legal paternity confirmation.
Assuming equal prior odds for and against paternity, the equation is: $\text{Probability of Paternity} = \frac{\text{CPI}}{\text{CPI} + 1}$
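
The calculation can be sketched in a few lines. The per-locus paternity indices below are invented numbers, and the function names are introduced here. The `prior` parameter is an assumption that generalizes the formula; with the default of 0.5 it reduces to CPI / (CPI + 1).

```python
from math import prod


def combined_paternity_index(per_locus_pi):
    """Multiply per-locus paternity indices (assumes the loci are independent)."""
    return prod(per_locus_pi)


def probability_of_paternity(cpi, prior=0.5):
    """Probability of paternity; prior=0.5 gives CPI / (CPI + 1)."""
    return cpi * prior / (cpi * prior + (1.0 - prior))


# Hypothetical paternity indices for six STR loci (illustrative values only).
locus_pis = [2.5, 4.0, 1.8, 6.2, 3.3, 5.1]

cpi = combined_paternity_index(locus_pis)
print(f"CPI = {cpi:.1f}, probability = {probability_of_paternity(cpi):.4%}")
```

Because the indices multiply, each additional informative locus raises the CPI quickly, which is why testing many loci can push the probability past a 99.99% threshold.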

Controversial Incident: HIV Transmission by a Dentist

Between 1990 and 1992, six patients were found to be HIV positive without typical risk factors.
The only link between them was their dentist, who had been diagnosed with AIDS in 1986 but continued practicing.
The CDC investigated the case using HIV genotyping.
HIV genetic sequences from the patients and the dentist were isolated, amplified, sequenced, and compared. Local controls were also analyzed.

Phylogenetic Tree

The sequences are arranged into clusters based on similarities.
Sequences of HIV isolates from the patients clustered with the dentist's sequence, suggesting they were infected by the dentist.
The exact route of transmission could not be determined; blood-contaminated instruments and needle-stick injuries were considered as possibilities.
It remains an exceptionally rare event in medical history.
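
Clustering by similarity starts from pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the fraction of positions that differ. The sequences are short invented strings, not real viral data, and the labels and function name are assumptions for illustration; real analyses use much longer alignments and dedicated tree-building methods.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal in length, and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Invented toy sequences, for illustration only.
toy_sequences = {
    "dentist": "ACGTACGTAC",
    "patient_A": "ACGTACGTAT",
    "control_1": "ACTTAGGTCC",
}

reference = toy_sequences["dentist"]
for name, seq in toy_sequences.items():
    if name != "dentist":
        print(f"dentist vs {name}: {p_distance(reference, seq):.2f}")
```

Sequences separated by small distances end up in the same cluster of the tree, while unrelated local controls sit farther away.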

DNA Sequencing

Determining the exact order of nucleotides in a DNA molecule.
Fundamental technique in molecular biology, genetics, and biotechnology.
Used for studying genomes, identifying mutations, and diagnosing diseases.

Sanger Sequencing

Developed by Frederick Sanger in 1977; also known as chain-termination sequencing.
Mimics DNA replication but uses dideoxynucleotide triphosphates (ddNTPs).
ddNTPs lack a 3' hydroxyl group, terminating the DNA chain.

Ingredients for Sanger Sequencing

dNTPs (regular nucleotides)
DNA polymerase
Primer
ddNTPs (labeled with fluorescent dyes)

Process

When a ddNTP is incorporated into the growing DNA chain in place of a dNTP, the chain terminates.
This generates a collection of DNA fragments ending at every possible position where a ddNTP could have been added.
The fragments are separated by capillary electrophoresis.
A laser detects fluorescent dyes corresponding to the base terminating the fragment.
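
The logic of the process can be simulated in a few lines. This is an idealized sketch: it assumes exactly one terminated fragment for every position of the newly synthesized strand and perfect separation by size. The function names and the example sequence are made up.

```python
import random


def chain_termination_fragments(new_strand, rng):
    """Return (length, terminal base) pairs, one per position, in random order.

    The terminal base stands for the fluorescent dye on the ddNTP that
    ended the fragment.
    """
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    rng.shuffle(fragments)  # the reaction tube holds an unordered mixture
    return fragments


def read_sequence(fragments):
    """Order fragments by length (electrophoresis) and read each terminal dye."""
    return "".join(base for _, base in sorted(fragments))


mixture = chain_termination_fragments("ATGCGTTAGC", random.Random(0))
print(mixture[:3])             # a few fragments from the unordered mixture
print(read_sequence(mixture))  # sequence recovered from shortest to longest
```

Sorting by length plays the role of capillary electrophoresis, and reading the last base of each fragment plays the role of the laser detecting the dye.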

Key Difference: dNTPs vs. ddNTPs

dNTPs have a free 3' hydroxyl group, allowing phosphodiester linkages to continue the chain.
ddNTPs lack the 3' hydroxyl group, preventing further extension of the DNA strand and terminating the chain.

Characteristics of Sanger Sequencing

High quality (accurate base calling).
Low output (sequences a single template in each reaction).
Time-consuming.
Sequence length ranges from 800 to 1,200 nucleotides.
Suitable for sequencing single genes but not whole genomes.

Limitations

Impractical for whole genomes because each reaction sequences only one template, which makes the process slow and costly at that scale.

Next-Generation Sequencing (NGS)

Introduced around 2005.
Allows millions of sequencing reactions to be performed simultaneously (massive parallel sequencing).
Reads are much shorter than Sanger sequencing reads.
No need for constructing genomic libraries by conventional cloning.
Sequencing occurs in real time and continuously.

Read Length

A key parameter in sequencing.
Generally, the longer the read, the lower the quality.

Third-Generation Sequencing

Can sequence DNA templates that are kilobases in length.
However, the quality is not as good as for short reads.

Genome Complexity

Wheat has a larger and more complex genome than humans do.
The human genome has approximately 3.2 billion base pairs.
The wheat genome has approximately 17 billion base pairs.
Human genome: 46 chromosomes (23 pairs), diploid (two sets of chromosomes).
Wheat genome: 42 chromosomes (six sets of seven chromosomes), hexaploid (six complete sets of chromosomes; polyploid means more than two sets).

Genome Characteristics

Estimated gene number in the human genome: 20,000 - 25,000
Estimated gene number in the wheat genome: 80,000 - 100,000
Amount of repetitive DNA in the human genome: about 50%
Amount of repetitive DNA in the wheat genome: about 80%
Polyploidy in wheat introduces more homologous regions and redundancy, creating a more complex genomic architecture.

Human Genome Project

Launched in 1990.
A landmark research project involving international collaboration.
The first draft was presented in February 2001.
Aimed to unravel all human genes, determine their location on the chromosomes, and understand genetic variation.

Genome Sequencing

Advances in technology have significantly decreased the cost of sequencing over the years.
Although the DNA of any organism can now be sequenced, a complete genome is still difficult to obtain.

Challenges

Sequence reads are short in relation to a complete genome.
Millions to billions of sequencing reads need to be oriented and placed within the genome.
Repetitive sequences make it difficult to determine the correct position of a read.
The pieces have to be put together in the right order without knowing in advance what the finished genome looks like.

Assembly Process

The process of putting DNA sequences together in the correct order.
Repetitive or homologous regions in the genome complicate this process, particularly with short reads.
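
The idea of assembly can be illustrated with a toy greedy approach that repeatedly merges the two pieces sharing the longest overlap. This is a simplified sketch, not how production assemblers work: it ignores sequencing errors, reverse-complement reads, and repeats. The function names and the reads are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps of min_len remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; the pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads from an invented 15-base sequence.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
print(greedy_assemble(reads))
```

When a genome contains repeats longer than the reads, several different merges look equally good, so an approach like this can join pieces that do not belong together; that is the difficulty with short reads described above.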