Fundamentals of Genetics, Evolution, and DNA Technologies

Historical Landmarks in Genetics and Genomics

  • 1865: Gregor Mendel's work on the laws of inheritance was published.
  • 1900: Mendel's laws were rediscovered.
  • 1913: The first genetic linkage map was developed.
  • 1944: DNA was identified as the genetic material.
  • 1953: The double helix structure of DNA was discovered by James Watson and Francis Crick, with contributions from Rosalind Franklin and Maurice Wilkins.
  • 1961: Deciphering of the genetic code began with the first codon assignment.
  • 1966: The genetic code was fully cracked by Marshall Nirenberg and others.
  • 1972: The first recombinant DNA molecule was created by Paul Berg.
  • 1977: Frederick Sanger developed DNA sequencing (Sanger Sequencing) techniques.
  • 1982: The first recombinant DNA drug (insulin) was approved by the FDA.
  • 1983: Kary Mullis developed the Polymerase Chain Reaction (PCR) technique (awarded Nobel Prize in Chemistry in 1993).
  • 1990: The Human Genome Project (HGP) was officially launched in the United States by the National Institutes of Health (NIH) and the Department of Energy (DOE).
  • 1995: The first bacterial genome (Haemophilus influenzae, 1.83 Mb) was sequenced and published in Science.
  • 1996: The sequence of the first human chromosome (22) was completed.
  • 1999: The first human chromosome (22) sequence was published.
  • 2000: A draft version of the human genome sequence was announced by President Bill Clinton and Prime Minister Tony Blair.
  • 2003: The Human Genome Project was completed, covering 99% of the genome with an error rate of less than 1 in 10,000 bases.
  • 2013: The first multi-gene sequencing diagnostic test for tumor profiling was introduced in the NHS.
  • 2015: The first patients in the 100,000 Genomes Project received diagnoses.
  • 2017: Release of genome-wide genotype data for 500,000 UK Biobank participants.
  • 2020: Routine use of Whole Genome Sequencing (WGS) began in the NHS.
  • 2022: A truly complete ("telomere-to-telomere") human genome sequence was generated.

Understanding the Genome

  • Etymology and Definition:
      * The term "Genome" was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg.
      * Winkler's original proposal defined the "Genom" as the haploid chromosome set which, along with the pertinent protoplasm, specifies the material foundations of the species.
      * Modern definition: A genome is an organism's complete set of DNA, including all of its genes.
  • Human Genomic Organization:
      * The human genome is spread over 46 chromosomes (23 pairs).
      * Inheritance: 23 chromosomes are inherited from the mother (egg) and 23 from the father (sperm).
  • Cellular Categories:
      * Somatic Cells: These make up most of the body and contain two genome copies (diploid).
      * Germ Cells: Egg and sperm cells. They start with two copies but end with one genome copy each after meiosis, prior to fertilization.
  • Gene Expression: The genome contains tens of thousands of genes (ca. 24,000). Genes are only "switched on" or expressed when needed.

DNA Structure and Chemistry

  • Composition: DNA stands for Deoxyribonucleic Acid. It is a two-stranded molecule with a double helix shape.
  • Nucleotide Components: Each nucleotide consists of:
      1. A sugar molecule (deoxyribose).
      2. A phosphate group.
      3. A nitrogenous base.
  • Bonding:
      * Glycosidic bond: Connects the nitrogenous base to the sugar molecule.
      * Phosphodiester bond: Attaches the phosphate group to the sugar molecule, forming the "backbone."
  • Nitrogenous Bases: Heterocyclic, aromatic organic compounds.
      * Purines: Bases with two fused carbon-nitrogen rings; larger than pyrimidines. Examples include Adenine (A) and Guanine (G). The end product of catabolism is uric acid. Primarily synthesized in the liver.
      * Pyrimidines: Bases with a single carbon-nitrogen ring. Examples include Thymine (T) and Cytosine (C). The end products of catabolism are ammonia and carbon dioxide. Synthesized in a variety of tissues.
  • Directionality and Polarity:
      * Strands have a beginning and end designated as 5' (five prime) and 3' (three prime).
      * Strands run antiparallel: one runs 5' to 3' (sense strand) and the other runs 3' to 5' (antisense strand).
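
The base pairing and antiparallel orientation described above can be illustrated with a short Python sketch. The function name reverse_complement and the example sequence are illustrative choices, not part of the notes; the input is assumed to be written 5' to 3' and to contain only A, C, G, and T.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Assumes `strand` is written 5' to 3' and contains only A, C, G, T.
    """
    # Complement each base, then reverse the order so that the result
    # reads 5' to 3' like the input (the strands are antiparallel).
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


sense = "ATGCGTTC"                     # 5'-ATGCGTTC-3'
antisense = reverse_complement(sense)  # 5'-GAACGCAT-3'
print(antisense)
```

Applying the function twice returns the original strand, which is a quick check that the pairing table is symmetric.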

DNA Replication

  • Definition: The process by which a cell makes an identical copy of its genome before division.
  • Semi-Conservative Nature: Each new DNA molecule consists of one original (old) chain and one newly synthesized chain.
  • Process Steps:
      1. Unwinding: The enzyme Helicase "unzips" the double helix by breaking hydrogen bonds between complementary bases (A with T, C with G).
      2. Replication Fork: The separation creates a "Y" shape called a replication fork. The separated strands act as templates.
      3. Primer Binding: An enzyme called Primase produces a short RNA piece called a primer, which binds to the strand to mark the starting point for synthesis.
      4. Leading Strand Synthesis: The template is oriented 3' to 5' toward the fork. DNA Polymerase adds nucleotides continuously in the 5' to 3' direction.
      5. Lagging Strand Synthesis: The new strand is synthesized 5' to 3' away from the fork, so it is replicated discontinuously. Numerous RNA primers are bound at various points.
      6. Okazaki Fragments: Chunks of DNA added to the lagging strand in the 5' to 3' direction.
      7. Primer Removal: Exonuclease strips away RNA primers. The resulting gaps are filled with complementary nucleotides.
      8. Proofreading: The new strand is checked for sequence mistakes.
      9. Ligation: The enzyme DNA Ligase seals the DNA sequence into two continuous double strands.
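
The semi-conservative outcome of these steps can be sketched in Python. This is a simplified model under stated assumptions: strands are plain strings written 5' to 3', and primers, Okazaki fragments, and proofreading are not modeled. The names synthesize and replicate are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize(template: str) -> str:
    """Build the strand complementary to `template`.

    Both strings are written 5' to 3', so the new strand is the
    reverse complement of the template.
    """
    return "".join(COMPLEMENT[b] for b in reversed(template))


def replicate(duplex):
    """Copy a duplex given as a (strand, partner) pair, both 5' to 3'.

    Returns two daughter duplexes as (old_strand, new_strand) pairs.
    """
    top, bottom = duplex
    return (top, synthesize(top)), (bottom, synthesize(bottom))


parent = ("ATGCGTTC", "GAACGCAT")
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)  # old top strand paired with a newly made strand
print(daughter_2)  # old bottom strand paired with a newly made strand
```

Each daughter keeps exactly one parental strand, which is what "semi-conservative" means.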

Polymerase Chain Reaction (PCR)

  • Definition: A laboratory technique used to make millions to billions of copies of a specific segment of DNA.
  • Components and Primers:
      * Uses short synthetic DNA fragments called primers (typically 18 to 25 nucleotides long).
      * Primers target unique sequences to identify specific parts of the genome, such as a gene.
  • Workflow:
      1. Denaturation: The reaction mixture is heated to break hydrogen bonds, separating the two DNA strands.
      2. Annealing: Temperature is lowered to allow primers to bind to the selected segment.
      3. Synthesis (Extension): DNA polymerase synthesizes new strands complementary to the templates.
      4. Repetition: Multiple rounds (typically 20 to 30 cycles) amplify the segment exponentially.
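
A minimal Python sketch of the workflow above: locating the segment bounded by two primers, and the ideal exponential yield. The function names and toy sequences are assumptions for illustration. The 6-base primers are far shorter than the 18 to 25 nucleotides used in practice, and real reactions fall short of perfect doubling.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def amplicon(template: str, forward: str, reverse: str):
    """Return the segment bounded by the two primers, or None.

    `template` is the top strand, 5' to 3'. `forward` has the same
    sequence as the top strand. `reverse` is written 5' to 3' and
    anneals to the top strand, so its reverse complement appears
    in `template`.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Ideal yield: every cycle doubles the number of copies."""
    return starting_copies * 2 ** cycles


template = "TTGACCATGGCTAGCTAGGATCCAAGT"
print(amplicon(template, forward="CCATGG", reverse="TGGATC"))
print(copies_after(20), copies_after(30))
```

Under the ideal-doubling assumption, 20 cycles give about one million copies per starting molecule and 30 cycles give about one billion, matching the "millions to billions" in the definition.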

Gel Electrophoresis

  • Function: Separates DNA fragments based on size and charge.
  • Mechanism:
      * DNA has a negatively charged phosphate backbone due to bonds between oxygen and phosphorus atoms.
      * An electric current is run through an agarose gel containing the DNA.
      * DNA molecules move toward the positive electrode (+).
      * Speed and Size: Smaller fragments travel through the gel pores faster than larger fragments, allowing separation into distinct bands.
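
The size-based separation described above can be sketched as a simple ordering in Python. This models only the relative order of bands, not actual migration distances; run_gel and the fragment sizes are hypothetical.

```python
def run_gel(fragment_lengths):
    """Order fragments as they would appear from the wells downward.

    Assumption: only relative order is modeled. The largest fragment
    stays closest to the wells (negative end) and the smallest travels
    farthest toward the positive electrode.
    """
    return sorted(fragment_lengths, reverse=True)


bands = run_gel([500, 1200, 150, 3000])
for position, length in enumerate(bands, start=1):
    print(f"band {position} from the wells: {length} bp")
```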

Recombinant DNA (rDNA) Technology and Cloning

  • Concept: Combining DNA from different sources in a lab to manipulate or isolate segments of interest.
  • Enzymatic Tools:
      * Restriction Enzymes: DNA-cutting enzymes that recognize specific target sequences. They produce either "staggered" cuts (sticky ends with single-stranded overhangs) or blunt ends.
      * DNA Ligase: A DNA-joining enzyme that links matching DNA ends together.
  • DNA Cloning Process:
      1. A target gene is inserted into a circular DNA vector called a plasmid.
      2. Transformation: The process of introducing DNA into a cell. Bacteria are often used for storage and rapid replication.
      3. Heat Shock: High temperatures are used to change the bacterial membrane, enabling the plasmid to enter.
      4. Selection: Antibiotics are used to select only the bacteria that successfully carry the plasmid.
      5. Expression: Bacteria may be induced to express the gene and produce proteins.
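
The cut-and-join logic behind the enzymatic tools above can be sketched in Python. The example uses EcoRI, which recognizes GAATTC and cuts between G and AATTC. The simplifications are assumptions of this sketch: only the top strand is tracked, the vector is treated as linear rather than circular, and the sequences are invented.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1):
    """Cut `seq` (top strand, 5' to 3') at every occurrence of `site`.

    Defaults model EcoRI, which cuts between G and AATTC. The staggered
    cut on the bottom strand (which leaves the AATT sticky end) is not
    represented.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def ligate(left: str, insert: str, right: str) -> str:
    """Join fragments end to end, as DNA ligase seals matching ends."""
    return left + insert + right


vector = "AAACCGAATTCGGTTT"            # toy vector with one EcoRI site
left, right = digest(vector)
gene = "AATTCATGAAATAGG"               # toy insert with EcoRI-cut ends
recombinant = ligate(left, gene, right)
print(recombinant)
print(digest(recombinant))             # the insert can be cut back out
```

Because ligation regenerates the GAATTC site at both junctions, digesting the recombinant molecule again releases the same insert.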

Genes and Epigenetics

  • Gene Definition: A segment of DNA carrying information for a specific protein or RNA.
      * Alternative Splicing: A single gene can specify more than one protein.
      * Count: Humans have approximately 24,000 genes.
  • Central Dogma: DNA is transcribed into messenger RNA (mRNA), which is then translated by ribosomes into proteins.
  • Epigenetics: Changes in gene activity that do not involve alterations to the underlying DNA sequence. Influenced by diet, behavior, and environment (e.g., pollution).
      * DNA Methylation: Methyl marks added to DNA bases to repress gene activity.
      * Histone Modification: Molecules attach to the "tails" of histone proteins, altering how tightly DNA is wrapped and thus affecting activity.
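
The Central Dogma bullet above (DNA to mRNA to protein) can be sketched in Python using the standard genetic code. The function names and the example sequence are illustrative. The sketch assumes the input is the coding (sense) strand and ignores splicing and other processing.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by RNA codon; '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the sequence of the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GGATGGCCAAATGACC")
print(mrna)             # GGAUGGCCAAAUGACC
print(translate(mrna))  # MAK (Met-Ala-Lys, then the UGA stop codon)
```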

DNA Sequencing Technologies

First Generation: Sanger Sequencing

  • Inventor: Frederick Sanger (“Father of Genomics”).
  • Methodology: Known as the Chain-Termination Method.
  • Key Requirements:
      * Single-stranded DNA template.
      * Primer (short oligonucleotide) to provide a free 3' hydroxyl (OH) group.
      * DNA Polymerase.
      * Normal deoxynucleotides (dATP, dTTP, dGTP, dCTP).
      * Dideoxynucleotides (ddNTPs): Chain-terminating sugars (2',3'-dideoxyribose) that lack the 3' OH group required for further addition.
  • Mechanism: A small amount of one ddNTP (e.g., ddATP) is added. Termination occurs at different positions across different strands, creating a "family" of fragments of varying lengths.
  • Automated Sequencing: Modern versions use fluorescently labeled ddNTPs (each base a different color). These are separated by capillary electrophoresis and detected by a laser, with data fed directly to a computer.
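
The chain-termination logic above can be sketched in Python: every position at which a ddNTP could be incorporated yields one fragment, and ordering the fragments by length recovers the sequence. The function names are hypothetical, the primer length is ignored, and the sketch assumes every position is terminated in at least one molecule.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template: str):
    """Map fragment length -> terminating base for one template.

    `template` is written 3' to 5' so that the newly synthesized
    strand can be built left to right in its own 5' to 3' direction.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template)
    # A ddNTP incorporated at index i gives a fragment of length i + 1
    # whose last (labeled) base is new_strand[i].
    return {i + 1: base for i, base in enumerate(new_strand)}


def read_sequence(fragments) -> str:
    """Electrophoresis orders fragments by length; read shortest first."""
    return "".join(fragments[length] for length in sorted(fragments))


frags = sanger_fragments("TACGGCAT")
# Fragment lengths produced in a reaction containing ddATP:
print([n for n, base in frags.items() if base == "A"])
print(read_sequence(frags))
```

In the automated version described above, the dictionary value plays the role of the fluorescent color detected for each fragment length.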

Second Generation: Next-Generation Sequencing (NGS/Illumina)

  • Terminology: Whole-genome massively parallel sequencing.
  • Mechanism: Based on "Sequencing by Synthesis" (SBS).
  • Workflow:
      1. Fragmentation: DNA is broken into small pieces (75-400 bp).
      2. Library Prep: Adapters are ligated to fragments.
      3. Flow Cell Attachment: Fragments attach to a high-density flow cell.
      4. Clonal Amplification: Bridge amplification creates millions of clusters (each cluster is a clone of a fragment).
      5. Imaging: Fluorescently labeled nucleotides are incorporated; a high-resolution camera captures images of the flow cell to identify the base color.
  • Output: Generates hundreds of millions to billions of short sequencing reads (typically 75-150 bases).
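
A rough Python sketch of what short-read output looks like, together with the usual mean-coverage estimate (number of reads times read length, divided by genome length). The coverage formula is a standard rule of thumb and is not taken from these notes. Evenly spaced read starts stand in for random fragmentation, and all names and values are illustrative.

```python
def short_reads(genome: str, read_len: int, step: int):
    """Cut overlapping reads of `read_len` bases every `step` bases.

    Assumption: evenly spaced starts stand in for random fragmentation.
    """
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each base: N * L / G."""
    return n_reads * read_len / genome_len


genome = "ACGT" * 250                      # 1,000-base toy genome
reads = short_reads(genome, read_len=100, step=10)
print(len(reads))                          # 91 reads
print(mean_coverage(len(reads), 100, len(genome)))
```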

Third Generation: Long-Read Sequencing

  • Characteristics: Sequences single molecules without the need for PCR amplification.
  • Platforms:
      * Pacific Biosciences (PacBio): Uses Zero-Mode Waveguides (ZMW). Produces extremely long reads (14,000 to 40,000 bp). Can detect base modifications/epigenetics.
      * Oxford Nanopore Technologies: Detects changes in electrical signal/current as a single DNA or RNA molecule passes through a nanopore. Capable of sequencing 10,000 bases per second in real-time.

Genomic Projects and Precision Medicine

  • The Human Genome Project (HGP):
      * Collaborative international consortium (US, UK, France, Germany, Japan, China).
      * Duration: 1990-2003 (13 years).
      * Cost: $3 billion ($2.7 billion reported in some contexts).
      * Methods: Used Hierarchical Shotgun Sequencing.
  • 100,000 Genomes Project (Genomics England):
      * Participants: NHS patients with rare diseases (and their families) and cancer patients.
      * Goal: Aimed to transform NHS care by offering diagnoses and creating a genomic medicine service.
  • Genome UK Strategy (2020): Three core pillars:
      1. Diagnosis and Personalized Medicine: Incorporating genomics into routine care.
      2. Prevention: Using genomics for predictive and preventative care.
      3. Research: Supporting fundamental and translational research.
  • Variation: A single base difference is known as a Single Nucleotide Polymorphism (SNP), also called a Single Nucleotide Variant or Single Base Mutation.
  • Personalized Medicine: Technology now allows a human genome to be sequenced in a few days for less than £1,000. The primary remaining challenge is data analysis.
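
The single-base differences described under Variation can be located with a short Python sketch. It assumes the two sequences are already aligned and of equal length, so only substitutions are reported; find_snps and the sequences are hypothetical. Real variant calling works from many reads and must handle insertions, deletions, and sequencing errors.

```python
def find_snps(reference: str, sample: str):
    """List (position, ref_base, sample_base) where two sequences differ.

    Assumption: the sequences are already aligned and of equal length,
    so only substitutions are considered. Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample), start=1)
            if r != s]


print(find_snps("ATGCGTTCAG", "ATGCATTCAG"))  # one difference at position 5
```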