Fundamentals of Genetics, Evolution, and DNA Technologies
Historical Landmarks in Genetics and Genomics
- 1865: Gregor Mendel's work on the laws of inheritance was published.
- 1900: Mendel's laws were rediscovered.
- 1913: The first genetic linkage map was developed.
- 1944: DNA was identified as the genetic material.
- 1953: The double helix structure of DNA was discovered by James Watson and Francis Crick, with contributions from Rosalind Franklin and Maurice Wilkins.
- 1961: The first codon of the genetic code was deciphered by Marshall Nirenberg and Heinrich Matthaei.
- 1966: The genetic code was fully cracked by Marshall Nirenberg and others.
- 1972: The first recombinant DNA molecule was created by Paul Berg.
- 1977: Frederick Sanger developed DNA sequencing (Sanger Sequencing) techniques.
- 1982: The first recombinant DNA drug (insulin) was approved by the FDA.
- 1983: Kary Mullis developed the Polymerase Chain Reaction (PCR) technique (awarded Nobel Prize in Chemistry in 1993).
- 1990: The Human Genome Project (HGP) was officially launched in the United States by the National Institutes of Health (NIH) and the Department of Energy (DOE).
- 1995: The first bacterial genome (Haemophilus influenzae, 1.83 Mb) was sequenced and published in Science.
- 1996: The sequence of the first human chromosome (22) was completed.
- 1999: The first human chromosome (22) sequence was published.
- 2000: A draft version of the human genome sequence was announced by President Bill Clinton and Prime Minister Tony Blair.
- 2003: The Human Genome Project was completed, covering 99% of the genome with an error rate of less than 1 in 10,000 bases.
- 2013: The first multi-gene sequencing diagnostic test for tumor profiling was introduced in the NHS.
- 2015: The first patients in the 100,000 Genomes Project received diagnoses.
- 2017: Release of genome-wide genotype data for 500,000 UK Biobank participants.
- 2020: Routine use of Whole Genome Sequencing (WGS) began in the NHS.
- 2022: A truly complete ("telomere-to-telomere") human genome sequence was generated.
Understanding the Genome
- Etymology and Definition:
* The term "Genome" was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg.
* Winkler's original proposal defined the "Genom" as the haploid chromosome set which, along with the pertinent protoplasm, specifies the material foundations of the species.
* Modern definition: A genome is an organism’s complete set of DNA, including all of its genes.
- Human Genomic Organization:
* The human genome is spread over 46 chromosomes (23 pairs).
* Inheritance: 23 chromosomes are inherited from the mother (egg) and 23 from the father (sperm).
- Cellular Categories:
* Somatic Cells: These make up most of the body and contain two genome copies (diploid).
* Germ Cells: Egg and sperm cells. They start with two copies but end with one genome copy each after meiosis, prior to fertilization.
- Gene Expression: The genome contains tens of thousands of genes (approximately 24,000). Genes are only "switched on", or expressed, when needed.
DNA Structure and Chemistry
- Composition: DNA stands for Deoxyribonucleic Acid. It is a two-stranded molecule with a double helix shape.
- Nucleotide Components: Each nucleotide consists of:
1. A sugar molecule (deoxyribose).
2. A phosphate group.
3. A nitrogenous base.
- Bonding:
* Glycosidic bond: Connects the nitrogenous base to the sugar molecule.
* Phosphodiester bond: Links the phosphate group to the sugars of adjacent nucleotides (the 3′ carbon of one sugar to the 5′ carbon of the next), forming the sugar-phosphate "backbone."
- Nitrogenous Bases: Categorized as Heterocyclic, Aromatic organic compounds.
* Purines: Bases with two fused carbon-nitrogen rings; larger than pyrimidines. Examples include Adenine (A) and Guanine (G). Catabolism end product is uric acid. Primarily synthesized in the liver.
* Pyrimidines: Bases with a single carbon-nitrogen ring. Examples include Thymine (T) and Cytosine (C). Catabolism end products are ammonia and carbon dioxide. Synthesized in a variety of tissues.
- Directionality and Polarity:
* Strands have a beginning and end designated as 5′ (five prime) and 3′ (three prime).
* Strands run antiparallel: one runs 5′ to 3′ (sense strand) and the other runs 3′ to 5′ (antisense strand).
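Base pairing plus antiparallel orientation means that one strand fully determines its partner. A minimal Python sketch of that idea follows; the function name and the example sequence are illustrative only and are not part of the notes.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'.

    The input is read 5' to 3'. Because the partner runs antiparallel,
    the complemented bases are emitted in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


sense = "ATGCGTTA"                      # made-up example sequence
antisense = reverse_complement(sense)
print(antisense)                        # TAACGCAT
```

Applying the function twice returns the original strand, which is a quick check that the pairing rules are symmetric.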
DNA Replication
- Definition: The process by which a cell makes an identical copy of its genome before division.
- Semi-Conservative Nature: Each new DNA molecule consists of one original (old) chain and one newly synthesized chain.
- Process Steps:
1. Unwinding: The enzyme Helicase "unzips" the double helix by breaking hydrogen bonds between complementary bases (A with T, C with G).
2. Replication Fork: The separation creates a "Y" shape called a replication fork. The separated strands act as templates.
3. Primer Binding: An enzyme called Primase produces a short RNA piece called a primer, which binds to the strand to mark the starting point for synthesis.
4. Leading Strand Synthesis: The template strand is oriented 3′ to 5′ toward the fork, so DNA Polymerase adds nucleotides continuously in the 5′ to 3′ direction.
5. Lagging Strand Synthesis: The other template runs in the opposite orientation, so the new strand must be synthesized 5′ to 3′ away from the fork. It is replicated discontinuously, with numerous RNA primers bound at various points.
6. Okazaki Fragments: Short stretches of DNA synthesized on the lagging strand, each in the 5′ to 3′ direction.
7. Primer Removal: Exonuclease strips away RNA primers. The resulting gaps are filled with complementary nucleotides.
8. Proofreading: The new strand is checked for sequence mistakes.
9. Ligation: The enzyme DNA Ligase seals the DNA sequence into two continuous double strands.
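The semi-conservative rule can be illustrated with a toy model that ignores enzymes, primers, and Okazaki fragments entirely and only tracks which strands are original. The labels "original" and "copy", the function names, and the example sequence are assumptions made for this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template):
    """Build the complementary strand, reported 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(template))


def replicate(duplex):
    """Separate a duplex and pair each existing strand with a new partner.

    A duplex is a pair of (sequence, label) strands. Each daughter duplex
    keeps one existing strand and gains one freshly copied strand.
    """
    return [((seq, label), (copy_strand(seq), "copy")) for seq, label in duplex]


def count_with_original(generations, seq="ATGCGTTA"):
    """Return (duplexes still holding an original strand, total duplexes)."""
    duplexes = [((seq, "original"), (copy_strand(seq), "original"))]
    for _ in range(generations):
        duplexes = [d for duplex in duplexes for d in replicate(duplex)]
    with_original = sum(
        any(label == "original" for _, label in d) for d in duplexes
    )
    return with_original, len(duplexes)


print(count_with_original(1))  # (2, 2)
print(count_with_original(3))  # (2, 8)
```

After one round every daughter molecule holds one original strand; in later rounds the two original strands persist while the total number of molecules doubles.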
Polymerase Chain Reaction (PCR)
- Definition: A laboratory technique used to make millions to billions of copies of a specific segment of DNA.
- Components and Primers:
* Uses short synthetic DNA fragments called primers (typically 18 to 25 nucleotides long).
* Primers target unique sequences to identify specific parts of the genome, such as a gene.
- Workflow:
1. Denaturation: The reaction mixture is heated to break hydrogen bonds, separating the two DNA strands.
2. Annealing: Temperature is lowered to allow primers to bind to the selected segment.
3. Synthesis (Extension): DNA polymerase synthesizes new strands complementary to the templates.
4. Repetition: Multiple rounds (typically 20 to 30 cycles) amplify the segment exponentially.
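The exponential amplification in step 4 is simple arithmetic: with perfect doubling, one template becomes 2^n copies after n cycles. The sketch below shows this in Python; the `efficiency` parameter is an added assumption to model less-than-perfect cycles and does not come from the notes.

```python
def pcr_copies(initial_templates, cycles, efficiency=1.0):
    """Estimate copy number after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); lower values model imperfect cycles.
    """
    return initial_templates * (1 + efficiency) ** cycles


for n in (20, 25, 30):
    print(n, f"{pcr_copies(1, n):,.0f}")
# 20 1,048,576
# 25 33,554,432
# 30 1,073,741,824
```

A single template therefore yields about a million copies after 20 cycles and about a billion after 30, consistent with the "millions to billions" figure above.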
Gel Electrophoresis
- Function: Separates DNA fragments primarily by size, using their negative charge to drive them through the gel.
- Mechanism:
* DNA has a negatively charged backbone because each phosphate group carries a negative charge.
* An electric current is run through an agarose gel containing the DNA.
* DNA molecules move toward the positive electrode (+).
* Speed and Size: Smaller fragments travel through the gel pores faster than larger fragments, allowing separation into distinct bands.
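The size-dependent ordering of bands can be sketched with a toy model. A common approximation is that migration distance falls roughly linearly with the logarithm of fragment size; the constants `a` and `b` below are made-up values for illustration, not measured ones.

```python
import math


def migration_distance(size_bp, a=10.0, b=2.0):
    """Toy model: distance travelled (arbitrary cm) versus fragment size.

    Distance decreases linearly with log10(size). The constants a and b
    are hypothetical; real values depend on gel, voltage, and run time.
    """
    return a - b * math.log10(size_bp)


fragments = [3000, 500, 1500, 100]  # fragment sizes in base pairs
# Order bands from farthest travelled to least travelled.
bands = sorted(fragments, key=migration_distance, reverse=True)
print(bands)  # [100, 500, 1500, 3000]
```

The smallest fragment ends up farthest from the well, which is why a ladder of known sizes can be used to estimate the size of an unknown band.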
Recombinant DNA (rDNA) Technology and Cloning
- Concept: Combining DNA from different sources in a lab to manipulate or isolate segments of interest.
- Enzymatic Tools:
* Restriction Enzymes: DNA-cutting enzymes that recognize specific target sequences. They produce either "staggered" cuts (sticky ends with single-stranded overhangs) or blunt ends.
* DNA Ligase: A DNA-joining enzyme that links matching DNA ends together.
- DNA Cloning Process:
1. A target gene is inserted into a circular DNA vector called a plasmid.
2. Transformation: The process of introducing DNA into a cell. Bacteria are often used for storage and rapid replication.
3. Heat Shock: A brief exposure to elevated temperature (commonly about 42 °C) makes the bacterial membrane more permeable, enabling the plasmid to enter.
4. Selection: Antibiotics are used to select only the bacteria that successfully carry the plasmid.
5. Expression: Bacteria may be induced to express the gene and produce proteins.
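The cutting behaviour of a restriction enzyme can be sketched as a string search. The example uses EcoRI, which recognizes GAATTC and cuts after the G, leaving sticky ends that begin with AATT. Only the top strand of a linear molecule is modeled, and the function name and test sequence are illustrative assumptions.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut the top strand of a linear sequence at every recognition site.

    cut_offset is the number of bases into the site where the cut falls
    (1 for EcoRI: G^AATTC). Returns the fragments in order.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


plasmid_insert = "AAGAATTCGGGGAATTCTT"  # made-up sequence with two EcoRI sites
print(digest(plasmid_insert))
# ['AAG', 'AATTCGGGG', 'AATTCTT']
```

Fragments cut by the same enzyme carry matching overhangs, which is what allows DNA Ligase to join a target gene into a plasmid cut with that enzyme.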
Genes and Epigenetics
- Gene Definition: A segment of DNA carrying information for a specific protein or RNA.
* Alternative Splicing: A single gene can specify more than one protein.
* Count: Humans have approximately 24,000 genes.
- Central Dogma: DNA is transcribed into messenger RNA (mRNA), which is then translated by ribosomes into proteins.
- Epigenetics: Changes in gene activity that do not involve alterations to the underlying DNA sequence. Influenced by diet, behavior, and environment (e.g., pollution).
* DNA Methylation: Methyl marks added to DNA bases to repress gene activity.
* Histone Modification: Molecules attach to the "tails" of histone proteins, altering how tightly DNA is wrapped and thus affecting activity.
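The Central Dogma described above (DNA to mRNA to protein) can be sketched in a few lines of Python using the standard genetic code. The sketch assumes the input is the coding (sense) strand read in frame from its first base, and it ignores splicing and regulation; the function names and example sequence are illustrative.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons in frame from the first base until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")  # made-up coding sequence
print(mrna)                        # AUGGCCUUUUAA
print(translate(mrna))             # MAF
```

Amino acids are given as one-letter codes, so MAF stands for methionine, alanine, phenylalanine.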
DNA Sequencing Technologies
First Generation: Sanger Sequencing
- Inventor: Frederick Sanger (“Father of Genomics”).
- Methodology: Known as the Chain-Termination Method.
- Key Requirements:
* Single-stranded DNA template.
* Primer (short oligonucleotide) to provide a free 3′ hydroxyl (OH) group.
* DNA Polymerase.
* Normal deoxynucleotides (dATP, dTTP, dGTP, dCTP).
* Dideoxynucleotides (ddNTPs): Chain-terminating nucleotides whose sugar (2′,3′-dideoxyribose) lacks the 3′ OH group required for adding the next nucleotide.
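- Mechanism: A small amount of one ddNTP (e.g., ddATP) is added to the reaction alongside the normal deoxynucleotides. Because a ddNTP is incorporated only occasionally, termination occurs at different positions across different strands, creating a "family" of fragments of varying lengths.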
- Automated Sequencing: Modern versions use fluorescently labeled ddNTPs (each base a different color). These are separated by capillary electrophoresis and detected by a laser, with data fed directly to a computer.
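The chain-termination logic can be sketched as follows: each ddNTP stops synthesis wherever its base is incorporated, and sorting all terminated fragments by length (the role of electrophoresis) spells out the sequence. This is an idealized model that assumes every possible termination product is present; the function names and example sequence are illustrative.

```python
def terminated_lengths(synthesized, dd_base):
    """Fragment lengths produced when dd_base terminates the new strand.

    A fragment ends at every position where dd_base was incorporated.
    """
    return [i + 1 for i, base in enumerate(synthesized) if base == dd_base]


def read_gel(reactions):
    """Order all fragments by length and read off their terminal bases."""
    bands = sorted(
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


new_strand = "GATTACA"  # made-up sequence of the newly synthesized strand
reactions = {base: terminated_lengths(new_strand, base) for base in "ACGT"}
print(reactions)          # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(reactions))  # GATTACA
```

In the automated version the four reactions are combined and the terminal base is identified by its fluorescent color rather than by its lane.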
Second Generation: Next-Generation Sequencing (NGS/Illumina)
- Terminology: Whole-genome massively parallel sequencing.
- Mechanism: Based on "Sequencing by Synthesis" (SBS).
- Workflow:
1. Fragmentation: DNA is broken into small pieces (75 to 400 bp).
2. Library Prep: Adapters are ligated to fragments.
3. Flow Cell Attachment: Fragments attach to a high-density flow cell.
4. Clonal Amplification: Bridge amplification creates millions of clusters (each cluster is a clone of a fragment).
5. Imaging: Fluorescently labeled nucleotides are incorporated; a high-resolution camera captures images of the flow cell to identify the base color.
- Output: Generates hundreds of millions to billions of short sequencing reads (typically 75 to 150 bases).
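The read counts above follow from a standard coverage estimate: mean coverage is roughly (number of reads × read length) / genome size. The sketch below applies it in Python; the genome size of 3.1 billion bases and the 30x target are assumptions chosen for illustration and do not come from the notes.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is covered by a read."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, genome_size, read_length):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # assumed human genome size in bases
READ_LENGTH = 150            # upper end of the typical read length above

n = reads_needed(30, GENOME_SIZE, READ_LENGTH)
print(f"{n:,}")                                      # 620,000,000
print(mean_coverage(n, READ_LENGTH, GENOME_SIZE))    # 30.0
```

Under these assumptions, sequencing one human genome to 30x depth takes several hundred million reads, which matches the output scale quoted for these platforms.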
Third Generation: Long-Read Sequencing
- Characteristics: Sequences single molecules without the need for PCR amplification.
- Platforms:
* Pacific Biosciences (PacBio): Uses Zero-Mode Waveguides (ZMW). Produces extremely long reads (14,000 to 40,000 bp). Can detect base modifications/epigenetics.
* Oxford Nanopore Technologies: Detects changes in electrical signal/current as a single DNA or RNA molecule passes through a nanopore. Capable of sequencing 10,000 bases per second in real-time.
Genomic Projects and Precision Medicine
- The Human Genome Project (HGP):
* Collaborative international consortium (US, UK, France, Germany, Japan, China).
* Duration: 1990–2003 (13 years).
* Cost: $3 billion ($2.7 billion reported in some contexts).
* Methods: Used Hierarchical Shotgun Sequencing.
- 100,000 Genomes Project (Genomics England):
* Participants: NHS patients with rare diseases (and their families) and cancer patients.
* Goal: Aimed to transform NHS care by offering diagnoses and creating a genomic medicine service.
- Genome UK Strategy (2020): Three core pillars:
1. Diagnosis and Personalized Medicine: Incorporating genomics into routine care.
2. Prevention: Using genomics for predictive and preventative care.
3. Research: Supporting fundamental and translational research.
- Variation: A single base difference is known as a Single Nucleotide Polymorphism (SNP), also called a Single Nucleotide Variant or Single Base Mutation.
- Personalized Medicine: Technology now allows a human genome to be sequenced in a few days for less than £1,000. The primary remaining challenge is data analysis.
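As a small illustration of the data-analysis step, single-base differences such as those described under Variation can be found by comparing a sample against a reference, position by position. The sketch assumes the two sequences are already aligned and of equal length (no insertions or deletions); the function name and sequences are illustrative.

```python
def find_single_base_differences(reference, sample):
    """List (position, reference base, sample base) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned and
    have the same length; insertions and deletions are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ATGCGTTA"  # made-up reference sequence
sample = "ATGAGTTA"     # same sequence with one substituted base
print(find_single_base_differences(reference, sample))
# [(4, 'C', 'A')]
```

Real variant calling works from millions of overlapping reads and must account for sequencing errors, which is part of why analysis remains the main challenge.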