Genomes and Genome Evolution Study Notes

The Genomics Revolution and Economic Evolution

  • The genomics revolution began with the Human Genome Project (HGP), which provided the initial framework for understanding genome structure.
  • The cost of sequencing the first human genome is estimated at between $100,000,000 and $300,000,000, depending on the source.
  • When accounting for associated projects, technology development, and annotation, the total cost for the first human genome sequence reached approximately $2,700,000,000.
  • Modern technological advancements have reduced the cost of sequencing a single human genome to roughly $1,000.
  • The drastic reduction in cost enables personalized medicine, such as identifying specific mutations or epigenetic alterations in cancer cells to tailor individual treatments.
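
The scale of the cost drop can be checked with simple arithmetic. The sketch below uses only the dollar figures quoted above; the variable and function names are illustrative.

```python
# Figures quoted in the notes (US dollars).
first_genome_low = 100_000_000      # lower estimate, sequencing only
first_genome_high = 300_000_000     # upper estimate, sequencing only
first_genome_total = 2_700_000_000  # including associated projects and annotation
modern_cost = 1_000                 # rough cost of one genome today


def fold_reduction(old_cost, new_cost):
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


print(fold_reduction(first_genome_low, modern_cost))    # 100000.0
print(fold_reduction(first_genome_high, modern_cost))   # 300000.0
print(fold_reduction(first_genome_total, modern_cost))  # 2700000.0
```

Depending on which starting figure is used, the cost has fallen by a factor of roughly 100,000 to 2,700,000.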

Foundational Definitions and Genome Metrics

  • Genome Definition: The complete set of genetic information in an organism, encompassing all DNA sequences, including both coding (genes) and non-coding sequences.
  • Measurement: Genome size is measured as the haploid number of base pairs (the amount of DNA in a single set of chromosomes).
  • Human Genome Scale: The human genome contains approximately $3,200,000,000$ base pairs.
  • Haploid vs. Diploid Inheritance: While genome size is measured by the haploid set (one parental copy), sexually reproducing organisms inherit variants from both parents, meaning the $3,200,000,000$ value does not fully capture total inheritance variation.
  • Gene Definition: The entire nucleic acid sequence required for the synthesis of a functional polypeptide (protein) or RNA molecule.
  • RNA Genes: Not all genes encode proteins; some encode functional RNA molecules, including:
    • Transfer RNA (tRNA)
    • Ribosomal RNA (rRNA)
    • microRNAs
    • Long non-coding RNAs
    • Short interfering RNAs (siRNA)

Structural Differences: Prokaryotes vs. Eukaryotes

  • Prokaryotes (Bacteria and Archaea):
    • DNA is not contained within a nucleus; it is free-floating in the cell.
    • Genomes are typically small and consist of a single circular DNA molecule.
    • Minimal packaging; while protein interactions exist, they lack the extensive histone-based organization of eukaryotes.
    • Genes are densely packed with very little non-coding "junk" DNA.
    • High packing density: Approximately 1,000 genes per million base pairs (average 1 gene per $1,000$ base pairs).
  • Eukaryotes:
    • DNA is sequestered in a membrane-bound nucleus.
    • DNA is organized into multiple linear molecules (chromosomes).
    • DNA is tightly packaged by wrapping around histone proteins.
    • Genomes contain large amounts of non-coding sequences (introns and intergenic regions).
  • Organelle Genomes: Mitochondria and chloroplasts (in plants) contain their own small, circular genomes that behave similarly to prokaryotic DNA.

The Nature of "Junk DNA" and Molecular Fossils

  • The Human Genome Project revealed that the genome is not a "refined intelligent design" but is filled with seemingly non-functional elements known as "junk DNA."
  • Repetitive Elements: Simple nucleotide sequences repeated thousands of times (e.g., ATATAT...ATATAT...).
  • Transposable Elements ("Jumping Genes"): DNA sequences that can move or copy themselves to different positions within the genome.
    • Alu Elements: A specific retrotransposon roughly $300$ nucleotides long. Approximately $1,000,000$ copies exist, making up 11% of the human genome.
    • Most transposable elements in humans are inactive due to mutation, DNA methylation, or being packaged into heterochromatin.
  • Molecular Fossils (Pseudogenes):
    • These are the remnants of defunct genes that were functional in ancestors but have been lost over time.
    • Humans have approximately $20,000$ functional genes and another $20,000$ defunct molecular fossils.
  • Evolutionary Perspective: David Penny noted that the $E. coli$ genome appears "tidy," whereas the human genome appears poorly designed by comparison. However, "junk" DNA is vital for driving evolutionary change, primarily through recombination events in euchromatin.
  • Selection Pressure: In multicellular organisms, extra DNA is tolerated because it does not impose a critical metabolic burden. In contrast, small organisms like $E. coli$ face strong selection for small genomes to allow rapid replication ($< 1$ hour).
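
As a rough consistency check on the Alu figures above, the rounded length and copy number can be multiplied out and compared with the genome size. This is a back-of-the-envelope sketch using only numbers from these notes; the variable names are illustrative.

```python
alu_length_bp = 300             # approximate length of one Alu element
alu_copies = 1_000_000          # approximate copy number in the human genome
genome_size_bp = 3_200_000_000  # haploid human genome size

alu_total_bp = alu_length_bp * alu_copies
alu_fraction = alu_total_bp / genome_size_bp

print(f"Alu DNA: {alu_total_bp:,} bp")         # Alu DNA: 300,000,000 bp
print(f"Share of genome: {alu_fraction:.1%}")  # Share of genome: 9.4%
```

The rounded inputs give a little over 9%, the same order of magnitude as the 11% quoted above. The gap presumably comes from rounding in the length and copy-number figures.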

Case Study: The Loss of Chitinase Genes in Humans

  • Chitin is the primary component of insect exoskeletons; chitinases are enzymes that break it down.
  • Ancestral mammals, particularly insect-eating monkeys, possess functional chitinase genes.
  • Humans have remnants of five ancestral chitinase genes, but four are effectively inactivated by mutation or extremely low expression.
  • One remaining copy is expressed in the gut at lower levels than in other mammals, rendering chitin an insoluble fiber for humans.
  • Genetic Inactivation Example: In the $Chia2$ gene, a single nucleotide change from $C$ to $T$ converts a $CGA$ codon (Arginine) into a $TGA$ stop codon, causing premature termination of the protein.
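
The effect of a $C$ to $T$ change in a $CGA$ codon can be sketched with a toy translation. The sequence below is hypothetical and is not the real chitinase gene. The codon table is deliberately partial and covers only the codons used here.

```python
# Partial codon table: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCT": "A",  # alanine
    "CGA": "R",  # arginine
    "GGT": "G",  # glycine
    "TGA": "*",  # stop
    "TAA": "*",  # stop
}


def translate(dna):
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


# Hypothetical toy sequence, not the real gene.
wild_type = "ATGGCTCGAGGTGCTTAA"
mutant = point_mutation(wild_type, 6, "T")  # C -> T turns CGA into TGA

print(translate(wild_type))  # MARGA
print(translate(mutant))     # MA
```

The mutant protein stops where the arginine used to be, so everything downstream of the new stop codon is lost.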

Prokaryotic Genome Organization and Evolution

  • Operons: Prokaryotic genes are organized into polycistronic units, where a single regulatory element controls multiple genes. An example is the Tryptophan (Trp) operon (5 genes, 1 transcript, 5 proteins).
  • Genome Size Correlation: In prokaryotes, genome size correlates directly with complexity and gene number.
    • Symbiotic Bacteria: $Bradyrhizobium\,japonicum$ has a larger genome with more genes to facilitate nitrogen fixation in legumes; it has a slower replication rate.
    • Parasitic Bacteria: $Mycoplasma\,genitalium$ (sexually transmitted) has a tiny genome because it relies on human host metabolism for survival.
  • Horizontal Gene Transfer (HGT): The primary driver of rapid prokaryotic evolution.
    • Conjugation: Transfer of plasmids between bacteria (often carrying antibiotic resistance).
    • Transduction: Bacteriophages (viruses) accidentally package bacterial DNA and transfer it to new hosts.
    • Transformation: Direct uptake of DNA from the environment.
  • Gene Loss: If genetic material (like a plasmid) offers no selective advantage, it is lost to reduce metabolic burden.

Eukaryotic Transcriptional Complexity

  • Monocistronic Organization: Unlike prokaryotes, eukaryotic genes are regulated independently, even if they belong to the same pathway (e.g., tryptophan synthesis in yeast).
  • Intron-Exon Structure: Eukaryotic genes contain exons (coding) and introns (non-coding). Introns must be spliced out to create mature mRNA.
  • Complexity Metrics: Eukaryotes increase complexity not necessarily by adding genes, but by diversifying protein outputs.
    • Alternative Splicing: One gene can produce multiple different transcripts by skipping or including different exons.
    • Alternative Promoters: Transcription can start at different sites.
    • mRNA Editing: Changes to the transcript sequence after synthesis.
    • Post-Translational Modifications: Chemical changes made to proteins after synthesis (e.g., modifications of histones).
  • Statistics: Humans have ~20,000 genes which produce ~100,000 transcripts, leading to >1,000,000 protein variants.
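
To illustrate how one gene can yield several transcripts, the sketch below enumerates the isoforms produced by exon skipping alone. It assumes the first and last exons are always kept and any subset of internal exons may be included. The gene and the function name are hypothetical, and real genes use only some of these combinations. The last lines compute the average ratios implied by the statistics above.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Enumerate transcripts that keep the first and last exon and include
    any subset of the internal exons, preserving exon order."""
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append((first, *kept, last))
    return isoforms


# Hypothetical gene with four exons (two internal exons).
gene = ["E1", "E2", "E3", "E4"]
for isoform in exon_skipping_isoforms(gene):
    print("-".join(isoform))

# Average ratios implied by the figures in the notes.
genes, transcripts, protein_variants = 20_000, 100_000, 1_000_000
print(transcripts / genes)             # 5.0 transcripts per gene
print(protein_variants / transcripts)  # 10.0, a lower bound since the notes say >1,000,000
```

With n internal exons this upper bound is 2^n isoforms, so the four-exon toy gene gives four possible transcripts.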

Comparative Genome Analysis

  • Yeast (Saccharomyces cerevisiae): Small eukaryotic genome ($1.2 \times 10^7\,base\,pairs$) with ~$6,000$ genes; maintains small size for rapid replication.
  • Arabidopsis thaliana: Small plant genome ($1.25 \times 10^8\,base\,pairs$) with ~$25,000$ genes.
  • Wheat (Triticum aestivum): Gigantic genome that is difficult to sequence and assemble.
    • Wheat is hexaploid, resulting from two separate hybridization/polyploidization events between three progenitor species.
    • It contains massive amounts of repetitive DNA packaged in heterochromatin.
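
Gene density makes these genomes easier to compare. The sketch below uses only the sizes and gene counts quoted in these notes. Wheat is left out because the notes give no size for it. The function name is illustrative.

```python
# (genome size in base pairs, approximate gene count), as quoted in the notes.
genomes = {
    "Yeast (S. cerevisiae)": (12_000_000, 6_000),
    "Arabidopsis thaliana": (125_000_000, 25_000),
    "Human": (3_200_000_000, 20_000),
}


def genes_per_megabase(genome_size_bp, gene_count):
    """Return the average number of genes per million base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)


for name, (size_bp, gene_count) in genomes.items():
    density = genes_per_megabase(size_bp, gene_count)
    print(f"{name}: {density:.2f} genes per Mb")
```

These values work out to 500, 200, and 6.25 genes per million base pairs, against roughly 1,000 per million base pairs for prokaryotes.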

Questions & Discussion

  • Question: What will happen as a result of a two-nucleotide deletion in the middle of an intron?
  • Audience Responses: Discussion occurred regarding frameshift mutations, loss of translation, and splicing errors.
  • Explanation: The most likely result is no effect at all. Because introns are spliced out, a middle-intron deletion does not shift the reading frame of the exons. Splicing only fails if the mutation occurs in specific recognition sequences, such as the branch point usually located near the 3' end of the intron.
  • Human Gene Characteristics:
    • Average number of exons per gene: ~$9$
    • Average internal exon size: ~$145\,base\,pairs$
    • Average intron size: ~$3,000\,base\,pairs$
    • Over 90% of human genes are "complex transcriptional units" (contain introns and undergo alternative splicing).
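
The answer to the intron-deletion question can be illustrated with a toy splicing simulation. It assumes the splice sites are still recognized, so the intron boundaries are passed in as known coordinates. The sequences and function name are hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open 0-based intervals."""
    mature, cursor = [], 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[cursor:start])
        cursor = end
    mature.append(pre_mrna[cursor:])
    return "".join(mature)


# Hypothetical toy gene: exon 1 + intron + exon 2.
exon1 = "ATGGCTCGA"
intron = "GTAAGT" + "T" * 20 + "AG"
exon2 = "GGTGCTTAA"
gene = exon1 + intron + exon2
intron_start, intron_end = len(exon1), len(exon1) + len(intron)

wild_type_mrna = splice(gene, [(intron_start, intron_end)])

# Case 1: delete two nucleotides from the middle of the intron.
mid = (intron_start + intron_end) // 2
intron_mutant = gene[:mid] + gene[mid + 2:]
intron_mutant_mrna = splice(intron_mutant, [(intron_start, intron_end - 2)])
print(wild_type_mrna == intron_mutant_mrna)  # True: the mature mRNA is unchanged

# Case 2: delete two nucleotides inside exon 1 for comparison.
exon_mutant = gene[:3] + gene[5:]
exon_mutant_mrna = splice(exon_mutant, [(intron_start - 2, intron_end - 2)])
print(len(wild_type_mrna) - len(exon_mutant_mrna))  # 2: not a multiple of 3, so the frame shifts
```

The intron deletion disappears along with the intron, while the same deletion in an exon survives splicing and shifts the reading frame.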