Eukaryotic Genome Organization and Gene Family Evolution
Protein Domains and Evolutionary Recombination
Connection Between Exons and Protein Domains:
Exons frequently correspond to individual protein domains.
Protein domains are independently folding structural units of a protein, and each typically carries out its own function.
Because of this modularity, alternative splicing can omit an exon and still yield a functional protein that simply has fewer domains.
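As a rough illustration of this modularity, the sketch below treats a gene as an ordered list of exons, each standing in for one protein domain, and models alternative splicing as skipping an exon. The gene, the domain names, and the `splice` helper are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def splice(exons: Sequence[str], skip: Sequence[int] = ()) -> List[str]:
    """Return the exons kept in the mature transcript, in their original order."""
    skipped = set(skip)
    return [exon for index, exon in enumerate(exons) if index not in skipped]


# Hypothetical three-exon gene; each exon encodes one domain.
gene = ["membrane_anchor", "ligand_binding", "kinase"]

full_isoform = splice(gene)             # all three domains retained
short_isoform = splice(gene, skip=[0])  # isoform lacking the membrane anchor
print(full_isoform)
print(short_isoform)
```

Real splicing decisions depend on splice sites and regulatory factors; the index-based `skip` argument is only a stand-in for that machinery.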
Comparisons Across Eukaryotic Genomes:
Unicellular eukaryotes, such as yeast, possess approximately fewer protein domains than humans ( vs. approximately).
The primary takeaway is that biological complexity is not derived from the absolute number of functional protein domain units, but rather from how they are arranged and recombined through complex transcriptional units.
Evolutionary Efficiency:
Shifting an existing protein domain to a new gene via recombination is a far more effective evolutionary step than generating a new functional protein through random DNA mutations.
Protein domains act like "Lego bricks" that can be rearranged into various final structural proteins.
Named Example: The ATP binding and ATP hydrolysis domains are essential for energetic functions. Cleaving ATP to release a phosphate group provides the energy required to change protein conformation for metabolic reactions, protein modification, or unwinding DNA. These domains are frequently reused across different categories of proteins.
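The "Lego brick" picture can be made concrete with a small count: a handful of reusable domains already yields many distinct ordered architectures. This is a minimal sketch; the repertoire below is a placeholder, and treating every ordering as a viable protein is a simplifying assumption.

```python
from itertools import product
from typing import List, Sequence, Tuple


def architectures(domains: Sequence[str], length: int) -> List[Tuple[str, ...]]:
    """List every ordered arrangement of `length` domains, allowing reuse."""
    return list(product(domains, repeat=length))


# Placeholder domain repertoire (names are illustrative only).
repertoire = ["ATP_binding", "ATP_hydrolysis", "DNA_binding"]

two_domain = architectures(repertoire, 2)
three_domain = architectures(repertoire, 3)
print(len(two_domain), len(three_domain))  # 3**2 = 9 and 3**3 = 27
```

The count grows as the repertoire size raised to the number of positions, which is why rearranging existing domains explores new proteins much faster than building each one from scratch.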
Case Study: Plant Pathogens and Integrated Domains
Pathogen Virulence Strategy:
Plant pathogens (fungi or bacteria) secrete proteins into the plant, either into the space between cells (intercellular) or directly into host cells (intracellular).
These secreted proteins bind to host proteins to manipulate plant processes, either blocking or enhancing them, to gain a virulence advantage.
Plant Resistance Receptors (NBLRRs):
Plants use a class of intracellular resistance receptors known as NBLRRs (nucleotide-binding leucine-rich repeat receptors, also written NB-LRRs or NLRs).
When these receptors bind to pathogen proteins, they form multimeric structures that trigger an innate immune response, often resulting in localized cell death to stop the infection.
The Recombination "Hijacking" Event:
Evolutionary evidence shows NBLRRs can acquire "integrated domains" through exon exchange.
Named Example: A WRKY transcription factor exon can be added to the end of a resistance receptor gene.
Instead of evolving a new binding site via mutation, the receptor "hijacks" a target that the pathogen protein had already evolved to bind. This allows the plant to evolve resistance rapidly.
Evidence in Rice (Oryza sativa):
A study of Indica and Japonica rice varieties revealed identical resistance gene structures at the same locus, but with two different integrated domains at the ends.
Broadening the search to wild rice relatives uncovered six or seven more distinct integrated domains at this specific locus, showing that domain swapping happens frequently there.
Organization of Eukaryotic vs. Prokaryotic Genomes
Prokaryotic Genome Characteristics (E. coli):
High gene density: A window contains approximately genes.
Minimal space between genes; genes are organized into operons.
Polycistronic transcripts: One regulatory sequence/promoter drives a single transcript that is translated into multiple proteins (e.g., a five-protein operon).
Eukaryotic Genome Characteristics:
Large intergenic spaces: Low gene density.
Complex transcriptional units: Presence of introns and the ability to undergo RNA splicing are exclusive features of eukaryotes.
Example: A human chromosome of contains only genes. A segment representing of the chromosome might contain only genes.
Metabolic Burden: Smaller eukaryotic genomes (like yeast) have smaller intergenic gaps because fast-replicating single cells are at a metabolic disadvantage if they must copy "extra" non-coding DNA at every division. Long-lived multicellular organisms are less affected by this replication cost.
Genome Size and Complexity:
There is no direct relationship between genome size and biological complexity in eukaryotes.
Arabidopsis: A small plant genome with genes per .
Rice: genes per .
Wheat: A massive genome filled with repetitive and transposable elements, resulting in very low gene density.
Human Genome Composition: Only about of the human genome consists of protein-coding exons. The rest comprises introns, transposable elements, and tandem repeats.
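Gene density, as used in the comparisons above, is just gene count divided by genome (or window) size. The sketch below computes genes per megabase. The input numbers are round, made-up values meant to contrast a compact genome with a gene-sparse one; they are not the figures from these notes.

```python
def gene_density(gene_count: int, genome_size_bp: int) -> float:
    """Return gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return gene_count / (genome_size_bp / 1_000_000)


# Made-up illustrative inputs, not measurements from the notes.
compact = gene_density(gene_count=4_000, genome_size_bp=4_000_000)
sparse = gene_density(gene_count=20_000, genome_size_bp=2_000_000_000)
print(compact, sparse)
```

A genome can hold more genes in total and still have a far lower density, which is the sense in which genome size and gene content are decoupled.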
Questions & Discussion
Poll Question: An organism has genes, complex transcriptional units, and most of the genome is taken up by genes. What is it?
Answer: A unicellular eukaryote (e.g., yeast).
Explanation: While the small gene count and high density might suggest a prokaryote, the presence of "complex transcriptional units" (introns and splicing) definitively identifies it as a eukaryote.
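The reasoning in this answer can be written as a two-step decision rule. This is a sketch of the lecture's heuristic only, with hypothetical parameter names; it is not a general taxonomic classifier.

```python
def classify(has_introns_and_splicing: bool, gene_dense: bool) -> str:
    """Apply the poll's heuristic: check splicing first, then gene density."""
    if not has_introns_and_splicing:
        return "prokaryote"
    if gene_dense:
        return "eukaryote with a compact genome (e.g., yeast)"
    return "eukaryote with a gene-sparse genome"


poll_answer = classify(has_introns_and_splicing=True, gene_dense=True)
print(poll_answer)
```

Checking for complex transcriptional units first mirrors the explanation: gene density alone cannot separate a prokaryote from a compact unicellular eukaryote.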
Gene Families and Duplication Mechanisms
Tandem Duplication:
The primary mechanism for creating new genes is the duplication of an existing gene (often nearby on the same chromosome).
Initially, the copies are identical. Over time, one copy may acquire mutations that lead to "sub-functionalization" (specializing in a sub-task) or to a completely new function (neofunctionalization).
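Divergence between duplicated copies is often summarized as percent identity. The sketch below compares equal-length sequences position by position; the sequences are invented, and real comparisons would first need an alignment step that is omitted here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (no alignment is done)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Invented sequences: a gene, its fresh duplicate, and a copy with two substitutions.
original = "ATGGCCAAGT"
fresh_copy = original
diverged_copy = "ATGGCTAAGA"

print(percent_identity(original, fresh_copy))
print(percent_identity(original, diverged_copy))
```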
Prevalence:
Approximately of the human genome belongs to gene families (e.g., Transcription factors, Transporters).
Olfactory Receptor Superfamily: One of the largest human gene families (between and members). However, about half are pseudogenes (non-functional due to mutation), whereas other mammals (like dogs) retain many more active members for scent discrimination.
The Beta-Globin Gene Cluster:
Globin genes are essential for transporting oxygen via hemoglobin (alpha and beta globin proteins + heme cofactor).
Evolutionary Timeline: Primordial globin $\rightarrow$ duplication $\rightarrow$ translocation to different chromosomes (in humans, Chromosome 16 for alpha, Chromosome 11 for beta).
Fetal Hemoglobin (gamma subunit): A specialized subunit that gives fetal hemoglobin a higher affinity for oxygen than adult hemoglobin. This allows a fetus to "steal" oxygen from the maternal blood supply at the placenta interface.
Evolution of Resistance Genes in Plants:
Cyanobacteria: gene.
Algae: genes.
Land Plants: Ancestors had ~ members; modern wheat or rice have thousands.
Reason: Strong selection pressure to duplicate and mutate genes to recognize evolving pathogens.
Repetitive DNA and Copy Number Variation (CNV)
Copy Number Variation: Differences in the number of copies of any DNA sequence, including genes, repeated sequences, or transposable elements.
Gene Dosage Effect: Increasing the number of gene copies can change phenotype (e.g., zebrafish midline pigmentation).
Transposable Elements (Mobile DNA):
Account for nearly of the human genome.
Alu element: A specific transposable element making up of the human genome.
Tandem Repeats:
Make up about of the human genome. Lengths vary from .
Simple Sequence Repeats (SSRs): E.g., dinucleotide () or trinucleotide repeats.
Trinucleotide repeats: Can occur within coding sequence, where the repeat unit is the length of one codon (three nucleotides). Expansion (e.g., from to repeats) is associated with specific human diseases.
Localization: Often concentrated in centromeres and telomeres; they may facilitate heterochromatin formation (tight DNA packaging) for chromosomal protection.
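Repeat expansion can be detected by counting the longest uninterrupted run of a repeat unit. The sketch below does this with a regular expression. The `CAG` unit and the repeat counts of 4 and 12 are arbitrary illustrative choices, not disease thresholds.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("unit must not be empty")
    # The lookahead tests every start position, so runs in any frame are seen.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    runs = [len(m.group(1)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Invented alleles: the same flanking sequence around a short or a long repeat tract.
short_allele = "GGT" + "CAG" * 4 + "TTC"
long_allele = "GGT" + "CAG" * 12 + "TTC"

print(longest_repeat_run(short_allele, "CAG"))
print(longest_repeat_run(long_allele, "CAG"))
```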
DNA Fingerprinting:
Utilizes the high variability in repetitive sequence lengths between individuals.
PCR is used to amplify these regions; varying repeat lengths produce distinct profiles on a gel, allowing for individual discrimination in forensic science.
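The logic of a repeat-length profile can be sketched as follows: the PCR product at each locus is a fixed flanking length plus the repeat unit length times the repeat count, so different repeat counts give different band sizes. Locus names, flank length, unit length, and repeat counts are all hypothetical, and the model keeps one allele per locus for simplicity, whereas a diploid individual carries two.

```python
from typing import Dict


def amplicon_length(flank_bp: int, unit_bp: int, repeats: int) -> int:
    """PCR product length: fixed flanking sequence plus the repeat array."""
    return flank_bp + unit_bp * repeats


def profile(repeat_counts: Dict[str, int], flank_bp: int = 100, unit_bp: int = 4) -> Dict[str, int]:
    """Map each locus to the expected band size in base pairs."""
    return {
        locus: amplicon_length(flank_bp, unit_bp, count)
        for locus, count in repeat_counts.items()
    }


# Hypothetical repeat counts for two individuals at three loci.
person_a = profile({"locus_1": 8, "locus_2": 12, "locus_3": 15})
person_b = profile({"locus_1": 8, "locus_2": 14, "locus_3": 15})

differing = sorted(locus for locus in person_a if person_a[locus] != person_b[locus])
print(person_a)
print(person_b)
print(differing)
```

A single differing locus is enough to tell these two profiles apart; using more loci lowers the chance that two unrelated individuals match by coincidence.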
Organelle Genomes
Endosymbiotic Theory:
Mitochondria originated from the uptake of an early aerobic bacterium by an anaerobic eukaryote.
Chloroplasts originated from the uptake of cyanobacteria.
Genomic Characteristics:
Organelle genomes are circular (a bacterial trait).
Over evolutionary time, most organelle genes were either lost or transferred to the nucleus.
One proposed advantage of this transfer is that the nucleus has superior DNA repair mechanisms compared to the mitochondria.
Mammalian Mitochondrial DNA:
Contains only 13 protein-coding genes.
Retains tRNA genes for local protein synthesis.
Essential proteins like RNA polymerase and DNA polymerase are now encoded in the nucleus and trafficked back to the mitochondria.
Chloroplast DNA:
Slightly larger than mitochondrial DNA.
Maintains protein-coding genes for photosynthesis and various tRNA genes.
Similar to mitochondria, many genes have migrated to the nucleus for better protection and maintenance.
Anecdote: The Mammoth Meatball
Researchers extracted mammoth globin gene sequences from DNA found in ice/amber.
Gaps in the sequence were filled using elephant DNA.
The gene was expressed in cultured sheep cells to produce mammoth protein, and the resulting cell mass was formed into a meatball.
Food regulatory bodies banned its consumption due to unknown immunological risks, as the protein had not existed for tens of thousands of years. It now remains on ice in a museum.