Eukaryotic Genome Organization and Gene Family Evolution

Protein Domains and Evolutionary Recombination

  • Connection Between Exons and Protein Domains:

    • Exons frequently correspond to individual protein domains.

    • Protein domains represent structural folds of a protein that typically possess unique, independent functions.

    • Due to this modularity, alternative splicing can produce functional proteins composed of fewer domains while maintaining overall viability.

  • Comparisons Across Eukaryotic Genomes:

    • Unicellular eukaryotes, such as yeast, possess approximately 400 fewer protein domains than humans (roughly 800 vs. 1,200).

    • The primary takeaway is that biological complexity is not derived from the absolute number of functional protein domain units, but rather from how they are arranged and recombined through complex transcriptional units.

  • Evolutionary Efficiency:

    • Shifting an existing protein domain to a new gene via recombination is a far more effective evolutionary step than generating a new functional protein through random DNA mutations.

    • Protein domains act like "Lego bricks" that can be rearranged into various final structural proteins.

    • Named Example: The ATP binding and ATP hydrolysis domains are essential for energetic functions. Cleaving ATP to release a phosphate group provides the energy required to change protein conformation for metabolic reactions, protein modification, or unwinding DNA. These domains are frequently reused across different categories of proteins.
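
The "Lego brick" point can be made concrete with a rough count. The sketch below is illustrative only: it assumes, purely for the sake of arithmetic, that any domain can follow any other and that order matters, which real proteins do not satisfy. The function name `ordered_architectures` is hypothetical; the domain counts are the approximate figures quoted above.

```python
def ordered_architectures(n_domains: int, max_len: int) -> int:
    """Count ordered domain strings of length 1..max_len, repeats allowed.

    Assumption (not from the notes): every ordering is permitted.
    """
    return sum(n_domains ** k for k in range(1, max_len + 1))


# Approximate domain counts quoted in the notes.
yeast = ordered_architectures(800, 3)
human = ordered_architectures(1200, 3)

print(f"yeast, up to 3 domains: {yeast:,}")
print(f"human, up to 3 domains: {human:,}")
```

Even under these crude assumptions, either domain set allows hundreds of millions of three-domain arrangements, so the 1.5-fold difference in the number of bricks is small next to what rearrangement makes available.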

Case Study: Plant Pathogens and Integrated Domains

  • Pathogen Virulence Strategy:

    • Plant pathogens (fungi or bacteria) secrete proteins into the plant environment (intercellular or intracellular).

    • These secreted proteins bind to host proteins to manipulate plant processes, either blocking or enhancing them, to gain a virulence advantage.

  • Plant Resistance Receptors (NBLRRs):

    • Plants utilize a class of intracellular resistance receptors known as NBLRRs (nucleotide-binding leucine-rich repeat receptors, also written NB-LRRs).

    • When these receptors bind to pathogen proteins, they form multimeric structures that trigger an innate immune response, often resulting in localized cell death to stop the infection.

  • The Recombination "Hijacking" Event:

    • Evolutionary evidence shows NBLRRs can acquire "integrated domains" through exon exchange.

    • Named Example: A WRKY transcription factor exon can be added to the end of a resistance receptor gene.

    • Instead of evolving a new binding site via mutation, the receptor "hijacks" a target that the pathogen protein had already evolved to bind. This allows the plant to evolve resistance rapidly.

  • Evidence in Rice (Oryza sativa):

    • A study of Indica and Japonica rice varieties revealed identical resistance gene structures at the same locus, but with two different integrated domains at the ends.

    • Broadening the search to wild rice relatives uncovered six or seven more distinct integrated domains at this specific locus, indicating that domain swapping has occurred there frequently.

Organization of Eukaryotic vs. Prokaryotic Genomes

  • Prokaryotic Genome Characteristics (E. coli):

    • High gene density: A 140 kilobase (kb) window contains approximately 130 genes.

    • Minimal space between genes; genes are organized into operons.

    • Polycistronic transcripts: One regulatory sequence/promoter leads to a single transcript that is translated into multiple proteins (e.g., a five-protein operon).

  • Eukaryotic Genome Characteristics:

    • Large intergenic spaces: Low gene density.

    • Complex transcriptional units: Presence of introns and the ability to undergo RNA splicing are exclusive features of eukaryotes.

    • Example: A human chromosome of 150,000,000 base pairs contains only 3,000 genes. A segment representing 0.5% of the chromosome might contain only 15 genes.

    • Metabolic Burden: Smaller eukaryotic genomes (like yeast) have smaller gaps because single cells that replicate quickly face a metabolic disadvantage if they must replicate "extra" non-coding DNA. Long-lived multicellular organisms are less affected by this replication rate stress.

  • Genome Size and Complexity:

    • There is no direct relationship between genome size and biological complexity in eukaryotes.

    • Arabidopsis: A small plant genome with 45 genes per 140 kb.

    • Rice: 20 genes per 140 kb.

    • Wheat: A massive genome filled with repetitive and transposable elements, resulting in very low gene density.

    • Human Genome Composition: Only about 1.3% of the human genome consists of protein-coding exons. The rest comprises introns, transposable elements, and tandem repeats.
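
The density figures above are easier to compare once they are put on the same scale. The following is a small worked calculation, not a measurement: it rescales the human chromosome example to the 140 kb window used for the other organisms, assuming (unrealistically) that genes are spread evenly along the chromosome. The helper name `genes_per_window` is hypothetical; all numbers come from the notes.

```python
WINDOW_BP = 140_000  # the 140 kb comparison window used in these notes


def genes_per_window(gene_count: int, region_bp: int, window_bp: int = WINDOW_BP) -> float:
    """Scale a gene count in a region to the expected count in one window."""
    return gene_count * window_bp / region_bp


# (gene count, region size in bp) as quoted in the notes.
regions = {
    "E. coli": (130, 140_000),
    "Arabidopsis": (45, 140_000),
    "Rice": (20, 140_000),
    "Human chromosome": (3_000, 150_000_000),
}

for name, (genes, size_bp) in regions.items():
    print(f"{name}: {genes_per_window(genes, size_bp):.1f} genes per 140 kb")

# Check of the 0.5% segment example, using integer arithmetic (0.5% = 5/1000).
segment_bp = 150_000_000 * 5 // 1000   # length of the segment in bp
segment_genes = 3_000 * 5 // 1000      # genes expected in that segment
print(f"0.5% segment: {segment_bp:,} bp, about {segment_genes} genes")
```

On this scale the human chromosome averages about 2.8 genes per 140 kb, against roughly 130 for E. coli, and the 0.5% segment is 750,000 bp holding about 15 genes, which matches the example in the notes.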

Questions & Discussion

  • Poll Question: An organism has 5,500 genes, complex transcriptional units, and most of the genome is taken up by genes. What is it?

    • Answer: A unicellular eukaryote (e.g., yeast).

    • Explanation: While the small gene count and high density might suggest a prokaryote, the presence of "complex transcriptional units" (introns and splicing) definitively identifies it as a eukaryote.
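
The reasoning in the poll answer can be written as a short decision rule. This is only a restatement of the heuristic from these notes, not a real classifier; the function name `classify_genome` and its two boolean inputs are hypothetical simplifications.

```python
def classify_genome(has_complex_transcription_units: bool, gene_dense: bool) -> str:
    """Apply the rule of thumb from the notes to two yes/no observations."""
    # Introns and splicing are treated in the notes as the deciding feature.
    if not has_complex_transcription_units:
        return "prokaryote"
    # A eukaryote with little intergenic DNA suggests a fast-replicating single cell.
    if gene_dense:
        return "unicellular eukaryote"
    # Large intergenic gaps are tolerated by long-lived multicellular organisms.
    return "multicellular eukaryote"


# The poll organism: complex transcriptional units and a gene-dense genome.
print(classify_genome(has_complex_transcription_units=True, gene_dense=True))
```

The order of the checks mirrors the explanation: gene density alone cannot separate a prokaryote from yeast, so the transcriptional-unit test comes first.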

Gene Families and Duplication Mechanisms

  • Tandem Duplication:

    • The primary mechanism for creating new genes is the duplication of an existing gene (often nearby on the same chromosome).

    • Initially, the copies are identical. Over time, one copy may acquire mutations leading to "sub-functionalization" (specializing in a sub-task) or to a completely new function ("neo-functionalization").

  • Prevalence:

    • Approximately 50% of the human genome belongs to gene families (e.g., Transcription factors, Transporters).

    • Olfactory Receptor Superfamily: One of the largest human gene families (between 2 and 1,000 members). However, about half are pseudogenes (non-functional due to mutation), whereas other mammals (like dogs) retain many more active members for scent discrimination.

  • The Beta-Globin Gene Cluster:

    • Globin genes are essential for transporting oxygen via hemoglobin (2 alpha and 2 beta proteins + heme cofactor).

    • Evolutionary Timeline: Primordial globin → duplication → translocation to different chromosomes (Chromosome 16 for alpha, Chromosome 11 for beta).

    • Fetal Hemoglobin (γ globin): A specialized subunit with a higher affinity for oxygen than adult hemoglobin. This allows a fetus to "steal" oxygen from the maternal blood supply at the placenta interface.

  • Evolution of Resistance Genes in Plants:

    • Cyanobacteria: 1 gene.

    • Algae: 5 genes.

    • Land Plants: Ancestors had ~314 members; modern wheat or rice have thousands.

    • Reason: Strong selection pressure to duplicate and mutate genes to recognize evolving pathogens.

Repetitive DNA and Copy Number Variation (CNV)

  • Copy Number Variation: Differences in the number of copies of any DNA sequence, including genes, repeated sequences, or transposable elements.

    • Gene Dosage Effect: Increasing the number of gene copies can change phenotype (e.g., zebrafish midline pigmentation).

  • Transposable Elements (Mobile DNA):

    • Account for nearly 50% of the human genome.

    • Alu element: A specific transposable element making up 11% of the human genome.

  • Tandem Repeats:

    • Make up about 12% of the human genome. Lengths vary from 20 to 100 kb.

    • Simple Sequence Repeats (SSRs): E.g., dinucleotide (ATATAT...) or trinucleotide repeats.

    • Trinucleotide repeats: Can occur within codons (three nucleotides). Expansion (e.g., from 13 to 40+ repeats) is associated with specific human diseases.

    • Localization: Often concentrated in centromeres and telomeres; they may facilitate heterochromatin formation (tight DNA packaging) for chromosomal protection.
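
Counting repeat units is simple to sketch in code. The example below is a minimal illustration, assuming a made-up sequence and using CAG as an arbitrary trinucleotide unit; the function names `longest_repeat_run` and `is_expanded` are hypothetical, and the threshold of 40 is taken from the example in the notes rather than from any clinical cutoff.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # A lookahead is used so that every start position is tried, which keeps
    # the count correct even for units that overlap themselves (e.g. ATA).
    pattern = re.compile(f"(?=((?:{re.escape(unit.upper())})+))")
    runs = [len(m.group(1)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def is_expanded(repeat_count: int, threshold: int = 40) -> bool:
    """Flag a repeat count at or above the illustrative threshold."""
    return repeat_count >= threshold


# Made-up sequences: the same flanks with 13 and 42 copies of CAG.
typical = "GCT" + "CAG" * 13 + "TTA"
expanded = "GCT" + "CAG" * 42 + "TTA"

for label, seq in (("typical", typical), ("expanded", expanded)):
    n = longest_repeat_run(seq, "CAG")
    print(f"{label}: {n} repeats, expanded={is_expanded(n)}")
```

Because the repeat unit here is three nucleotides long, each extra copy inside a coding region adds one codon, which is why expansion can change the encoded protein.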

  • DNA Fingerprinting:

    • Utilizes the high variability in repetitive sequence lengths between individuals.

    • PCR is used to amplify these regions; varying repeat lengths produce distinct profiles on a gel, allowing for individual discrimination in forensic science.
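
The link between repeat number and band position can be sketched as simple arithmetic: a PCR product spans the repeats plus the fixed flanking sequence between the primers. Everything specific in the example below is hypothetical (the locus names, unit lengths, flank sizes, and genotypes), as are the function names `amplicon_length` and `band_profile`; it only illustrates why different repeat counts give different profiles.

```python
from typing import Dict, List, Tuple


def amplicon_length(repeat_count: int, unit_len: int, flank_bp: int) -> int:
    """PCR product size: fixed flanking sequence plus the repeat tract."""
    return flank_bp + repeat_count * unit_len


def band_profile(
    genotype: Dict[str, Tuple[int, int]],
    loci: Dict[str, Tuple[int, int]],
) -> Dict[str, List[int]]:
    """Map each locus to the sorted product sizes of its two alleles."""
    profile = {}
    for locus, alleles in genotype.items():
        unit_len, flank_bp = loci[locus]
        profile[locus] = sorted(amplicon_length(n, unit_len, flank_bp) for n in alleles)
    return profile


# Hypothetical loci: name -> (repeat unit length in bp, total flank length in bp).
LOCI = {"locus_A": (4, 100), "locus_B": (2, 80)}

# Hypothetical genotypes: repeat counts on the two alleles at each locus.
person_1 = {"locus_A": (10, 12), "locus_B": (15, 15)}
person_2 = {"locus_A": (10, 14), "locus_B": (15, 18)}

profile_1 = band_profile(person_1, LOCI)
profile_2 = band_profile(person_2, LOCI)

print(profile_1)
print(profile_2)
print("profiles match:", profile_1 == profile_2)
```

Two individuals can share a band at one allele and still differ overall, so discrimination improves as more variable loci are compared.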

Organelle Genomes

  • Endosymbiotic Theory:

    • Mitochondria originated from the uptake of an early aerobic bacterium by an anaerobic eukaryote.

    • Chloroplasts originated from the uptake of cyanobacteria.

  • Genomic Characteristics:

    • Organelle genomes are circular (a bacterial trait).

    • Over evolutionary time, most organelle genes were either lost or transferred to the nucleus.

    • The nucleus has superior DNA repair mechanisms compared to the mitochondria.

  • Mammalian Mitochondrial DNA:

    • Contains only 13 protein-coding genes.

    • Retains tRNA genes for local protein synthesis.

    • Essential proteins like RNA polymerase and DNA polymerase are now encoded in the nucleus and trafficked back to the mitochondria.

  • Chloroplast DNA:

    • Slightly larger than mitochondrial DNA.

    • Maintains protein-coding genes for photosynthesis and various tRNA genes.

    • Similar to mitochondria, many genes have migrated to the nucleus for better protection and maintenance.

Anecdote: The Mammoth Meatball

  • Researchers extracted mammoth globin gene sequences from DNA found in ice/amber.

    • Gaps in the sequence were filled using elephant DNA.

    • The gene was expressed in cultured sheep cells to produce mammoth protein, and the resulting cell mass was used to create a meatball.

    • Food regulatory bodies banned its consumption due to unknown immunological risks, as the protein had not existed for tens of thousands of years. It now remains on ice in a museum.