Lecture 3: The Birth of Genes 👶🧬

Recommended Reading:

  • Strachan and Read (2019). Human Molecular Genetics (5th edition), pp. 299-301, 432-440, 452-453.

  • Kaessmann (2010). Origins, evolution, and phenotypic impact of new genes. Genome research 20, 1313-1326.

Recent research has significantly advanced our understanding of new gene origins and their impact on evolution. New genes arise through various mechanisms, including gene duplication, de novo formation from noncoding sequences, and co-option of genomic parasites (Kaessmann, 2010). These novel genes rapidly integrate into existing gene-gene interaction networks, often starting on the periphery and evolving into highly connected hubs (Zhang et al., 2015). New genes contribute to phenotypic innovations in development, reproduction, and brain function (Chen et al., 2013; Zhang & Long, 2014). In humans, over 300 human-specific and 1000 primate-specific genes have been identified, with many involved in brain development and male reproduction (Zhang & Long, 2014). Young genes show enrichment in the male reproductive system and functions related to human innovations like increased brain size and bipedal locomotion (Chen, 2023). The study of new genes has been facilitated by genomic data and deep taxon phylogenomics, providing insights into their evolutionary dynamics and functional impacts (Rödelsperger et al., 2019).


Recap: Gene Families and Genome Similarity

  • Around 80% of human genes have a direct equivalent (orthologue) in the mouse.

  • Less than 1% of human genes have no related genes in the mouse.

  • Vertebrate genomes are remarkably similar.

  • Changes in gene families are common during evolution.

  • Most differences in gene content between vertebrate genomes are due to changes in the number of members within gene families.

    • Example: Olfactory receptors show gene family contraction in humans (396) compared to mice (1035). C2H2 Zn Finger Transcription Factors show expansion in humans (712) compared to mice (583).

  • Nomenclature:

    • Homologues: Genes sharing a common ancestor (e.g., all globin genes).

    • Orthologues: Homologous genes in different species that arose from a common ancestral gene due to speciation (e.g., human and cat β-globin).

    • Paralogues: Homologous genes within the same species that arose from gene duplication (e.g., human β-globin and human α-globin).

  • Whole Genome Duplication (WGD):

    • Sometimes the entire genome is duplicated, leading to polyploidy.

    • Examples include durum wheat (tetraploid) and bread wheat (hexaploid).

    • The vertebrate genome is believed to have undergone two rounds of WGD about 500 million years ago.

    • The "One Thousand Plant Transcriptomes Initiative" highlights frequent WGD events in plant ancestry.

  • Expansion of specific gene families (e.g., Zn finger transcription factors involved in development) and individual gene duplications (e.g., ARHGAP11B, linked to neocortex expansion) can contribute to evolutionary changes.
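The family sizes quoted in the recap can be compared with a few lines of arithmetic. This is a minimal sketch: the counts come from the example above, while the function names and the simple "expansion/contraction" labels are illustrative assumptions (a real analysis would compare both species against an inferred ancestral count).

```python
# Gene family sizes quoted in the recap above, as (human, mouse) counts.
family_sizes = {
    "olfactory receptors": (396, 1035),
    "C2H2 Zn finger transcription factors": (712, 583),
}

def human_to_mouse_ratio(human_count, mouse_count):
    """Return the human:mouse ratio of family members."""
    return human_count / mouse_count

def describe(human_count, mouse_count):
    """Label the family as larger or smaller in humans relative to mice."""
    if human_count > mouse_count:
        return "expansion in humans"
    if human_count < mouse_count:
        return "contraction in humans"
    return "no change"

for name, (human, mouse) in family_sizes.items():
    ratio = human_to_mouse_ratio(human, mouse)
    print(f"{name}: human/mouse = {ratio:.2f} ({describe(human, mouse)})")
```

On these numbers the olfactory receptor family is well under half the mouse size in humans, while the Zn finger family is roughly a fifth larger.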


How do gene families expand and contract, and how do new genes arise? 🆕

New genes can appear in a genome via several mechanisms:

  • Exon shuffling

  • Gene duplication

  • Insertion into the genome of reverse-transcribed mRNA to generate a retrogene

  • Genes that appear de novo

  • Horizontal gene transfer (very rare in vertebrates, none in humans)


Exon Shuffling 🧩

  • Exons from one gene become inserted into, or fused with, another gene due to DNA rearrangement processes, such as those mediated by transposons. This creates a "new" gene with a novel combination of exons.

  • Important Distinction: Exon shuffling (a DNA-level event that creates new genes) should not be confused with alternative splicing (a regulated RNA-level process that allows a single gene to produce multiple protein isoforms from the same primary transcript).


Gene Duplication

  • Most genes, including new members of gene families, arise from the duplication of an existing gene.

  • It's estimated that every gene undergoes a duplication event roughly once every 200 million years.

  • This process starts with an ancestral gene and results in two initially identical copies, which can then diverge.
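To get a feel for what "once every 200 million years per gene" means genome-wide, the rate can be scaled up by the number of genes. This is a back-of-the-envelope sketch: the gene count of 20,000 is a round-number assumption that is not given in the lecture, and the calculation treats every gene as duplicating independently at the same rate.

```python
YEARS_PER_DUPLICATION = 200e6  # from the estimate above: ~1 duplication per gene per 200 My
ASSUMED_GENE_COUNT = 20_000    # assumption: round number of protein-coding genes

def expected_duplications(gene_count, interval_years,
                          years_per_duplication=YEARS_PER_DUPLICATION):
    """Expected number of duplication events across all genes in a time interval."""
    return gene_count * interval_years / years_per_duplication

per_million_years = expected_duplications(ASSUMED_GENE_COUNT, 1e6)
print(f"Expected duplications per million years: {per_million_years:.0f}")
```

Under these assumptions, a rate that looks very slow for any single gene still yields on the order of a hundred new duplicates per million years across the genome, most of which are later lost.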

Mechanisms of Gene Duplication:

  • Unequal crossing over between repetitive sequences (e.g., Alu sequences) flanking a gene during meiosis can lead to one chromosome with a duplicated gene and another with a deletion.

  • Unequal crossing over between already duplicated genes within a cluster can generate new hybrid genes during meiosis.

  • Whole genome duplication.
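Unequal crossing over between flanking repeats can be sketched with a toy model. Here a chromosome is simply a list of labelled segments, and the function and variable names are hypothetical. The crossover pairs the downstream repeat on one chromatid with the upstream repeat on the other, then exchanges everything beyond the paired repeats.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments downstream of two paired repeat copies.

    break_a and break_b are the list indices of the repeats that pair up.
    Returns the two reciprocal recombinant chromosomes.
    """
    if chrom_a[break_a] != chrom_b[break_b]:
        raise ValueError("crossover requires matching repeat sequences")
    recombinant_1 = chrom_a[:break_a + 1] + chrom_b[break_b + 1:]
    recombinant_2 = chrom_b[:break_b + 1] + chrom_a[break_a + 1:]
    return recombinant_1, recombinant_2

# Toy chromosome: one gene flanked by two Alu repeats.
chromosome = ["left", "Alu", "GENE", "Alu", "right"]

# Misalignment: the downstream Alu (index 3) of one chromatid pairs with
# the upstream Alu (index 1) of the other.
duplication, deletion = unequal_crossover(chromosome, chromosome, 3, 1)
print(duplication)  # chromosome carrying two copies of the gene
print(deletion)     # reciprocal chromosome with the gene deleted
```

The two products are reciprocal: every duplication generated this way is accompanied by a deletion on the other recombinant chromosome.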

The Fate of Duplicated Genes:

  • Nonfunctionalization (most common): One copy becomes inactivated due to mutations and becomes a pseudogene. There might be selection for this if the gene product is harmful in excess.

  • Retention of the duplicate occurs if:

    • Neofunctionalization: One duplicate acquires mutations that give it a new, beneficial function.

      • Examples:

        • Genes encoding caseins (milk proteins) evolved from a duplicate of the ODAM gene (tooth protein).

        • The antifreeze glycoprotein gene in Antarctic fish evolved from a duplicated trypsinogen gene. The ancestral trypsinogen gene had a multi-exon structure; the antifreeze gene consists mainly of an expanded Thr-Ala-Ala repeat motif, with most of the original exon structure lost or modified.

        • The protein controlling nectar spur formation in some nasturtiums evolved from a duplicated TCP4 transcription factor that originally controlled flower symmetry. Akania bidwillii has one TCP4 copy and short flowers without spurs, while Tropaeolum longifolium has the original TCP4 and a duplicated, neofunctionalized TCP4L2 involved in nectar spur formation, resulting in long flowers with spurs.

    • Subfunctionalization: The original gene's functions are partitioned between the two duplicates. This can happen if a gene was originally expressed in several tissues, and after duplication, each copy becomes specialized for expression in a subset of those tissues.

      • Example: The Myb1 transcription factor in clarkia flowers activates pigment deposition. In an ancestral state, one CgMyb gene might drive expression in petal spots and the background. After duplication and subfunctionalization, one copy (e.g., CgMyb1b) might lose the sequence for spot expression but retain background expression, while the other (e.g., CgMyb1c) loses background expression capability but retains spot expression. Dudley’s clarkia shows expression of Dfr (dihydroflavonol 4-reductase, a pigment biosynthesis gene) throughout the petal, while Slender clarkia (with duplicated Myb genes) shows Dfr expression at different times and in different regions.

    • Beneficial in excess: The gene encodes a protein that is advantageous in larger quantities (e.g., salivary amylase).
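The possible fates listed above can be expressed as a simple decision rule over sets of functions. This is only an illustrative sketch: the function name, the set-based representation, and the labels for each example are assumptions made here, and real genes rarely fall so cleanly into one category.

```python
def classify_fate(ancestral, copy_a, copy_b):
    """Classify the fate of a duplicate pair from the functions each copy performs."""
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not copy_a or not copy_b:
        # One copy has lost all function and is a pseudogene.
        return "nonfunctionalization"
    if (copy_a | copy_b) - ancestral:
        # At least one copy does something the ancestral gene did not.
        return "neofunctionalization"
    if copy_a < ancestral and copy_b < ancestral and (copy_a | copy_b) == ancestral:
        # Each copy keeps only part of the ancestral role; together they cover it.
        return "subfunctionalization"
    return "both copies retained with overlapping function (e.g. dosage benefit)"

# Toy encodings of the lecture examples (labels are illustrative).
print(classify_fate({"tooth protein"}, {"tooth protein"}, {"milk protein"}))
print(classify_fate({"petal spot", "petal background"},
                    {"petal background"}, {"petal spot"}))
print(classify_fate({"starch digestion"}, {"starch digestion"}, {"starch digestion"}))
print(classify_fate({"starch digestion"}, {"starch digestion"}, set()))
```

The order of the checks matters: a copy with no remaining function is called a pseudogene before any comparison of the surviving functions is attempted.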


The Globin Gene Family: An Example of Gene Duplication and Evolution 🩸

  • Hemoglobin is a tetramer: (α-like)₂ (β-like)₂.

  • Different hemoglobin isoforms are used in embryonic (<8 weeks), fetal, and adult stages.

  • The human globin gene family has an α-like cluster on chromosome 16 and a β-like cluster on chromosome 11.

  • The α-cluster includes a ζ (zeta) globin gene (embryonic) and two identical α (alpha) globin genes (fetal and adult).

  • The human β-like globin cluster (chromosome 11):

    • Contains genes: ε (epsilon), Gγ (gamma-G), Aγ (gamma-A), ψβ (psi-beta, a pseudogene), δ (delta), and β (beta).

    • Order on chromosome (5' to 3'): ε - Gγ - Aγ - ψβ - δ - β.

    • This order shows temporal collinearity, meaning it corresponds to the order of gene expression during development:

      • Embryonic hemoglobin: ζ₂ε₂

      • Fetal hemoglobin: α₂γ₂ (using Gγ and Aγ)

      • Adult hemoglobin: 97% α₂β₂, 3% α₂δ₂

  • Evolution of the globin gene family:

    1. An ancestral globin gene duplicated to give rise to myoglobin and an ancestral "hemoglobin" gene.

    2. The "hemoglobin" gene duplicated, leading to linked α and β globin genes (as seen in most present-day fish and amphibia).

    3. A translocation then separated the β globin gene from the α globin gene, giving the separate α and β globin gene clusters.

    4. Further gene duplications within these clusters led to the current arrangement seen in mammals and birds.

  • Unequal crossing over in the globin cluster can cause thalassemia (severe anemia in homozygotes).

    • For example, unequal crossing over between misaligned δ and β genes produces two reciprocal products: a chromosome on which the δ and β genes are replaced by a single hybrid δβ gene (Hb-Lepore, which is poorly expressed and causes thalassemia), and a chromosome carrying an extra hybrid βδ gene between intact δ and β genes (Hb-antiLepore, with no clinical consequences).
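The two reciprocal products of a δ/β misalignment can be traced with a toy model of the cluster. This is a sketch under simplifying assumptions: each gene is a single label, the crossover is taken to fall inside the misaligned genes, and a hybrid gene is named by its 5' part followed by its 3' part (so "δβ" has a δ start and a β end). The function name is hypothetical.

```python
# Gene order of the human β-like globin cluster, 5' to 3'.
CLUSTER = ["ε", "Gγ", "Aγ", "ψβ", "δ", "β"]

def misaligned_crossover(cluster, upstream_gene, downstream_gene):
    """Cross over inside upstream_gene on one chromatid and downstream_gene on the other."""
    i_up = cluster.index(upstream_gene)
    i_down = cluster.index(downstream_gene)
    # Product 1: chromatid 1 up to the 5' part of the upstream gene,
    # joined to chromatid 2 from the 3' part of the downstream gene.
    deletion_product = (cluster[:i_up]
                        + [upstream_gene + downstream_gene]
                        + cluster[i_down + 1:])
    # Product 2: chromatid 2 up to the 5' part of the downstream gene,
    # joined to chromatid 1 from the 3' part of the upstream gene.
    duplication_product = (cluster[:i_down]
                           + [downstream_gene + upstream_gene]
                           + cluster[i_up + 1:])
    return deletion_product, duplication_product

lepore, anti_lepore = misaligned_crossover(CLUSTER, "δ", "β")
print(" - ".join(lepore))       # hybrid δβ gene replaces δ and β
print(" - ".join(anti_lepore))  # extra hybrid βδ gene between δ and β
```

The Lepore chromosome has lost its normal δ and β genes, whereas the anti-Lepore chromosome still carries intact copies of both.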


Retrogenes: New Genes from mRNA

  • Retrogenes and processed pseudogenes arise from the insertion of reverse-transcribed mRNAs into the genome.

  • This process uses the reverse transcriptase and endonuclease encoded by LINE-1 elements, which copy the mRNA into DNA and insert it at an essentially random site in the genome. The mechanism is similar to LINE-1 transposition.

  • Characteristics of retrogenes/processed pseudogenes:

    • Lack introns (as they originate from processed mRNA).

    • Often have a poly(A) tail sequence at the 3' end.

    • Flanked by short direct repeats (target-site duplications) generated during insertion.

  • Fate:

    • Most frequently, the inserted sequence lacks a promoter and becomes an inactive processed pseudogene.

    • Rarely, if the insertion occurs by chance adjacent to an existing promoter, it can become a potentially active retrogene.

  • The human genome has an estimated 285 retrogenes.

  • Significant Retrogene Example: The short-legged phenotype of the dachshund is due to the activity of a retrogene derived from the gene encoding Fibroblast Growth Factor (FGF)-4.
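The hallmark features of a retrocopy (no introns, a poly(A) tract, flanking direct repeats) follow directly from how it is made, which can be mimicked on toy sequences. The sequences, function names, and the repeat length of 5 bases are all illustrative assumptions.

```python
def make_processed_transcript(exons, poly_a_length=10):
    """Join exons (introns already removed by splicing) and append a poly(A) tail."""
    return "".join(exons) + "A" * poly_a_length

def insert_retrocopy(genome, position, retrocopy, repeat_length=5):
    """Insert a retrocopy, duplicating the target site so direct repeats flank it."""
    target_site = genome[position:position + repeat_length]
    return (genome[:position]
            + target_site + retrocopy + target_site
            + genome[position + repeat_length:])

# Toy parent gene: three exons (introns are not represented because they
# are absent from the mature mRNA).
exons = ["ATGGCC", "TTTGGA", "CCCTAA"]
retrocopy = make_processed_transcript(exons)

genome = "GGGTTCAGTCCATGCA"
new_genome = insert_retrocopy(genome, 4, retrocopy)
print(retrocopy)
print(new_genome)
```

Nothing in this process copies the parent gene's promoter, which is why most insertions end up as inactive processed pseudogenes unless they land next to an existing promoter.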


De Novo Genes: Genes from Scratch

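  • A de novo gene is generated from DNA that was previously non-coding. The haploid human genome is 3.2 billion base pairs, most of which does not code for protein, so there is a large pool of sequence from which such genes could arise.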

  • This process requires:

    1. Transcription of previously non-coding DNA (e.g., acquiring a TATA box or other promoter elements).

    2. The presence or generation of an open reading frame (ORF) with start and stop codons.

    3. The acquisition of a selectable function.

  • There may be at least 16 human-specific de novo genes.

  • Example: North Atlantic fish antifreeze glycoprotein (afgp).

    • Cod can live in sub-zero North Atlantic waters due to afgp in their blood, which has multiple repeats of threonine-alanine or proline-alanine.

    • The genes encoding cod afgp show no BLAST hits outside the cod family.

    • Comparison of syntenic chromosome regions (derived from the same ancestral region) between cod and other fish (like stickleback) shows that the site corresponding to the afgp genes is non-coding DNA in non-cod species. This indicates the cod afgp genes arose de novo from previously non-coding DNA.
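The second requirement for a de novo gene, an open reading frame, can arise from a single base change. The sketch below scans the forward strand for ATG-to-stop ORFs in two made-up sequences that differ at one position. The sequences are invented for illustration and are not cod or stickleback DNA, and the function name and minimum length are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

# Hypothetical syntenic sequences differing by one base (C -> T at index 3).
ancestral = "TTACGGCTGCTGCATAGCC"  # no start codon, so no ORF
derived = "TTATGGCTGCTGCATAGCC"    # the mutation creates an ATG

print(find_orfs(ancestral))
print(find_orfs(derived))
```

An ORF alone does not make a gene: the region must also be transcribed and the product must acquire a function that selection can act on.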


Learning Outcomes 🎓

After this session, you should be able to:

  • Identify mechanisms that generate novel genes.

  • Describe and contrast the mechanisms by which gene duplications occur.

  • Describe the possible fate of a newly duplicated gene.

  • Describe the organisation of the β-globin gene cluster of humans.

  • Apply knowledge about mechanisms that generate novel genes to explain how the globin gene family has evolved.

  • Understand that reverse transcription of a cellular mRNA can generate active retrogenes.

  • Outline how genes may form de novo and give an example of a de novo gene.
