Molecular Evolution

Lecture Objectives

  • Explain how a small number of genes can contribute to many phenotypes.
  • Explain the C-value paradox.
  • Summarize the major results of the ENCODE project and its relation to junk DNA.
  • Explain what is meant by a minimum genome.
  • Describe the outcomes of gene duplication and their contribution to genome size and complexity.
  • Differentiate between abundant genes, scarce genes, housekeeping genes, and luxury genes.
  • Differentiate between orthologues and paralogues.
  • Define synonymous and nonsynonymous mutations and explain how they can be used to measure selection.
  • Describe the neutral theory of molecular evolution, the nearly neutral theory, and selectionist viewpoints.
  • Explain how mutation, selection, and drift can affect molecular evolution.
  • Explain genetic hitchhiking and linkage disequilibrium.
  • Explain the molecular clock principle.
  • Explain the "introns early" and "introns late" models and exon shuffling.
  • Describe pseudogenes and how they arise, how new gene function arises, and the role of transposable elements in molecular evolution.

Total Gene Number in Eukaryotes

  • The total number of genes is known for several eukaryotes.
  • Figure 5.5: Functions of Drosophila genes are based on comparative genomics of twelve species.

Different Types of Genes

  • Orthologous genes (orthologs): Related genes in different species.
  • The minimum size of the proteome can be estimated from the number of types of genes.
  • Figure 5.7: The fruit fly genome can be divided into genes present in all eukaryotes, genes present in all multicellular eukaryotes, and genes specific to flies.

Human Genome Gene Count

  • The human genome has fewer genes than originally expected (approximately 20,000 genes).
  • Exons comprise only about 1% of the human genome.
  • Exons comprise about 5% of each gene, so genes (exons plus introns) make up about 25% of the genome.
  • Figure 5.9: Genes occupy 25% of the human genome, but protein-coding sequences are a small part of this.
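
The arithmetic behind the bullets above can be checked with a short sketch. The function name is hypothetical, and the inputs are the rounded percentages quoted in the list, so the result only approximates the cited figure: 1% divided by 5% gives about 20%, in the same range as the roughly 25% quoted.

```python
def gene_fraction_of_genome(exon_fraction_of_genome, exon_fraction_of_gene):
    """Estimate the fraction of the genome spanned by genes (exons plus introns).

    If exons are a known share of each gene, the whole genes must span
    (exon share of genome) / (exon share of gene) of the genome.
    """
    if not 0 < exon_fraction_of_gene <= 1:
        raise ValueError("exon_fraction_of_gene must be in (0, 1]")
    return exon_fraction_of_genome / exon_fraction_of_gene


# Rounded figures from the list: exons ~1% of the genome, ~5% of each gene.
estimate = gene_fraction_of_genome(0.01, 0.05)
print(f"Genes span roughly {estimate:.0%} of the genome")
```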

Genome Size and Complexity

  • There is no clear correlation between genome size and genetic complexity.
  • C-value: The total amount of DNA in the genome per haploid set of chromosomes.
  • C-value paradox: The lack of relationship between the DNA content (C-value) of an organism and its coding potential.
  • Figure 5.29: DNA content of the haploid genome increases with the morphological complexity of lower eukaryotes.
  • There is an increase in the minimum genome size associated with organisms of increasing complexity.
  • There are wide variations in the genome sizes of organisms within many taxonomic groups.
  • Figure 5.30: The minimum genome size found in each taxonomic group increases from prokaryotes to mammals.

Alternative Splicing

  • Roughly 60% of human genes are alternatively spliced.
  • Up to 80% of alternative splicing events change the protein sequence, so the human proteome is estimated to have 50,000 to 60,000 members.

mRNA Types

  • monocistronic mRNA: mRNA that encodes one polypeptide.
  • polycistronic mRNA: mRNA that includes coding regions representing more than one gene.

Distribution of Genes and Sequences

  • Repeated sequences (present in more than one copy) account for more than 50% of the human genome.
  • The bulk of repeated sequences consists of copies of nonfunctional transposons.
  • There are many duplications of large chromosome regions.
  • Figure 5.12: The largest component of the human genome consists of transposons.

Essential Genes

  • Not all genes are essential. In yeast and flies, deletions have detectable effects for fewer than 50% of the genes.
  • When two or more genes are redundant, a mutation in any one of them might not have detectable effects.
  • Figure 5.14: Essential yeast genes are found in all classes.
  • The persistence in the genome of genes that are apparently dispensable is not fully understood.
  • Figure 5.15: A systematic analysis of loss of function for 86% of worm genes shows that only 10% have detectable effects on the phenotype.

Gene Expression Levels

  • About 10,000 genes are expressed at widely differing levels in a eukaryotic cell.
  • In any particular cell, most genes are expressed at a low level.
  • scarce (complex) mRNA: mRNA that consists of a large number of individual mRNA species, each present in very few copies per cell. This accounts for most of the sequence complexity in RNA.
  • Only a small number of genes, whose products are specialized for the cell type, are highly expressed.
  • abundance: The average number of mRNA molecules per cell.
  • abundant mRNA: mRNA that consists of a small number of individual species, each present in a large number of copies per cell.
  • mRNAs expressed at low levels overlap extensively when different cell types are compared.
  • housekeeping gene: A gene that is (theoretically) expressed in all cells because it provides basic functions needed for sustenance of all cell types.
  • The abundantly expressed mRNAs are usually specific for the cell type.
  • luxury gene: A gene encoding a specialized function, whose product is (usually) synthesized in large amounts in particular cell types.
  • About 10,000 expressed genes might be common to most cell types of a multicellular eukaryote.

DNA Sequence Evolution

  • The probability of a mutation is influenced by the likelihood that the particular error will occur and the likelihood that it will be repaired.
  • synonymous mutation: A change in DNA sequence in a coding region that does not alter the amino acid that is encoded.
  • nonsynonymous mutation: A change in DNA sequence in a coding region that alters the amino acid that is encoded.
  • In small populations, the frequency of a mutation will change randomly, and new mutations are likely to be eliminated by chance.
  • fixation: The process by which a new allele replaces the allele that was previously predominant in a population.
  • The frequency of a neutral mutation largely depends on genetic drift, the strength of which depends on the size of the population.
  • The frequency of a mutation that affects phenotype will be influenced by negative or positive selection.
  • Figure 5.21A: The fixation or loss of alleles by random genetic drift in populations of 10.
  • Figure 5.21B: The fixation or loss of alleles by random genetic drift in populations of 100.
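
The effect of population size on drift can be illustrated with a minimal simulation. This is a sketch that assumes a Wright-Fisher style model (each generation resamples every gene copy from the previous generation's allele frequency) and treats `pop_size` as the number of gene copies. The model choice, function names, and parameters are illustrative and not taken from the figures.

```python
import random


def drift_until_fixed_or_lost(pop_size, start_freq=0.5, rng=None,
                              max_generations=100_000):
    """Follow one neutral allele until it is fixed (1.0) or lost (0.0).

    Returns (final_frequency, generations_elapsed).
    """
    rng = rng or random.Random()
    count = round(start_freq * pop_size)
    generations = 0
    while 0 < count < pop_size and generations < max_generations:
        freq = count / pop_size
        # Each copy in the next generation carries the allele with
        # probability equal to its current frequency.
        count = sum(rng.random() < freq for _ in range(pop_size))
        generations += 1
    return count / pop_size, generations


def mean_generations(pop_size, replicates=200, seed=0):
    """Average time to fixation or loss over independent replicates."""
    rng = random.Random(seed)
    runs = [drift_until_fixed_or_lost(pop_size, rng=rng)[1]
            for _ in range(replicates)]
    return sum(runs) / len(runs)


for size in (10, 100):
    print(size, "copies:", mean_generations(size), "generations on average")
```

In this model the smaller population reaches fixation or loss in far fewer generations, which is the sense in which drift is stronger in small populations.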

Selection and Sequence Variation

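  • The ratio of nonsynonymous to synonymous substitutions in the evolutionary history of a gene is a measure of positive or negative selection: a ratio above 1 suggests positive selection, and a ratio well below 1 suggests negative (purifying) selection.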
  • Low heterozygosity of a gene might indicate recent selective events.
  • genetic hitchhiking: The change in frequency of a genetic variant due to its linkage to a selected variant at another locus.
  • Comparing the rates of substitution among related species can indicate whether selection on the gene has occurred.
  • linkage disequilibrium: A nonrandom association between alleles at two different loci, often as a result of linkage.
  • Figure 5.22: A higher number of nonsynonymous substitutions in lysozyme sequences in the cow/deer lineage as compared to the pig lineage.
  • Most functional genetic variation in the human species affects gene regulation and not variation in proteins.
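
The nonsynonymous-to-synonymous comparison described above can be sketched for two aligned coding sequences. This is a simplified illustration, not a full published estimator: it counts sites by the fraction of single-base changes that are synonymous, classifies only codons that differ at exactly one position, treats stop codons as just another class, and applies no correction for multiple hits. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    sites += 1 / 3
    return sites


def pn_ps(seq_a, seq_b):
    """Return (pN, pS): differences per nonsynonymous and per synonymous site."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, equal-length, whole codons")
    syn_diffs = nonsyn_diffs = 0
    syn_sites = 0.0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        syn_sites += (synonymous_sites(a) + synonymous_sites(b)) / 2
        if sum(x != y for x, y in zip(a, b)) == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq_a) - syn_sites
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# Toy example: GCT -> GCC is synonymous (Ala), AAA -> GAA is not (Lys -> Glu).
pn, ps = pn_ps("ATGGCTAAA", "ATGGCCGAA")
print("pN =", pn, "pS =", ps)
if ps > 0:
    print("pN/pS =", pn / ps)
```

A ratio well below 1 on real data would be read as evidence of purifying selection. Sequences this short are only useful for checking the bookkeeping.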

Molecular Clock

  • The sequences of orthologous genes in different species vary at nonsynonymous sites (where mutations have caused amino acid substitutions) and synonymous sites (where mutations have not affected the amino acid sequence).
  • Synonymous substitutions accumulate about 10 times faster than nonsynonymous substitutions.
  • The evolutionary divergence between two DNA sequences is measured by the corrected percentage of positions at which the corresponding nucleotides differ.
  • Substitutions can accumulate at a more or less constant rate after genes separate, so that the divergence between any pair of sequences (for example, globin genes) is proportional to the time since they shared common ancestry.
  • Figure 5.26: Nonsynonymous site divergences between pairs of β-globin genes allow the history of the human cluster to be reconstructed.
  • codon bias: The preferential use of one codon over its synonymous alternatives to encode an amino acid for which there are several synonymous codons.
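
The "corrected percentage" of differing positions and the clock relationship can be sketched as follows. The notes do not say which correction is meant. This example assumes the Jukes-Cantor correction, one common choice, and the substitution rate passed to `divergence_time` is a made-up illustrative value.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of aligned positions that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor(p):
    """Correct an observed proportion p for multiple substitutions per site.

    Assumes all base changes are equally likely (Jukes-Cantor model).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time(distance, rate_per_site_per_year):
    """Time since common ancestry under a constant-rate molecular clock.

    Both lineages accumulate substitutions, hence the factor of 2.
    """
    return distance / (2 * rate_per_site_per_year)


p = p_distance("ACGTACGTAC", "ACGTACGAAC")  # 1 difference in 10 sites
d = jukes_cantor(p)
print("observed:", p, "corrected:", d)
print("years:", divergence_time(d, rate_per_site_per_year=1e-9))  # assumed rate
```

The corrected distance always exceeds the observed proportion, because some sites have changed more than once. The gap widens as sequences diverge.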

Evolution of Interrupted Genes

  • An interesting evolutionary question is whether genes originated with introns or were originally uninterrupted.
  • “introns late” model: The hypothesis that the earliest genes did not contain introns, and that introns were subsequently added to some genes.
  • Interrupted genes that correspond either to proteins or to independently functioning noncoding RNAs probably originated in an interrupted form (“introns early” hypothesis).
  • exon shuffling: The hypothesis that genes have evolved by the recombination of various exons encoding functional protein domains.
  • The interruption of genes by introns may have allowed the base order to better satisfy the potential for stem–loop extrusion from duplex DNA, perhaps to facilitate recombination repair of errors.
  • A special class of introns is mobile and can insert themselves into genes.

Morphological Complexity Evolution

  • In general, comparisons of eukaryotes to prokaryotes, multicellular to unicellular eukaryotes, and vertebrate to invertebrate animals show a positive correlation between gene number and morphological complexity, because additional genes are needed as complexity increases.
  • Most of the genes that are unique to vertebrates are concerned with the immune or nervous systems.
  • Figure 5.31: Human genes can be classified according to how widely their homologs are distributed in other species.

Gene Duplication

  • Duplicated genes can diverge to generate different genes, or one copy might become an inactive pseudogene.

Globin Clusters

  • All globin genes are descended by duplication and mutation from an ancestral gene that had three exons.
  • The ancestral gene gave rise to myoglobin, leghemoglobin, and α- and β-globins.
  • The α- and β-globin genes separated in the period of early vertebrate evolution, after which duplications generated the individual clusters of separate α- and β-like genes.
  • Figure 5.35: Each of the α-like and β-like globin gene families is organized into a single cluster, which includes functional genes and pseudogenes (ψ).
  • nonallelic genes: Two (or more) copies of the same gene that are present at different locations in the genome (contrasted with alleles, which are copies of the same gene derived from different parents and present at the same location on the homologous chromosomes).
  • When a gene has been inactivated by mutation, it can accumulate further mutations and become a pseudogene (ψ), which is homologous to the functional gene(s) but has no functional role (or at least has lost its original function).

Pseudogenes

  • Processed pseudogenes result from reverse transcription and integration of mRNA transcripts.
  • Nonprocessed pseudogenes result from incomplete duplication of functional genes or from mutation of the second copy of a duplicated gene.
  • Some pseudogenes might gain functions different from those of their parent genes, such as regulation of gene expression, and take on different names.
  • Figure 5.38: Many changes have occurred in a β-globin gene since it became a pseudogene.

Genome Duplication

  • Genome duplication occurs when polyploidization increases the chromosome number by a multiple of two.
  • autopolyploidy: Polyploidization resulting from mitotic or meiotic errors within a species.
  • allopolyploidy: Polyploidization resulting from hybridization between two different but reproductively compatible species.
  • Genome duplication events can be obscured by the evolution and/or loss of duplicates as well as by chromosome rearrangements.
  • Genome duplication has been detected in the evolutionary history of many flowering plants and of vertebrate animals.
  • 2R hypothesis: The hypothesis that the early vertebrate genome underwent two rounds of duplication.

Transposable Elements

  • Transposable elements tend to increase in copy number when introduced into a genome but are kept in check by negative selection and transposition regulation mechanisms.

Mutation, Gene Conversion, and Codon Usage Biases

  • Mutational bias can account for a high AT content in organismal genomes.
  • Gene conversion bias, which tends to increase GC content, can act in partial opposition to the mutational bias.
  • Codon bias might be a result of adaptive mechanisms that favor particular sequences and of gene conversion bias.
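
Base composition of the kind discussed above is simple to measure. The sketch below computes overall GC content and GC content at third codon positions (often called GC3). Using GC3 as a rough indicator of codon usage bias is an assumption of this example, not something stated in the notes, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(coding_seq):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    if not coding_seq or len(coding_seq) % 3:
        raise ValueError("coding sequence must be a whole number of codons")
    third_positions = coding_seq[2::3]
    return gc_content(third_positions)


example = "ATGGCCAAA"  # toy in-frame sequence: ATG GCC AAA
print("GC:", gc_content(example))
print("GC3:", gc3_content(example))
```

Third positions are mostly synonymous, so a GC3 value far from the genome-wide GC content hints that something other than amino acid requirements is shaping codon choice.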