Molecular Evolution Notes

Molecular Evolution

Assistant Prof. Cemalettin Bekpen's lecture on Molecular Evolution, part of Computational Biology II (MBG2004).

Central Premise

Significant sequence similarity reflects an evolutionary relationship, so the function of an unknown protein can be assigned based on known proteins with similar sequences.

Homolog

A gene/protein related to a second gene/protein by descent from a common ancestral gene. Homologs include both orthologs (separated by speciation) and paralogs (separated by duplication).

Ortholog

Genes/proteins in different species evolved from a common ancestor via speciation, retaining the same function.

Paralog

Genes/proteins related by duplication of a common ancestral gene. Paralogs often evolve new functions, which may or may not be related to that of the ancestor.

Speciation

The formation of a new species from an ancestral one. After speciation, the gene/protein in the new species evolves independently of the ancestral gene.

Convergent Evolution

Evolution of similar features/properties in genes/proteins of different genetic lineages.

Divergent and Convergent Evolution Among the Serine Proteases

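Examples include Trypsin (3NKK), Chymotrypsin (1ACB), and Subtilisin (1SBT). Trypsin and chymotrypsin descend from a common ancestor (divergent evolution), whereas subtilisin arrived at the same Ser-His-Asp catalytic triad independently (convergent evolution).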

Mechanisms of Molecular Evolution of Genes/Proteins

Mutation

Stochastic single point changes in genetic material due to:

  • Errors in DNA replication during mitosis
  • Radiation exposure
  • Chemical or environmental stressors
  • Viruses and transposable elements

Mutations accumulate at a slow but roughly constant rate (the molecular clock) of 10^{-9} to 10^{-8} mutations per base per generation. This category also includes splicing errors in eukaryotes that retain introns.
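
To get a feel for the scale of this rate, the short sketch below multiplies it by a genome size. The genome size of about 3 billion bases is an assumption taken from the human genome figures later in these notes, and the function name is made up for illustration.

```python
def expected_new_mutations(rate_per_base, genome_size):
    """Expected number of new point mutations per genome copy per generation."""
    return rate_per_base * genome_size


# Assumption: ~3 billion bases in one (haploid) copy of the human genome.
GENOME_SIZE = 3_000_000_000

low = expected_new_mutations(1e-9, GENOME_SIZE)
high = expected_new_mutations(1e-8, GENOME_SIZE)
print(f"About {low:.0f} to {high:.0f} new mutations per genome copy per generation")
```

Even a very small per-base rate therefore produces new variants in every generation, which is the raw material for the selection processes described later.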

Recombination

Exchange of genes (or portions) between different chromosomes to create new combinations.

Gene Duplication

Duplication of a gene (or portion). One copy retains the original function, while the other evolves and acquires new functions.

Retrotransposition

Incorporation of mRNA sequences back into DNA, often inserting into new locations with different expression patterns.

The mechanisms by which new genes/proteins arise enable sequence analysis to infer functional and structural relationships.

Evolutionary Selection

Natural Selection vs. Artificial Selection

Consider watching the YouTube video on this topic. Also review the Khan Academy article on Darwin, evolution, & natural selection.

Anagenesis

(Phyletic evolution): A single population transforms enough to be designated a new species.

Cladogenesis

Branching evolution: A new species arises from a small population that buds from a parent species. Most new species probably evolve by cladogenesis, the branching evolution that is the basis for biological diversity.

Directional, Disruptive, and Stabilizing Selection

  • Directional Selection: Favors individuals at one end of the phenotypic range.
  • Disruptive Selection: Favors individuals at both extremes of the phenotypic range.
  • Stabilizing Selection: Favors intermediate variants and acts against extreme phenotypes.

Directional Selection Example

The shift in moth population color from light to dark during the Industrial Revolution in England is an example of directional selection.

Disruptive Selection Example

Yearling male lazuli buntings with either bright or dull coloration are able to establish territories and breed, while those with intermediate plumage do not mate.

Stabilizing Selection Example

Stabilizing selection favors the most common phenotype as best adapted. It reduces variation by selecting against alleles that produce more extreme phenotypes. An example is birth weight; babies with weights too low or too high face increased risks.

Woodpeckers and wasps influence gall-fly populations, applying pressure that results in stabilizing selection.
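
The three modes of selection described above can be compared on a toy population. This is only a sketch: the trait values, the `select` helper, and the rule of keeping a fixed number of survivors are all assumptions made for illustration, not a model from the lecture.

```python
from statistics import mean, pvariance

# Hypothetical trait values (for example body size) in a toy population.
population = list(range(1, 12))  # 1..11


def select(pop, mode, keep=5):
    """Return the `keep` individuals favoured under the given selection mode."""
    center = mean(pop)
    if mode == "directional":    # favours one end of the range (here: the largest)
        ranked = sorted(pop, reverse=True)
    elif mode == "stabilizing":  # favours intermediate values
        ranked = sorted(pop, key=lambda x: abs(x - center))
    elif mode == "disruptive":   # favours both extremes
        ranked = sorted(pop, key=lambda x: -abs(x - center))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(ranked[:keep])


for mode in ("directional", "stabilizing", "disruptive"):
    survivors = select(population, mode)
    print(mode, survivors, "mean:", mean(survivors), "variance:", pvariance(survivors))
```

Directional selection shifts the mean, stabilizing selection reduces the variance, and disruptive selection increases it.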

Artificial Selection

Examples include the domestication of wolves into various dog breeds and the selective breeding of wild mustard into cabbage, Brussels sprouts, cauliflower, broccoli, kale, and kohlrabi.

Lamarck's Giraffe vs Natural Selection

Contrast Lamarck's inheritance of acquired characteristics with natural selection, using the example of giraffe neck length. Also, consider the evolution of modern corn from Teosinte.

Adaptive Evolution

Adaptive evolution occurs when natural selection favors a single allele and the allele frequency continuously shifts in one direction.

The Human Genome and Positive Selection

Human Genome Build 38

  • ~3 billion nucleotides (base pairs), of which ~3 million differ between any two randomly chosen humans
  • ~25,000 genes
  • only about 2% of the genome encodes proteins

Chimpanzee Genome

  • Human and chimps diverged 5-6 million years ago (mya).
  • ~99% identical overall to the human genome.
  • ~30,000,000 nucleotide differences.
  • 29% of genes identical to human homologue (6,250 genes).
  • Average divergence per gene: two amino acid differences, one per lineage since the human/chimp divergence.

Genome Wide McDonald-Kreitman Test

Red bars on the selection map indicate loci under negative selection. Blue bars represent loci under positive selection (95% credibility level). Strong evidence of selection at >99%.

Using Genetic Variation to Understand Natural Selection

Genetic variation can be used to understand the presence and form of natural selection on genes by:

  1. Inferring ancestral states for genes.
  2. Inferring selection on amino acids in proteins with important functions and relating gene selection to phenotype selection.
  3. Inferring recent selective sweeps or balancing selection in the human genome.

Inferring Lineage Specific Evolution

Compare a gene of interest across different species.

Measuring Positive Selection

Positive selection is measured by comparing the rate of synonymous mutations with the rate of non-synonymous mutations.

Changes in Protein Sequence

Changes in a protein sequence come from changes in the nucleotide sequence

Genetic Code and Synonymous/Nonsynonymous Changes

  • Synonymous Change: Does not change the amino acid encoded (e.g., TCT -> TCC, both coding for Serine).
  • Nonsynonymous Change: Changes the amino acid encoded (e.g., TCT -> TTC, changing from Serine to Phenylalanine).

Nonsynonymous changes are more likely to have functional consequences and are generally deleterious, thus removed from populations more rapidly. The rate of nonsynonymous change will be slower than the rate of synonymous change.

Inferring Adaptive Amino Acid Change in Proteins

Measuring selection on protein-coding genes:

Selection ‘for’ particular amino acid changes.

Changes are either synonymous or non-synonymous:

  • AAA -> AAG (Lysine -> Lysine): synonymous
  • AAA -> GAA (Lysine -> Glutamic Acid): non-synonymous
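
The codon examples above can be checked mechanically. The sketch below builds the standard genetic code table and classifies a codon change. The function name `classify_change` is made up for illustration, and only the standard nuclear code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Return (kind, amino acid before, amino acid after) for a codon change."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    kind = "synonymous" if aa_before == aa_after else "nonsynonymous"
    return kind, aa_before, aa_after


for before, after in [("TCT", "TCC"), ("TCT", "TTC"), ("AAA", "AAG"), ("AAA", "GAA")]:
    print(before, "->", after, classify_change(before, after))
```

Amino acids are reported in one-letter code (S = Serine, F = Phenylalanine, K = Lysine, E = Glutamic Acid).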

Synonymous or Nonsynonymous Change: dN/dS

  • dS: rate of synonymous change (e.g., per gene). Because synonymous changes do not affect the protein, most have little or no effect on organism fitness.
    • They are selectively neutral and accumulate at a constant rate (clock-like).
    • If species are far apart, correct for ‘multiple hits’ using a statistical model of sequence change.
  • dN: rate of nonsynonymous change (e.g., per gene). Nonsynonymous changes affect the protein, most are deleterious and lost.
    • So, dN rate is generally slower than dS. Hence dN/dS is generally less than 1.
  • If dN > dS, there have been many nonsynonymous changes, which is rare and a signature of adaptive evolution.

*What if dN/dS = 1?

Quantifying Non-Synonymous Variation

Estimate of positive selection

  • Synonymous mutations: neutral mutations.
  • Non-synonymous mutations: non-neutral mutations.

Codon Usage

Frequencies of different codons for the same amino acid are different.

Codon usage bias is caused by:

  • Translation machinery tends to use abundant tRNAs (and their corresponding codons).
    • Codon usage bias is the same for all highly expressed genes in the same organism.
  • Mutation pressure: difference between the mutation rates of GC -> AT and AT -> GC.
    • GC-content is different in different organisms.
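
Codon usage and GC content are both simple to measure from a coding sequence. The sketch below counts codons in a reading frame and reports the relative usage of the two Lysine codons. The toy sequence and the helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence, reading in frame from the first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(cds, synonymous_codons):
    """Fraction of each codon among a set of codons for the same amino acid."""
    counts = codon_counts(cds)
    total = sum(counts[c] for c in synonymous_codons)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous_codons}


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


LYSINE_CODONS = ("AAA", "AAG")
toy_cds = "ATGAAAAAGAAAAAATAA"  # hypothetical: Met, Lys, Lys, Lys, Lys, stop

print(relative_usage(toy_cds, LYSINE_CODONS))
print(gc_content(toy_cds))
```

In a real analysis the same counts would be taken over many genes, for example comparing highly expressed genes with the rest of the genome.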

The genetic code is redundant.

  • Some amino acids are coded by more than one codon.
  • Proteins are more conserved during evolution.

DNA -> 4-letter alphabet

Proteins -> 20-letter alphabet

  • Two random DNA sequences 25% identical on average.
  • Two random protein sequences 5% identical on average.
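
These two percentages follow from the alphabet sizes. As a sketch, assuming all letters are equally frequent and positions are independent (real sequences deviate from both assumptions), the chance that two random sequences match at a position is the sum of the squared letter frequencies:

```python
from fractions import Fraction


def expected_identity(frequencies):
    """Probability that two independent random letters match: sum of p_i squared."""
    return sum(f * f for f in frequencies)


dna_frequencies = [Fraction(1, 4)] * 4        # assumption: equal base frequencies
protein_frequencies = [Fraction(1, 20)] * 20  # assumption: equal amino acid frequencies

print(float(expected_identity(dna_frequencies)))      # fraction identical for DNA
print(float(expected_identity(protein_frequencies)))  # fraction identical for proteins
```

Because the background identity is much lower for proteins, a given level of similarity is more informative at the protein level than at the DNA level.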

dN/dS Interpretation

  • dN/dS < 1: replacements are deleterious (very few changes in amino acids along the lineage)
  • dN/dS = 1: replacements are neutral (changes just happen randomly)
  • dN/dS > 1: replacements are advantageous (lots of changes in amino acids along the lineage)
  • Ratio of non-synonymous to synonymous changes = dN/dS = Ka/Ks

dN/dS, Ka/Ks

Nei and Gojobori, 1986

  • Nd = Counts of non-synonymous mutations for each gene
  • Sd = Counts of synonymous mutations for each gene
  • N = Counts of potential non-synonymous sites for each gene
  • S = Counts of potential synonymous sites for each gene

KA = Nd / N and KS = Sd / S

The ratio KA/KS serves as an indicator of the evolutionary mode of each gene. It is a basic measure of the proportion of non-synonymous to synonymous divergence.
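
The counting scheme above can be sketched in code. This is a simplified illustration in the spirit of Nei and Gojobori, not a faithful reimplementation. It assumes two aligned, gap-free coding sequences without stop codons, treats changes to stop codons as non-synonymous, averages over all orders of change when a codon differs at several positions, and omits the multiple-hit correction. All function names and the toy sequences are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Potential synonymous sites in one codon (non-synonymous sites = 3 minus this)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += 1 / 3
    return s


def count_differences(c1, c2):
    """Return (Sd, Nd) for one codon pair, averaged over the orders of change."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn = non = 0.0
    paths = list(permutations(positions))
    for path in paths:
        current = c1
        for pos in path:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
    return syn / len(paths), non / len(paths)


def ka_ks(seq1, seq2):
    """Return (KA, KS) = (Nd / N, Sd / S) for two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    S = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        S += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sd, nd = count_differences(c1, c2)
        Sd += sd
        Nd += nd
    N = len(seq1) - S  # three sites per codon in total
    return Nd / N, Sd / S


# Hypothetical toy alignment: one synonymous and one non-synonymous difference.
ka, ks = ka_ks("TCTAAAGCT", "TCCGAAGCT")
print(f"KA = {ka:.3f}, KS = {ks:.3f}, KA/KS = {ka / ks:.3f}")
```

For real data, established tools apply further corrections and statistical models, so this sketch is only meant to make the definitions of Nd, Sd, N and S concrete.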

Purifying, Neutral, and Positive Selection

  • dN/dS < 1: Purifying Selection
  • dN/dS = 1: Neutral Evolution
  • dN/dS > 1: Positive Selection

KA (or dN) is the rate of non-synonymous divergence and KS (or dS) is the rate of synonymous divergence between species.

Estimating Non-Synonymous and Synonymous Polymorphisms

Estimates of non-synonymous and synonymous polymorphisms and substitutions provide insight into evolutionary processes. By analysing divergence and polymorphism:

  • KA/KS ratios > 1 indicate positive selection
  • KA/KS ratios < 1 indicate negative selection
  • KA/KS ratios = 1 indicate neutral evolution

  • KA (dN): rate of non-synonymous substitutions
  • KS (dS): rate of synonymous substitutions
  • PN: number of non-synonymous polymorphisms
  • PS: number of synonymous polymorphisms
  • KA/KS can also be estimated separately for each branch (branch-specific estimate)
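
Divergence and polymorphism counts are combined in the McDonald-Kreitman test mentioned earlier. The sketch below is one minimal way to do this. It computes the neutrality index (PN/PS) / (DN/DS), the derived quantity alpha = 1 - NI, and a two-sided Fisher exact p-value on the 2x2 table. The counts are illustrative values chosen for this sketch, not data from the lecture, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Number of tables with k in the top-left cell, margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, alpha, p-value) from divergence and polymorphism counts."""
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1 - neutrality_index
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return neutrality_index, alpha, p_value


# Illustrative counts: fixed differences (dn, ds) and polymorphisms (pn, ps).
ni, alpha, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"NI = {ni:.3f}, alpha = {alpha:.3f}, p = {p:.4f}")
```

A neutrality index below 1 means an excess of non-synonymous divergence relative to polymorphism, which is the pattern expected under positive selection. A value above 1 points toward negative selection.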

Analogy Between Phenotype-Level and Genetic-Level Selection

  • Selection ‘for’ change in one direction
    • Directional selection on phenotype
    • Positive selection on a gene: Ala->Glu, Tyr->Ser (examples)
  • Selection ‘for’ remaining the same
    • Stabilizing selection on phenotype
    • Purifying selection on a gene: Ala, Tyr retained despite mutations to other amino acids

Positive selection is selection for a particular trait, which results in the increased frequency of an allele in a population.

Excess of Function-Altering Mutations Example

In PRM1 exon 2, there are six differences between humans and chimpanzees, five of which alter amino acids.

Branch-Specific dN/dS Estimates Example

Branch-specific dN/dS estimates for OGP (oviductal glycoprotein) across multiple species.

Selective Sweeps and Balancing Selection

Alleles and haplotypes that increase in frequency rapidly due to positive selection carry along large stretches of “hitch-hiking” flanking DNA, creating a linkage disequilibrium signature.
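
Linkage disequilibrium between two sites can be quantified from haplotype counts. The sketch below computes D = p(AB) - p(A)p(B) and the squared correlation r^2 for two biallelic loci. The haplotype samples and the allele labels are hypothetical and serve only as an illustration.

```python
def linkage_disequilibrium(haplotypes, allele_a="A", allele_b="B"):
    """Return (D, r^2) for two biallelic loci from a list of two-letter haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Hypothetical samples: after a sweep the selected allele A drags B along with it.
after_sweep = ["AB"] * 6 + ["ab"] * 4
no_association = ["AB", "Ab", "aB", "ab"] * 3

print(linkage_disequilibrium(after_sweep))
print(linkage_disequilibrium(no_association))
```

High r^2 extending over a long region around a high-frequency allele is the kind of signature that a recent selective sweep is expected to leave.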

Infer Selection

Geographic variation in allele frequencies and patterns

Examples of Genes with Geographic Selection

Genes such as AGT, CYP3A, SLC24A5, FY, IL4, IL13, CASP12, NAT2, LCT, TRPV6, and MMP3 show evidence of geographically restricted selection in humans related to climate, pathogens, or diet.

Extreme Population Differences

The FY*O allele, which confers resistance to P. vivax malaria, shows extreme differences in frequency between populations.
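
One common way to summarize such differences in allele frequency between populations is Wright's fixation index (FST). The lecture notes do not name this statistic, so the sketch below is an added illustration under simple assumptions: one biallelic locus, two populations weighted equally, and hypothetical allele frequencies.

```python
def fst_two_populations(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical frequencies of a resistance allele in two populations.
print(fst_two_populations(1.0, 0.0))  # allele fixed in one population, absent in the other
print(fst_two_populations(0.3, 0.3))  # identical frequencies
```

Values near 1 indicate extreme differentiation at the locus, while values near 0 indicate that the populations have similar allele frequencies.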

Studies of Selection Signatures in the Human Genome

Brains, food, reproduction and parasites

Genome-Wide Analysis

Most-significant categories showing positive selection in the human lineage include:

  • Immune system: parasites and pathogens
  • Reproduction: genes expressed in reproductive tissues
  • Nervous system genes: expressed in brain
  • Amino-acid metabolism: diet
  • Olfaction: sense of smell
  • Development: such as skeletal
  • Hearing: for speech perception

Biological Processes with Significant Excess of Positively Selected Genes

Immunity and defense, T-cell-mediated immunity, Chemosensory perception, Biological process unclassified, Olfaction, Gametogenesis, Natural killer-cell-mediated immunity, Spermatogenesis and motility, Inhibition of apoptosis, Interferon-mediated immunity, Sensory perception and B-cell- and antibody-mediated immunity.

Enrichment of GO Categories Example

Chemosensory perception, Olfaction, Gametogenesis, Spermatogenesis and motility, Fertilization, Other carbohydrate metabolism, Electron transport, Chromatin packaging/remodeling, MHC-I-mediated immunity, Steroid metabolism, Lipid and fatty acid binding, mRNA transcription initiation, Protein modification, Vitamin/cofactor transport, Phosphate metabolism and Peroxisome transport.

Adaptive Evolution of Young Gene Duplicates

Human-specific duplicates evolving under adaptive natural selection include a surprising number of genes involved in neuronal and cognitive functions.

Hominid-Specific Gene Families Under Positive Selection Examples

Examples include the neuroblastoma breakpoint family (NBPF) and others such as FAM75A and Williams Beuren syndrome region 19.

Specific Genes Affecting Brain Size: Microcephaly Genes

Microcephaly: a small (~430 cc vs. ~1,400 cc) but otherwise roughly normal brain, with only mild intellectual disability. It can be caused by loss of activity of the ASPM gene (abnormal spindle-like microcephaly associated) or of the MCPH1 gene.

Positive Selection of MCPH1 in Primate Evolution

Positive Selection of ASPM in Primate Evolution

ASPM may still be evolving adaptively in the human lineage, and this may be related to forms of human language (tonal and non-tonal).

Genes related to Brain and Language Example: FOXP2

Relating human adaptive molecular evolution to human disease (Crespi 2010, Evol. Appl.): genes subject to recent positive selection in humans are differentially involved in neurological diseases.

Inheritance of a Language/Speech Defect

FOXP2

  • FOXP2 mutations result in an autosomal dominant communication disorder.
  • The phenotype includes problems with speech articulation and deficits in many aspects of language and grammar.
  • Intelligence varies among affected individuals, but speech/language impairment is always present.
  • Deficits with language are not restricted to speech but also influence writing and comprehension/expression.
  • Location: chromosome 7, 7q31.

FOXP2 is highly conserved throughout mammals and beyond. Only three nucleotide substitutions change the FOXP2 protein between humans and the mouse, and two of these occurred along the human lineage. Examination of human genetic variation suggests that the region surrounding the gene underwent a selective sweep in the past 200,000 years.

Brains of individuals with FOXP2 mutations have reduced grey matter in the frontal gyrus, which includes Broca’s area, and show functional abnormalities in Broca’s area during language tasks.

Positive selection linked to Neandertals

The Derived FOXP2 Variant of Modern Humans Was Shared with Neandertals

FOXP2: two genetic variants (SNPs) are associated with risk of some neurodevelopmental disorders involving speech and language, schizophrenia and autism

Positive Selection Related to the Human Genome

  • Many genes related to primate brain development have been subject to positive selection.
  • Several positively selected genes related to brain size and language in humans have been identified, but we do not know how they work.
  • These same genes are also involved in human disorders related to the brain and language.

The Gene Example LCT

All infants have high lactase enzyme activity to digest the sugar lactose in milk. In most humans, activity declines after weaning, but in some it persists: LCT*P.

Molecular Basis of Lactase Persistence

Linkage and LD studies show an association of lactase persistence with the T allele of a T/C polymorphism 14 kb upstream of the lactase gene. Lactase level is controlled by a cis-acting element.

Genetic Signatures-Lactase Gene

Studies of the genetic signatures of strong recent positive selection at the lactase gene, and of the convergent adaptation of human lactase persistence in Africa and Europe, both show evidence of selection.

SNP Examples

A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans. Genotyping across a 3-Mb region demonstrated haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, a distinct variant studied in African populations, consistent with a selective sweep over the past ~7,000 years. These data provide a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits: animal domestication and adult milk consumption.

Dietary Adaptation

Diet and Adaptation Example of Use of Starches

Not just milk: the use of starches also increased in the human diet. Relevant evidence includes diet and the evolution of human amylase gene copy number variation, and selection on a suite of genes 'for' meat-eating adaptation.

Human Adaptation to Diet

  • Better food, smaller guts, adaptations to meat.
  • Humans adapted genetically to a novel diet that includes dairy products, grains, and more meat.
  • The selection involved has been strong.
  • The molecular adaptations involved in dietary adaptations tend to be geographically local, and still exhibit genetic polymorphisms.

Mammals Rapid Evolution of Reproductive Proteins

Reproductive proteins, including fertilization proteins, evolve rapidly in mammals. Male-female conflicts involve traits such as (increased) egg laying, receptivity to mating, sperm displacement, and more.

Correlation Between Evolution and Selection

The correlation between SEMG2 evolution and primate sexual traits across different primate species gives comparative evidence for molecular adaptation related to sperm mobility, with implications for human male fertility. Related work: Carlson et al, research in fertile

Strong balancing selection at HLA loci

Strong balancing selection at HLA loci is supported by evidence from segregation in South Amerindian families. A strong signature of balancing selection is also found in the 5' cis-regulatory region of CCR5.

Maladaptation-Byproducts and Local Adaptations

How selection on the human genome is related to disease:

  • Strong, recent positive selection can create maladaptations as byproducts (via pleiotropy).
  • Balancing selection creates maladapted homozygotes (as a form of tradeoff).
  • Locally selected adaptations become maladaptive with changes in the environment (such as recent human migrations). Local adaptation is common.
  • Selection on brain, dietary, reproductive and disease genes has generated very rapid, recent, ongoing change, which helps in understanding human adaptation and disease.

Immune recognition molecules (KIRS, LIRAS, HLAS, TRs) show multiple changes: inactivation or deletion, duplication, and functional amino acid changes.

Pkr Evolution in Response to Viral Mimicry - YouTube