Molecular Evolution Notes
Molecular Evolution
Assistant Prof. Cemalettin Bekpen's lecture on Molecular Evolution, part of Computational Biology II MBG2004.
Central Premise
Significant sequence similarity reflects evolutionary relationships, so the function of an unknown protein can be inferred from known proteins with similar sequences.
Homolog
A gene/protein related to a second gene/protein by descent from a common ancestral gene.
Ortholog
Genes/proteins in different species that evolved from a common ancestor via speciation, typically retaining the same function.
Paralog
Genes/proteins related by duplication of a common ancestral gene. These evolve new functions, which may or may not relate to that of the ancestor.
Speciation
Evolution of new gene/protein that is genetically independent from the ancestral gene.
Convergent Evolution
Evolution of similar features/properties in genes/proteins of different genetic lineages.
Divergent and Convergent Evolution Among the Serine Proteases
Examples include Trypsin (3NKK), Chymotrypsin (1ACB), and Subtilisin (1SBT).
Mechanisms of Molecular Evolution of Genes/Proteins
Mutation
Stochastic single point changes in genetic material due to:
- Errors in DNA replication during mitosis
- Radiation exposure
- Chemical or environmental stressors
- Viruses and transposable elements
Slow but constant rate (molecular clock) of 10^{-9} to 10^{-8} mutations per base per generation. Includes splicing errors in eukaryotes that retain introns.
Recombination
Exchange of genes (or portions) between different chromosomes to create new combinations.
Gene Duplication
Duplication of a gene (or portion). One copy retains the original function, while the other evolves and acquires new functions.
Retrotransposition
Incorporation of mRNA sequences back into DNA, often inserting into new locations with different expression patterns.
The mechanisms by which new genes/proteins arise enable sequence analysis to infer functional and structural relationships.
Evolutionary Selection
Natural Selection vs. Artificial Selection
Consider watching the YouTube video on this topic. Also review the Khan Academy article on Darwin, evolution, & natural selection.
Anagenesis
(Phyletic evolution): A single population transforms enough to be designated a new species.
Cladogenesis
Branching evolution: A new species arises from a small population that buds from a parent species. Most new species probably evolve by cladogenesis, the branching evolution that is the basis for biological diversity.
Directional, Disruptive, and Stabilizing Selection
- Directional Selection: Favors individuals at one end of the phenotypic range.
- Disruptive Selection: Favors individuals at both extremes of the phenotypic range.
- Stabilizing Selection: Favors intermediate variants and acts against extreme phenotypes.
Directional Selection Example
The shift in moth population color from light to dark during the Industrial Revolution in England is an example of directional selection.
Disruptive Selection Example
Yearling male lazuli buntings with either bright or dull coloration are able to establish territories and breed, while those with intermediate plumage do not mate.
Stabilizing Selection Example
Stabilizing selection favors the most common phenotype as best adapted. It reduces variation by selecting against alleles that produce more extreme phenotypes. An example is birth weight; babies with weights too low or too high face increased risks.
Woodpeckers and wasps influence gall-fly populations, applying pressure that results in stabilizing selection.
Artificial Selection
Examples include the domestication of wolves into various dog breeds and the selective breeding of wild mustard into cabbage, Brussels sprouts, cauliflower, broccoli, kale, and kohlrabi.
Lamarck's Giraffe vs Natural Selection
Contrast Lamarck's inheritance of acquired characteristics with natural selection, using the example of giraffe neck length. Also, consider the evolution of modern corn from Teosinte.
Adaptive Evolution
When natural selection favors a single allele and the allele frequency continuously shifts in one direction.
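As a rough illustration of an allele frequency shifting steadily in one direction, the sketch below iterates a simple deterministic selection model. The haploid model, the function name `allele_trajectory`, and the selection coefficient `s` are assumptions made for illustration; they are not taken from the lecture.

```python
def allele_trajectory(p0, s, generations):
    """Frequency of a favored allele over time under a haploid selection model.

    Assumed model (not from the lecture): the favored allele has relative
    fitness 1 + s, the alternative allele has fitness 1, and the population
    is large enough to ignore genetic drift.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        mean_fitness = p * (1 + s) + (1 - p)
        p = p * (1 + s) / mean_fitness
        freqs.append(p)
    return freqs


# Hypothetical starting frequency and selection coefficient.
trajectory = allele_trajectory(p0=0.01, s=0.05, generations=200)
print(f"start: {trajectory[0]:.3f}, end: {trajectory[-1]:.3f}")
```

With any positive `s`, the frequency in this model rises every generation, which is the one-directional shift described above.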
The Human Genome and Positive Selection
Human Genome Build 38
- ~3 billion nucleotides (base pairs); ~3 million vary between any two randomly chosen humans
- ~25,000 genes
- Only about 2% of the genome encodes proteins
Chimpanzee Genome
- Humans and chimps diverged 5-6 million years ago (mya).
- ~99% identical overall to the human genome.
- ~30,000,000 nucleotide differences.
- 29% of genes identical to the human homologue (6,250 genes).
- Average divergence per gene: two amino acid differences, one per lineage since the human/chimp divergence.
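A quick arithmetic check, using only the approximate figures listed above, suggests that the nucleotide difference count and the ~99% identity agree with each other. The variable names are illustrative, and the per-year rate at the end is a rough back-of-the-envelope estimate that assumes the differences accumulated equally on both lineages.

```python
GENOME_SIZE = 3_000_000_000      # ~3 billion base pairs
DIFFERENCES = 30_000_000         # ~30 million human-chimp nucleotide differences
DIVERGENCE_YEARS = (5e6, 6e6)    # humans and chimps diverged 5-6 mya

fraction_different = DIFFERENCES / GENOME_SIZE
print(f"fraction different: {fraction_different:.2%}")

# Both lineages have been accumulating changes since the split, so the
# differences are spread over twice the divergence time.
for years in DIVERGENCE_YEARS:
    rate = fraction_different / (2 * years)
    print(f"{years:.0e} years -> ~{rate:.1e} substitutions per site per year")
```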
Genome Wide McDonald-Kreitman Test
Red bars on the selection map indicate loci under negative selection. Blue bars represent loci under positive selection (95% credibility level). Strong evidence of selection at >99%.
Using Genetic Variation to Understand Natural Selection
Genetic variation can be used to understand the presence and form of natural selection on genes by:
- Inferring ancestral states for genes.
- Inferring selection on amino acids in proteins with important functions and relating gene selection to phenotype selection.
- Inferring recent selective sweeps or balancing selection in the human genome.
Inferring Lineage Specific Evolution
Compare a gene of interest across different species.
Measuring Positive Selection
Rate of synonymous mutations and rate of non-synonymous mutations
Changes in Protein Sequence
Changes in a protein sequence come from changes in the nucleotide sequence
Genetic Code and Synonymous/Nonsynonymous Changes
- Synonymous Change: Does not change the amino acid encoded (e.g., TCT -> TCC, both coding for Serine).
- Nonsynonymous Change: Changes the amino acid encoded (e.g., TCT -> TTC, changing from Serine to Phenylalanine).
Nonsynonymous changes are more likely to have functional consequences and are generally deleterious, thus removed from populations more rapidly. The rate of nonsynonymous change will be slower than the rate of synonymous change.
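The distinction between the two kinds of change can be checked mechanically by translating both codons. The sketch below builds the standard genetic code and classifies a codon substitution; the function name `classify_change` is a hypothetical helper introduced here for illustration.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Return 'synonymous' or 'nonsynonymous' for a codon substitution."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


# The two serine examples from the list above.
for before, after in [("TCT", "TCC"), ("TCT", "TTC")]:
    print(before, "->", after, classify_change(before, after))
```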
Inferring Adaptive Amino Acid Change in Proteins
Measuring selection on protein-coding genes:
- Selection ‘for’ particular amino acid changes.
Changes are synonymous or non-synonymous:
- AAA -> AAG (Lysine -> Lysine): synonymous
- AAA -> GAA (Lysine -> Glutamic Acid): non-synonymous
Synonymous or Nonsynonymous Change: dN/dS
- dS: rate of synonymous change (e.g., per gene). Because synonymous changes do not affect the protein, most have little or no effect on organism fitness.
- They are selectively neutral and accumulate at a constant rate (clock-like).
- If species are far apart, correct for ‘multiple hits’ using a statistical model of sequence change.
- dN: rate of nonsynonymous change (e.g., per gene). Nonsynonymous changes affect the protein; most are deleterious and are lost.
- So, dN rate is generally slower than dS. Hence dN/dS is generally less than 1.
- If dN > dS, there have been many nonsynonymous changes, which is rare and a signature of adaptive evolution.
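The 'multiple hits' correction mentioned in the list can be sketched with one common statistical model of sequence change. The notes do not say which model is meant, so the Jukes-Cantor model used below is an assumption, as is the function name.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Uses the Jukes-Cantor (1969) model, chosen here as an assumption: all
    nucleotide substitutions are taken to be equally likely. `p` is the
    observed fraction of sites that differ between two aligned sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical observed proportions of differing sites.
for p in (0.01, 0.10, 0.30):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.4f}")
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as species get further apart, which is why the correction matters mostly for distant comparisons.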
What if dN/dS = 1?
Quantifying Non-Synonymous Variation
Estimate of positive selection
- Synonymous mutations: neutral mutations.
- Non-synonymous mutations: non-neutral mutations.
Codon Usage
Frequencies of different codons for the same amino acid are different.
Codon usage bias is caused by:
- Translation machinery tends to use abundant tRNAs (and the corresponding codons).
- Codon usage bias is the same for all highly expressed genes in the same organism.
- Mutation pressure: difference in mutation rates between GC -> AT and AT -> GC.
- GC-content is different in different organisms.
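Codon usage and GC content are both simple counts over a sequence. The following is a minimal sketch; the helper names and the toy sequence are hypothetical and only serve to show the bookkeeping.

```python
from collections import Counter


def codon_counts(coding_sequence):
    """Count codons in an in-frame coding sequence (trailing partial codon ignored)."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def gc_content(sequence):
    """Fraction of G and C nucleotides in a sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy sequence: six serine codons using three synonymous options.
toy_cds = "TCTTCTTCTTCCAGCTCT"
counts = codon_counts(toy_cds)
total = sum(counts.values())
for codon, n in counts.most_common():
    print(codon, n, f"{n / total:.2f}")
print(f"GC content: {gc_content(toy_cds):.2f}")
```

In the toy sequence all six codons encode serine, yet one codon is used far more often than the others, which is what codon usage bias looks like in the counts.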
The genetic code is redundant.
- Some amino acids are coded by more than one codon.
- Proteins are more conserved during evolution.
DNA -> 4-letter alphabet
Proteins -> 20-letter alphabet
- Two random DNA sequences are 25% identical on average.
- Two random protein sequences are 5% identical on average.
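The 25% and 5% figures follow from the alphabet sizes if every letter is assumed to be equally frequent and positions are independent: the chance that two random letters match is the sum of the squared letter frequencies. The sketch below makes that assumption explicit; real sequences have unequal frequencies, so their values differ somewhat.

```python
def expected_identity(frequencies):
    """Probability that two independently drawn letters match."""
    return sum(f * f for f in frequencies)


# Assumption: uniform letter frequencies in both alphabets.
dna_uniform = [1 / 4] * 4
protein_uniform = [1 / 20] * 20
print(f"DNA: {expected_identity(dna_uniform):.2f}")
print(f"protein: {expected_identity(protein_uniform):.2f}")
```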
dN/dS Interpretation
- dN/dS < 1: replacements are deleterious (very few changes in amino acids along the lineage)
- dN/dS = 1: replacements are neutral (changes just happen randomly)
- dN/dS > 1: replacements are advantageous (lots of changes in amino acids along the lineage)
- Ratio of non-synonymous to synonymous changes = dN/dS = Ka/Ks
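The three cases can be written as a small decision rule. This is only a sketch: the function name and the `tolerance` band around 1 are arbitrary choices made for illustration, and real analyses judge whether a ratio differs from 1 with a statistical test rather than a fixed cutoff.

```python
def interpret_dn_ds(dn, ds, tolerance=0.05):
    """Map a dN/dS ratio onto the three cases listed above.

    `tolerance` is an arbitrary illustrative band around 1, not a
    statistical test of whether the ratio differs from 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega < 1 - tolerance:
        return omega, "replacements are deleterious"
    if omega > 1 + tolerance:
        return omega, "replacements are advantageous"
    return omega, "replacements are neutral"


# Hypothetical rate estimates for three genes.
for dn, ds in [(0.1, 0.5), (0.5, 0.5), (1.0, 0.5)]:
    omega, label = interpret_dn_ds(dn, ds)
    print(f"dN/dS = {omega:.2f}: {label}")
```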
dN/dS, Ka/Ks
Nei and Gojobori, 1986
- Nd = Counts of non-synonymous mutations for each gene
- Sd = Counts of synonymous mutations for each gene
- N = Counts of potential non-synonymous sites for each gene
- S = Counts of potential synonymous sites for each gene
KA = Nd / N and KS = Sd / S
The ratio KA/KS serves as an indicator of the evolutionary mode of each gene. Basic analyses use the proportion of non-synonymous to synonymous divergence, KA/KS.
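To make the quantities S and N concrete, the sketch below counts potential synonymous and non-synonymous sites codon by codon and then forms KA and KS from the definitions above. It is a simplified illustration rather than a full implementation of the published method: changes to stop codons are counted as non-synonymous, no multiple-hit correction is applied, and the difference counts Nd and Sd are supplied by hand as hypothetical numbers. All function names are made up for this example.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of potential synonymous sites (between 0 and 3) in one codon.

    Each of the nine possible single-base changes contributes 1/3 of a site
    if it leaves the amino acid unchanged. Changes to a stop codon count as
    non-synonymous here, which is a simplifying assumption.
    """
    original = CODON_TABLE[codon]
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == original:
                total += 1 / 3
    return total


def count_sites(coding_sequence):
    """Return (S, N): potential synonymous and non-synonymous sites in a gene."""
    seq = coding_sequence.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    s = sum(synonymous_sites(c) for c in codons)
    n = 3 * len(codons) - s
    return s, n


def ka_ks(nd, sd, n, s):
    """KA = Nd / N and KS = Sd / S, as defined above."""
    return nd / n, sd / s


# Hypothetical three-codon sequence and hypothetical difference counts.
s_sites, n_sites = count_sites("TTAAAATCT")
ka, ks = ka_ks(nd=2, sd=1, n=n_sites, s=s_sites)
print(f"S = {s_sites:.2f}, N = {n_sites:.2f}, KA = {ka:.3f}, KS = {ks:.3f}")
```

Dividing by the number of potential sites matters because most single-base changes in a codon are non-synonymous, so raw counts of Nd and Sd are not directly comparable.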
Purifying, Neutral, and Positive Selection
- dN/dS < 1: Purifying Selection
- dN/dS = 1: Neutral Evolution
- dN/dS > 1: Positive Selection
KA or dN: rate of non-synonymous divergence and KS or dS: rate of synonymous divergence between species
Estimating Non-Synonymous and Synonymous Polymorphisms
Estimates of non-synonymous and synonymous polymorphisms and substitutions provide insight into evolutionary processes. By analysing divergence and polymorphism:
- KA / KS ratios > 1 indicate positive selection
- KA / KS ratios < 1 indicate negative selection
- KA / KS ratios = 1 indicate neutral evolution
- KA and dN: rate of non-synonymous substitutions
- KS and dS: rate of synonymous substitutions
- PN: amount of non-synonymous polymorphisms
- PS: amount of synonymous polymorphisms
KA/KS can also be estimated per branch (branch-specific estimate).
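Divergence and polymorphism counts can be compared in the style of a McDonald-Kreitman table. The sketch below computes two commonly used summaries, the neutrality index and alpha; the function and parameter names are illustrative, the counts are hypothetical, and no significance test is included.

```python
def mcdonald_kreitman(fixed_n, fixed_s, poly_n, poly_s):
    """Neutrality index and alpha from a McDonald-Kreitman style 2x2 table.

    fixed_n, fixed_s: non-synonymous and synonymous fixed differences
                      between species (divergence).
    poly_n, poly_s:   non-synonymous and synonymous polymorphisms
                      within a species (PN and PS).
    """
    neutrality_index = (poly_n / poly_s) / (fixed_n / fixed_s)
    alpha = 1 - neutrality_index  # estimated fraction of adaptive substitutions
    return neutrality_index, alpha


# Hypothetical counts for one gene.
ni, alpha = mcdonald_kreitman(fixed_n=20, fixed_s=40, poly_n=5, poly_s=40)
print(f"neutrality index = {ni:.2f}, alpha = {alpha:.2f}")
```

A neutrality index below 1 means non-synonymous changes are over-represented among fixed differences relative to polymorphisms, which is the pattern expected under positive selection; a value above 1 points the other way.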
Analogy Between Phenotype-Level and Genetic-Level Selection
Selection ‘for’ change in one direction:
- Directional selection on phenotype
- Positive selection on a gene: Ala->Glu, Tyr->Ser (examples)
Selection ‘for’ remaining the same:
- Stabilizing selection on phenotype
- Purifying selection on a gene: Ala, Tyr retained despite mutations to other amino acids
Positive selection is selection for a particular trait, and it increases the frequency of the underlying allele in a population.
Excess of Function-Altering Mutations Example
In PRM1 exon 2, there are six differences between humans and chimpanzees, five of which alter amino acids.
Branch-Specific dN/dS Estimates Example
Branch-specific dN/dS estimates for OGP (oviductal glycoprotein) across multiple species.
Selective Sweeps and Balancing Selection
Alleles and haplotypes that increase in frequency rapidly due to positive selection carry along a lot of flanking, “hitch-hiking” DNA, creating a linkage disequilibrium signature.
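Linkage disequilibrium between two sites can be quantified from haplotype counts. The sketch below computes the standard D and r-squared statistics for two biallelic loci; the function name and the example counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    The arguments are the numbers of sampled chromosomes carrying each of
    the four possible haplotypes (alleles A/a at one locus, B/b at the other).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical samples: complete association versus no association.
print(linkage_disequilibrium(50, 0, 0, 50))
print(linkage_disequilibrium(25, 25, 25, 25))
```

After a sweep, a selected allele and its hitch-hiking neighbours tend to be found together on the same chromosomes, which pushes r-squared toward the first case.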
Infer Selection
Geographic variation in allele frequencies and patterns
Examples of Genes with Geographic Selection
Genes such as AGT, CYP3A, SLC24A5, FY, IL4, IL13, CASP12, NAT2, LCT, TRPV6, and MMP3, show evidence of geographically restricted selection in humans related to climate, pathogens, or diet.
Extreme Population Differences
There are extreme population differences in the frequency of the FYO allele, which confers resistance to P. vivax malaria.
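One common way to summarize how strongly an allele frequency differs between populations is the fixation index F_ST. The notes do not name a statistic, so its use here is an assumption; the sketch treats two equally weighted populations and uses hypothetical frequencies rather than real FYO data.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: nothing to compare
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
for p1, p2 in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    print(p1, p2, round(fst_two_populations(p1, p2), 2))
```

Values near 0 mean the populations share similar frequencies, while values near 1 correspond to the extreme case where an allele is nearly fixed in one population and nearly absent in the other.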
Studies of Selection Signatures in the Human Genome
Brains, food, reproduction and parasites
Genome-Wide Analysis
Most-significant categories showing positive selection in the human lineage include:
- Immune system: parasites and pathogens
- Reproduction: genes expressed in reproductive tissues
- Nervous system genes: expressed in brain
- Amino-acid metabolism: diet
- Olfaction: sense of smell
- Development: such as skeletal
- Hearing: for speech perception
Biological Processes with Significant Excess of Positively Selected Genes
Immunity and defense, T-cell-mediated immunity, Chemosensory perception, Biological process unclassified, Olfaction, Gametogenesis, Natural killer-cell-mediated immunity, Spermatogenesis and motility, Inhibition of apoptosis, Interferon-mediated immunity, Sensory perception and B-cell- and antibody-mediated immunity.
Enrichment of GO Categories Example
Chemosensory perception, Olfaction, Gametogenesis, Spermatogenesis and motility, Fertilization, Other carbohydrate metabolism, Electron transport, Chromatin packaging/remodeling, MHC-I-mediated immunity, Steroid metabolism, Lipid and fatty acid binding, mRNA transcription initiation, Protein modification, Vitamin/cofactor transport, Phosphate metabolism and Peroxisome transport.
Adaptive Evolution of Young Gene Duplicates
Human-specific duplicates evolving under adaptive natural selection include a surprising number of genes involved in neuronal and cognitive functions.
Hominid-Specific Gene Families Under Positive Selection Examples
Examples include the neuroblastoma breakpoint family (NBPF) and others such as FAM75A and Williams Beuren syndrome region 19.
Specific Genes Affecting Brain Size: Microcephaly Genes
Microcephaly: a small (~430 cc vs. ~1,400 cc) but otherwise roughly normal brain, with only mild mental retardation. It can be due to loss of activity of the ASPM gene (abnormal spindle-like microcephaly associated) or the MCPH1 gene.
Positive Selection of MCPH1 in Primate Evolution
Positive Selection of ASPM in Primate Evolution
ASPM may still be evolving adaptively in the human lineage, and it may be related to forms of human language (tonal and non-tonal convergence).
Genes related to Brain and Language Example: FOXP2
Relating human adaptive molecular evolution to human disease (Crespi 2010, Evol. Appl.): genes subject to recent positive selection in humans are differentially involved in neurological diseases.
Inheritance of a Language/Speech Defect
FOXP2
- FOXP2 mutations result in an autosomal dominant communication disorder
- Phenotype includes problems with speech articulation and deficits in many aspects of language and grammar
- Intelligence varies among affected individuals but speech/language impairment is always present
- Interestingly, deficits with language are not restricted to speech but also influence writing and comprehension/expression
- Location: chromosome 7 (7q31)
FOXP2 is highly conserved throughout mammals and beyond, except for three nucleotide substitutions that change the FOXP2 protein between humans and the mouse, two of which occurred along the human lineage. Examination of human genetic variation suggests that the region surrounding the gene underwent a selective sweep in the past 200,000 years.
Brains of individuals with FOXP2 mutations have reduced grey matter in the frontal gyrus, which includes Broca’s area, and show functional abnormalities in Broca’s area during language tasks.
Positive selection linked to Neandertals
The Derived FOXP2 Variant of Modern Humans Was Shared with Neandertals
FOXP2: two genetic variants (SNPs) are associated with risk of some neurodevelopmental disorders involving speech and language, schizophrenia and autism
Positive Selection Related to the Human Genome
Many genes related to primate brain development have been subject to positive selection. Several positively selected genes related to brain size and language have been identified in humans, but we do not know how they work. These same genes are also involved in human disorders related to the brain and language.
The Gene Example LCT
All infants have high lactase enzyme activity to digest the sugar lactose in milk. In most humans, activity declines after weaning, but in some it persists: LCT*P.
Molecular Basis of Lactase Persistence
Linkage and LD studies show an association of lactase persistence with the T allele of a T/C polymorphism 14 kb upstream of the lactase gene. Lactase level is controlled by a cis-acting element.
Genetic Signatures-Lactase Gene
There are genetic signatures of strong recent positive selection at the lactase gene, and the convergent adaptation of human lactase persistence in Africa and Europe also shows selection.
SNP Examples
A SNP in the gene encoding lactase (LCT), C/T-13910, is associated with the ability to digest milk as adults (lactase persistence) in Europeans. Genotyping across a 3-Mb region demonstrated haplotype homozygosity extending >2.0 Mb on chromosomes carrying C-14010, consistent with a selective sweep over the past ~7,000 years. These data provide a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits: animal domestication and adult milk consumption.
Dietary Adaptation
Diet and Adaptation Example: Use of Starches
Not just milk: the use of starches also increased in the human diet. Examples are diet and the evolution of human amylase gene copy number variation, and evidence for selection of a suite of genes 'for' meat-eating adaptation.
Human Adaptation to Diet
Better food, smaller guts, adaptations to meat. Humans adapted genetically to a novel diet that includes dairy products, grains, and more meat. The selection involved has been strong. The molecular adaptations involved in dietary adaptations tend to be local geographically, and still exhibit genetic polymorphisms.
Rapid Evolution of Reproductive Proteins in Mammals
Topics include the rapid evolution of fertilization proteins and male-female conflicts (increased egg laying, receptivity to mating, sperm displacement, and more).
Correlation Between Evolution and Selection
Correlation between SEMG2 evolution and primate sexual traits: comparative evidence across different primate species for molecular adaptation related to sperm mobility, with implications for human male fertility (Carlson et al, research in fertile).
Strong balancing selection at HLA loci
Evidence comes from segregation in South Amerindian families. A further example is a strong signature of balancing selection in the 5' cis-regulatory region of CCR5.
Maladaptation-Byproducts and Local Adaptations
How selection on the human genome is related to disease:
- Strong, recent positive selection can create maladaptations as byproducts (via pleiotropy).
- Balancing selection creates maladapted homozygotes (as a form of tradeoff).
- Locally selected adaptations become maladaptive with changes in the environment (such as recent human migrations); local adaptation is common.
- Selection on brain, dietary, reproductive and disease genes has generated very rapid, recent, ongoing change, which helps in understanding human adaptation and disease.
Immune recognition molecules (KIRs, LIRAs, HLAs, TRs) show multiple changes: inactivation or deletion, duplication, and functional amino acid changes.
Pkr Evolution in Response to Viral Mimicry (YouTube video)