Divergence Within Species

0.0(0)
studied byStudied by 1 person
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/64

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

65 Terms

1
New cards

Analogy

The relationship of any two traits that arose convergently (from unrelated ancestors)

2
New cards

Homology

Similarity between any traits due to common ancestry (usually with divergence)

3
New cards

Base-by-base alignment comparison

Compare the DNA sequence between species (but where do you start?) specific but computationally heavy.

4
New cards

Protein/Gene-by-gene alignment

Align corresponding genes/proteins in different species and compare them (amino acid comparisons)

5
New cards

WGA Challenges

While homologous genes/proteins should align well, different species will have rearrangements that occurred during evolution (inversion, translocation, duplication, del, insertions)

6
New cards

Orthology

Relationship of traits that came from a common ancestor between the species’ sharing the traits.

7
New cards

Paralogy (Duplication)

Relationship of two homologous traits resulting from a duplication of the gene for that trait, can diverge over time

8
New cards

Xenology (HZ Transfer)

The relationship of any two homologous traits whose history (since their common ancestor) involves a horizontal gene transfer

9
New cards

How do we tell all these -logs apart?

phylogenetic analysis (precise but computationally heavy), other methods are more indirect

10
New cards

Types of Duplication

Gene duplication, whole genome duplication (create an entire new set of paralogs)

11
New cards

Duplications

Whole genome duplication (entire genome of paralogs), Segmental duplication (tandem in eukaryotes, not common in bacteria)

12
New cards

Horizontal Gene Transfers

  • Homologous (replaces a gene or segment with an orthologous copy from another organism),

  •  Non-homologous (introduces a gene or segment that is entirely new from another organism)

Was thought to have relatively little impact on evolution, UNTIL genome sequencing came along!

13
New cards

Subfunctionalization (Duplication)

The function of the ancestral gene is split (i.e. one binds to X and the other binds to Y)

14
New cards

Neofunctionalization (Duplication)

One copy acquires a whole new function

15
New cards

Pan-genome

All the distinct genes found in the population/species/unit as a whole

16
New cards

Genealogy

The ancestral relationships in a collection of gene sequences

  • Can vary between chromosome segments

17
New cards

SNP

The most pervasive form of variation within a species, it is a single nucleotide that differs from most other ppls

18
New cards

How do you answer “how much variation is there in the population?”

Average number of pairwise differences:

1) Sequence genomes from a pop

2) Choose a window in the genome where sequences can be aligned

3) Count DIFFERENCES between pairs of sequences

4) Take the average number of differences over ALL pairs of sequences

5) Divide by the number of sites sequenced

19
New cards

Average Number of Pairwise Diff (πlocus) PER LOCUS

Sum of pairwise differences across all pairs / number of pairs

134 total differences across all pairs / 45 pairs [aka all possible combinations of pairs!] = 134/45 = 2.98 = πlocus

20
New cards

Average Number of Pairwise Diff (πsite) PER SITE

Average number of pairwise diff per locus (πlocus) / total number of sites[bases] compared = 2.98/29 = 0.1ish = πsite

21
New cards

How do you answer “How common are SNPs in the population?”

SFS (Site frequency spectrum) for polymorphisms:
1) Sequence genomes from pop sample

2) Choose a window in the genome where sequences can be aligned

3) Identify each SNP and determine its frequency

4) Bin SNPs by FREQUENCY

5) Plot histogram showing number of SNPs per FREQUENCY BIN

22
New cards

SFS Example

Locus (window) = 1

n = 10 distinct sequences

L = 29 sites/bases sequenced and analyzed

9 different SNP sites in this set 

23
New cards

Each SFS has a frequency between…

1/n and n-(1/n), where n is the number if sequences or sample size

24
New cards

SFS Shape depends on…

population history (pop up, pop down, natural selection)

25
New cards

Estimating Sequence Divergence

1) Sequence genomes from two or more species

2) Choose in the genome where sequences align

3) Count differences

4) Adjust the estimate to account for unobserved substitutions

5) Repeat for each window across the genome

26
New cards

What is an unobserved substitutions?

When a base mutates from A → G → back to A

27
New cards

Estimate interspecies divergence ASSUMPTION:

Each variant represents a single mutation

(May break down for frog/chicken [because very very old RCA] because of unobserved substitutions)

28
New cards

Fidelity

Perfect fidelity: # of canon substitutions = # of observed sequence differences

Bad Fidelity: # of actual substitutions is unequal/greater than observed substitutions

29
New cards

Sequence difference vs time graph expectation vs reality

Expected difference is perfectly linear, but observed difference flattens at the end. Difference between the expected linear & observed asymptote is the [underestimation of # substitutions]

<p>Expected difference is perfectly linear, but observed difference flattens at the end. Difference between the expected linear &amp; observed asymptote is the [underestimation of # substitutions]</p>
30
New cards

Jukes Cantor Correction

K = -3/4*ln(1-(4/3p)) … where K is the actual proportion of sites that changed, and p is the estimated proportion of sites that changed

31
New cards

What is p in jukes cantor

p = # obs difference / sites aka bases compared

32
New cards

We can estimate substitution rates for different site classes

  • Transition vs transversion

  • Coding vs non-coding

  • Amino acid vs silent

33
New cards

We can estimate substitution rates of insertion and deletion

  • Single base pairs

  • STRs/Microsatellites

  • Transposable elements

34
New cards

Tree of Life

With more genome sequences from increasing numbers of species, it’s now possible to measure substitution rates along branches of the TREE OF LIFE

35
New cards

Are individuals from the same species identical at most sites in the genome?

Yes, most sites will have the same base

36
New cards

Within species variation (average number of pairwise differences per site) of drosophila, arabidopsis, elegans, and humans

drosophila 0.01

arabidopsis 0.008

elegans 0.003

humans 0.001 (even adjusted for genome length, we have lower variation)

37
New cards

Differences among species in within-species variation are smaller than expected

Pairwise diversity varies over 3 orders of magnitude, approximate pop size varies over 10 orders

38
New cards

SFS varies among populations

Thus us because diff pops have diff histories (finland, pakistan, etc). Most variant sequences are lower frequency

39
New cards

What is the trend of rare variants in SFS (covered in Human Genetics)

amt of SNPs found in 1/22 genomes >>> amt of SNPs found in 21/22 genomes.

Because super rare RECENT variants pop up all the time whereas it takes generations for variants to become more common

<p>amt of SNPs found in 1/22 genomes &gt;&gt;&gt; amt of SNPs found in 21/22 genomes.</p><p>Because super rare RECENT variants pop up all the time whereas it takes generations for variants to become more common</p>
40
New cards

SFS varies among site classes

Synonymous vs non-synonymous non-del vs non-synonymous del…

Ex: In maize, non-synonymous del mutations are a higher percentage of SNPs in the 0.001% bin

<p>Synonymous vs non-synonymous non-del vs non-synonymous del…</p><p></p><p>Ex: In maize, <strong>non-synonymous del </strong>mutations are a higher percentage of SNPs in the 0.001% bin</p>
41
New cards

Trend: Sequence divergence increases with time

True (bc of mutation accumulation) & MORE CLOSELY RELATED species have less sequence divergence (common sense)

42
New cards

Substitution rates vary among different lineages (even linages of similar species [EX: Enterococcus species])

True

43
New cards

How do SNPs/Variation arise?

Point mutations

44
New cards

How do you measure how often point mutations happen?

Mutation-accumulation experiments (Maintain inbred lines by selfing or sibling mating to keep population size small [that way the effects of selection are minimized])

45
New cards

Mutation-accumulation steps?

1) Conduct inbred line experiment

2) Sequence genomes

3) Count mutations

4) Divide the count by number of generations to estimate the mutation rate

46
New cards

Nuclear mutation rates in eukaryotes and prokaryotes are? [IMPORTANT BC HE REPEATED IT IN LATER LECTURE]

10^-10 to 10^-8 per site per generation

(If you randomly choose a base, this is the chance that it will mutate after 1 round of reproduction)

47
New cards

Nuclear mutation rate trends?

  • Nuclear mutation rates can be orders of magnitudes higher in VIRUSES

  • Transitions occur 2x more than transversions

48
New cards

KA, KS, KA/KS

KA = non-syn mut rate

KS = syn mut rate

KA/KS is their ratio

49
New cards

Genetic Drift

The random effect of biologically random sampling on sequence variation

(Drifting frequency of an allele relative to others in a population as a result of chance/stochastic events)

50
New cards

Basic principles of genetic drift?

  • Direction of allele frequency change is RANDOM (can be up or down)

  • After enough time, the allele is either LOST or FIXED

  • Drift reduces sequence variation (homogenizes population)

  • Variation is lost faster in smaller populations

51
New cards

Mutation & Drift Feedback Loop

Mutation adds variation, drift removes variation

52
New cards

How do you measure the expected level of sequence variation within a species?

Average number of pairwise differences (π)

Formula: π = 4*Ne*μ

Where Ne is the effective population, and μ is the mutation rate per locus per site (think MIUtation rate to remember)

[assume autosomal seq, diploid, no selection]

53
New cards

Calculate Expected Sequence Divergence?

Exp Seq Div = 2*μ*t

  • Find MRCA & two branches (two offshoot species)

  • Mutation rate is μ & multiply 2 (TWO BRANCHES) by number of generations since MRCA (t)

  • Assumed no selection, have to account for multiple substitutions if t is large

54
New cards

The neutral theory of molecular evolution states…

  • Most mutations that effect fitness are deleterious & removed by selection

  • Most sequence variation within species/divergence between species us NEUTRAL (due to drift, not selection)

  • THIS SERVES AS THE NULL HYPOTHESIS FOR EXPERIMENTS MEASURING SELECTION

55
New cards

Purifying and Positive Selection

  • Mutation produces heritable sequence variation in a trait affecting survival/reproduction

  • Purifying: Removes deleterious variants from WITHIN species, LIMITS SEQUENCE DIVERGENCE BETWEEN SPECIES

  • Positive: Fixed beneficial variants within species, INCREASES SEQUENCE DIVERGENCE BETWEEN SPECIES

56
New cards

Most proteins experience purifying selection, why?

In most species, the synonymous substitution rate in proteins >>>> the non-synonymous. Proteins are easy to mess up & hard to enhance via mutation. So oftentimes, mutations in proteins are deleterious and experience purifying selection.

57
New cards

Synonymous divergence predicts what non-synonymous divergence SHOULD be under neutrality (aka no selection)

This makes sense because synonymous mutations tend not to affect phenotype at all, so they’re subject to no selection. So in a hypothetical system with no selection, non-synonymous would behave similarly

58
New cards

Selection sweep

A beneficial mutation might arise in an individual, then experience positive selection → fixation. This causes a localized reduction in the populations nucleotide diversity in that section of the chromosome

It brings neutral variants along with it (hitchhikers)

59
New cards

Trend on graph for fixation/positive selection sweeps

  • Valley of π (avg # of pairwise differences) along chromosome in entire population.

OR

  • Spikes of FST tell you fixation has occurred

So it locally decreases variation IN a population, but increases sequence variation between populations.

60
New cards

Kachroo et al. Paper 2015: Do orthologs have conserved function?

Conclusion: Replaced yeast genes with human orthologs, 47% of essential yeast genes could be successfully humanized when replaced with their human orthologs. Suggests some retention of function between orthologs despite long history of evolutionary divergence.

61
New cards

Rogozin et al. 2014 - Is ortholog function more conserved than paralogs?

Conclusion: In general, when background noise is taken into account & Z scores are calculated, between-species orthologs express greater expression similarity than paralogs. This AGREES with the Ortholog Conjecture but acknowledges that OC is only one aspect of gene evolution. 

Orthologs >> Within-species paralogs > Between-species paralogs

62
New cards

Soria et al. Paper 2014: Do paralogs exhibit functional divergence?

Summary: There are significant levels of functional divergence of ohnolog (arose from gene duplication) pairs within phenotypic levels (protein, gene expression, organismal growth) but the functional divergence across these levels varied. NEARLY ALL OHNOLOG PAIRS SHOWED FD by at least one measure, aka displayed FD.

Methods: 499 S. Cerevisae ohnolog [subset of paralogs, arise from WGD] pairs present in 4 other post-WGD species in the S. clade were identified via synteny. Got protein sequence data, gene expression data, and organismal phenotype data of the 499 ohnologs in the 5 species to measure functional divergence. Identified type I (evolutionary rate-based) and type II (AA sequence-based) FD, and used quantitative values to calculate Pearson’s correlation coefficient which showed whether FD in one level can predict FD in another. Overall they found that FD had strong internal correlation (protein-level analyses all agree), but it had weak external correlation (protein-level analyses won’t agree with organismal level). But that nonetheless, paralogs/ohnologs DO display FD.

63
New cards

Zhang et. al: What determines the rate of protein evolution

Found that the evolutionary rate of a protein is predominantly influenced by it’s expression level rather than by functional importance.

  • Hypotheses include:

  • mRNA-folding requirement: Highly expressed genes may evolve slowly to preserve optimal mRNA secondary structures that enhance translational efficiency and stability

  • Protein misfolding avoidance: Proteins expressed at high levels are under strong selection to avoid misfolding, as misfolded proteins can be toxic and energetically costly.

  • Protein misinteraction avoidance: Abundant proteins must avoid unintended interactions with other cellular components, leading to evolutionary constraints that minimize indiscriminate binding.

  • Expression cost: The energetic and resource burden of producing proteins imposes selection against unnecessary or inefficient sequences, especially in highly expressed genes.

These all serve to explain the E-R Anticorrelation and why it exists

64
New cards

E-R Anticorrelation

The anticorrelation of expression and evolutionary rate that is seen across ALL DOMAINS OF LIFE and THROUGHOUT THE GENOME.

65
New cards

Xu et al: Determinants of genome sequence variation

Methods: Sequenced 68 globally distributed genotypes of Giant Duckweed to asses genome-wide genetic diversity. They ALSO conducted mutation accumulation (MA) experiments under indoor/outdoor conditions to directly estimate the spontaneous mutation rate. WGS and variant-calling were used to identify de novo mutations & POPULATION GENOMIC ANALYSES were performed to estimate Ne and LD.

Conclusion: Despite its insanely high Ne, Giant Duckweed shows extremely low genetic diversity, which the authors attribute to a exceptionally low mutation rate (7x lower than other eukaryotes). This supports the drift-barrier hypothesis in which large Ne enables natural selection to reduce mutation rates. Mutation rate, not just Ne, shapes genome-wide variation