Alternative copies of a gene exist side by side within populations, and they have a lineage that traces their history back through time
characters can be morphological (like the presence or absence of feathers) or molecular (like the presence of adenine at a certain position of a certain gene)
Gene trees reconstruct the historical relationships among alleles within and between populations
BRCA1 gene
tumor-suppressor gene
mutations can increase a woman’s risk of developing breast or ovarian cancer
a single mutation to a single nucleotide can be enough to create the risk
may have arisen in the egg or sperm that created the zygote or it could’ve been inherited from one of the parents
if the mutated variant was in the egg and the sperm carried the unmutated variant, then the zygote carries one benign and one pathogenic variant.
if the zygote develops into a woman, she would be at risk of breast or ovarian cancer
approximately half of her kids would inherit the mutated variant, who then might pass it on to their kids
some alleles get replicated while others fail to be transmitted
synapomorphy: a derived character state that individuals share because they inherited it from a common ancestor
Gene tree is the branched genealogical lineage of homologous alleles that traces their evolution back to an ancestral allele
Coalescence tells us if the population has experienced any major changes in size or if it has undergone natural selection
to examine the events, read the trees in the reverse direction, beginning with the tips
Nodes are splitting events
the points where two lineages converge or coalesce into a single ancestral lineage
coalescence events occur at the most recent common ancestor of any two alleles
Positive selection can accelerate the rise in the frequency of an allele
shortening time to fixation → short coalescence
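A rough sketch of how coalescence time scales with population size. It assumes the standard neutral coalescent for a diploid population of constant effective size n_e, which these notes do not state: with k lineages, the expected wait until the next coalescence is 4 * n_e / (k * (k - 1)) generations. The function names are hypothetical.

```python
def expected_wait(k, n_e):
    """Expected generations until k lineages coalesce into k - 1 (diploid, neutral)."""
    return 4 * n_e / (k * (k - 1))


def expected_tmrca(k, n_e):
    """Expected generations back to the most recent common ancestor of k sampled alleles."""
    return sum(expected_wait(j, n_e) for j in range(2, k + 1))


# Assumed sample of 10 alleles; only the effective population size differs.
print(expected_tmrca(10, 10_000))
print(expected_tmrca(10, 1_000))
```

Under these assumptions a smaller population gives proportionally shorter coalescence times, so unusually short coalescence at a locus can hint at a bottleneck or a sweep.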
It is possible to trace the genealogies of genes back through time, reconstructing when mutations generated new alleles and how these alleles subsequently spread
Incomplete lineage sorting - because alternative alleles persist side by side for a very long time, they may be passed down to daughter species in a fashion that does not reflect the actual branching history of the species
Initially, a population will have many alleles of a gene
when the population splits, several alleles might be carried together into both of the resulting species
if the lineages split again, some of the alleles will be carried once more
eventually, some alleles will be lost due to drift
so if you take the sample of alleles from some daughter species, they may be different from the original ancestral species
might not reflect the actual branching history of species
Normally, we expect species A and B to be more closely related to each other than either is to species C if A and B share a more recent common ancestor.
But if an ancestral population had multiple alleles, some alleles in A and C may be more similar to each other than to those in B.
This creates a gene tree that conflicts with the expected species tree.
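How often the conflict arises can be sketched under the multispecies coalescent for three species, an assumption beyond these notes. With an internal branch of T coalescent units (generations divided by 2 * n_e for diploids), each of the two conflicting gene trees has probability exp(-T) / 3. The function name and tree labels are hypothetical.

```python
import math


def gene_tree_probabilities(generations, n_e):
    """Probabilities of the three rooted gene trees when the species tree is ((A,B),C)."""
    t = generations / (2 * n_e)          # internal branch length in coalescent units
    each_discordant = math.exp(-t) / 3   # lineages fail to coalesce, then pair up at random
    return {
        "((A,B),C)": 1 - 2 * each_discordant,
        "((A,C),B)": each_discordant,
        "((B,C),A)": each_discordant,
    }


# A short interval between speciation events in a large population favors discordance.
print(gene_tree_probabilities(generations=10_000, n_e=50_000))
```

The shorter the time between the two splits relative to population size, the closer all three gene trees come to being equally likely.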
Paralogs are homologous genes arising from gene duplication.
form a gene family
within the same species
Introgression - occasionally, gene copies from one species will be introduced into the genome of a second species
gene copies move between species through hybridization
if the genes carry beneficial variation, they may be favored by selection and retained within the genomes of the recipient species
If we construct a gene tree using that introgressed gene, it may show that Species A and B are more closely related than they actually are (since they share the gene).
However, if we look at the species tree based on multiple genes, we may see that A and B are not true sister species.
Both incomplete lineage sorting and introgression result in gene trees that differ from the true phylogeny of the species.
Studies of some genes in humans, chimpanzees, gorillas, and orangutans found that humans and chimpanzees are most closely related. However, studies of other genes pointed to gorillas as our closest relatives
Phylogenetic trees are hypotheses about the relationships among species or groups of individuals
Analytical approaches to select the phylogeny that best approximates the actual history of a group
maximum parsimony: the simplest solution is the most reasonable one
the tree with the fewest number of character state changes
can be misleading when homoplasy is present
to reduce the effects, scientists focus on informative portions of genomes (exons)
introns and intervening regions have more variable sequences, so they are more prone to homoplasy from random convergence on the same base
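Counting character state changes on a candidate tree can be sketched with Fitch's algorithm, one common way to score parsimony (the notes do not name a specific algorithm). Trees are written as nested tuples of taxon names; the taxa and states are made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character."""
    if isinstance(tree, str):              # a tip: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                             # children agree: no change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(fitch(tree_1, states)[1])  # 1 change
print(fitch(tree_2, states)[1])  # 2 changes
```

For this one character, tree_1 is the more parsimonious hypothesis. A full analysis would sum the changes over all characters for every candidate tree.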
distance-matrix methods: closely related species will have more similarities than distantly related species
convert DNA or protein sequences from different taxa into a pairwise matrix of the evolutionary distances (dissimilarities) between them
used to estimate the lengths of the branches in the tree by equating the genetic distance between nodes with the length of the branch
neighbor-joining method: scientists pair together the two least-distant species by joining their branches at a node and then join this node to the next closest sequence
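A minimal sketch of the first steps: build a pairwise distance matrix and pick the least-distant pair to join. It uses the simple proportion of differing sites as the distance, and the sequences are invented. Note that full neighbor-joining also corrects each distance for how far both taxa are from all the others, which this sketch leaves out.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(alignment):
    """Pairwise distances keyed by an alphabetically sorted pair of taxon names."""
    return {
        tuple(sorted(pair)): p_distance(alignment[pair[0]], alignment[pair[1]])
        for pair in combinations(alignment, 2)
    }


def closest_pair(alignment):
    """The two taxa to join first under the simplified least-distance rule."""
    distances = distance_matrix(alignment)
    return min(distances, key=distances.get)


# Hypothetical aligned sequences, for illustration only.
alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "gorilla": "ACGTTCGAAT",
    "orangutan": "TCCTTCGAAG",
}
print(closest_pair(alignment))  # ('chimp', 'human')
```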
maximum likelihood methods: requires a substitution model, which describes how DNA, RNA, or protein sequences change over time
For each tree, it calculates the likelihood (probability) of observing the given genetic data, given the chosen substitution model.
the better trees are those for which the data are most probable
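The simplest case of this idea has just two sequences, so the "tree" is a single branch and the only thing to estimate is its length. The sketch assumes the Jukes-Cantor (JC69) substitution model, which is one choice among many and is not named in these notes. It scans candidate branch lengths and keeps the one under which the data are most probable.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diffs):
    """Log-likelihood of two aligned sequences separated by d substitutions per site."""
    decay = math.exp(-4 * d / 3)
    p_same = 0.25 + 0.75 * decay      # a site shows the same base in both sequences
    p_diff = 0.25 - 0.25 * decay      # a site shows one specific different base
    return (n_sites * math.log(0.25)
            + (n_sites - n_diffs) * math.log(p_same)
            + n_diffs * math.log(p_diff))


def jc69_ml_distance(n_sites, n_diffs):
    """Closed-form maximum likelihood distance under JC69."""
    p = n_diffs / n_sites
    return -0.75 * math.log(1 - 4 * p / 3)


# Assumed data: 100 aligned sites, 10 of which differ.
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates, key=lambda d: jc69_log_likelihood(d, 100, 10))
print(best, jc69_ml_distance(100, 10))
```

A full maximum likelihood analysis repeats this kind of calculation across whole tree topologies and all their branch lengths.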
Bayesian methods: use statistical models to estimate the probability of a tree given a particular data set
integrate over multiple possible trees, rather than selecting a single "best" tree like maximum likelihood does.
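A toy sketch of the difference: instead of reporting only the best tree, weigh every candidate by likelihood times prior and normalize. The log-likelihood values are invented, a uniform prior is assumed, and the function name is hypothetical. Real Bayesian analyses also integrate over branch lengths and model parameters, usually by MCMC sampling.

```python
import math


def posterior_probabilities(log_likelihoods, priors=None):
    """Posterior probability of each candidate tree from its log-likelihood and prior."""
    trees = list(log_likelihoods)
    if priors is None:                                   # assume a uniform prior
        priors = {tree: 1 / len(trees) for tree in trees}
    largest = max(log_likelihoods.values())              # subtract to avoid underflow
    weights = {
        tree: math.exp(log_likelihoods[tree] - largest) * priors[tree]
        for tree in trees
    }
    total = sum(weights.values())
    return {tree: weight / total for tree, weight in weights.items()}


# Hypothetical log-likelihoods for three candidate trees.
log_likelihoods = {"((H,C),G)": -1000.0, "((H,G),C)": -1002.0, "((C,G),H)": -1002.0}
print(posterior_probabilities(log_likelihoods))
```

The output keeps some probability on the alternative trees, which is the information a single "best" tree discards.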
Bootstrapping:
select a random sample of characters (with replacement) from the full data set
create a new data matrix and use it to generate a potential phylogeny
repeat the process, randomly selecting characters and creating a potential phylogeny
after doing this many times, compare phylogenies
if the trees are very different, it means the data provide poor support for the original tree
if the trees are similar, it indicates stronger support
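The steps above can be sketched as follows. The tree-building step is replaced by a toy stand-in (closest_pair, a hypothetical helper that just reports the two least-different taxa), and the alignment is invented. Real analyses report support for each clade rather than for one whole result.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (characters) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(sequence[i] for i in columns)
        for taxon, sequence in alignment.items()
    }


def closest_pair(alignment):
    """Toy stand-in for tree building: the two taxa with the fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


def bootstrap_support(alignment, build_tree, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates that recover the original result."""
    rng = random.Random(seed)
    original = build_tree(alignment)
    matches = sum(
        build_tree(bootstrap_alignment(alignment, rng)) == original
        for _ in range(n_replicates)
    )
    return matches / n_replicates


alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACTT"}
print(bootstrap_support(alignment, closest_pair))
```

A value near 1 means most resampled data sets give the same answer as the full data set.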
Purifying selection: removes deleterious alleles from a population
negative selection
Two hypotheses for how Homo sapiens evolved:
multiregional model: evolved gradually across the entire Old World from an older hominin species over the past 1 million years
out-of-Africa model: all major ethnic groups of humans are derived from recent African ancestry
earliest fossils are found in Africa
Researchers analyzed DNA from Africans and compared it to DNA from people in other parts of the world
identified nuclear microsatellites, sections of repeated DNA that have a very high mutation rate
used the neighbor-joining method, and constructed a tree that revealed where most human genetic diversity can be found — in Africa
all non-Africans form a monophyletic group suggesting that they diversified after migrating out of Africa
Lentiviruses infect mammals by invading certain types of white blood cells
SIV infects monkeys and apes and is closely related to HIV
HIV is not a monophyletic group as different strains have different origins
Neutral mutations accumulate in a clocklike fashion in genomes
scientists can use molecular clocks to estimate the origin of diseases and major clades
Neutral mutations can spread to fixation due solely to processes such as genetic drift
Much non-coding DNA (including pseudogenes) has no known function
mutations to these sequences are not likely to affect the phenotypes of the individuals that carry them, so they are not likely to be exposed to selection
Protein-coding genes could also escape selection
synonymous (silent) substitution: several codons may encode the same amino acid
does not mean they are completely immune to selection’s effects
may affect how efficiently a particular protein is translated even if it does not alter the resulting structure of the protein
nonsynonymous substitution: replaces one amino acid with another
a mutation that does change an amino acid in a protein may still fail to change the function of the protein
Motoo Kimura:
although natural selection could change phenotypic adaptations, much of the variation in genomes was the result of drift
predicted: neutral mutations would become fixed in populations at a roughly regular rate
the more time that passed after the lineages diverged, the more different mutations would be fixed in each one
cytochrome c: the more distantly related two species were, the more mutations had accumulated in each lineage since they split from a common ancestor
by counting the number of base pair substitutions in a species’ cytochrome c, it is possible to estimate how long ago its ancestors branched off from our own
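The arithmetic behind this kind of estimate can be sketched as below, assuming a constant substitution rate. The distance between two species is divided by twice the rate because both lineages accumulate substitutions after the split. All numbers and function names are made up for illustration.

```python
def calibrate_rate(distance, divergence_time):
    """Substitutions per site per year per lineage, from a pair with a known split date."""
    return distance / (2 * divergence_time)


def estimate_divergence_time(distance, rate):
    """Years since two lineages split, given their distance and a calibrated rate."""
    return distance / (2 * rate)


# Hypothetical calibration: 0.02 substitutions per site, split dated to 10 million years ago.
rate = calibrate_rate(distance=0.02, divergence_time=10_000_000)

# Hypothetical pair with no fossil date: three times the distance implies three times the age.
print(estimate_divergence_time(distance=0.06, rate=rate))
```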
Molecular clock:
since most mutations in non-coding regions (or synonymous mutations in coding regions) are neutral, their rate of accumulation is proportional to time
Neutral Theory of Molecular Evolution
describes the pattern of nucleotide sequence evolution under the forces of mutation and random genetic drift in the absence of selection
predicts that neutral mutations will yield nucleotide substitutions in a population at a rate equivalent to the rate of mutation, regardless of the size of the population
as long as mutation rates remain fairly constant through time, neutral variation should accumulate at a steady rate, generating a molecular signature that can be used to date events in the distant past
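Why population size cancels out can be shown in a short worked equation. The symbols are assumptions introduced here: a diploid population of N individuals (so 2N gene copies), a neutral mutation rate of \mu per gene copy per generation, and a substitution rate k.

```latex
\begin{align}
\text{new neutral mutations per generation} &= 2N\mu \\
\text{probability that any one of them drifts to fixation} &= \frac{1}{2N} \\
k &= 2N\mu \cdot \frac{1}{2N} = \mu
\end{align}
```

A larger population produces more new mutations, but each one is less likely to fix, and the two effects offset exactly.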
Positive selection and purifying selection both leave distinctive signatures in nucleotide or amino acid sequences that can be detected using statistical tests
When a neutral mutation arises in a large population, it may take a very long time for it to reach a high frequency through drift
When an allele experiences strong natural selection, it can spread quickly through a population
selective sweep: strong selection “sweeps” a favorable allele to fixation within a population so quickly that there is little opportunity for recombination
genetic hitchhiking: alleles that sit on the same chromosome when the mutation occurred get pulled along for the ride
so as the mutation becomes more common, so do these alleles
Linkage Disequilibrium: Digest Milk as Adults
many people stop producing lactase when they stop drinking milk
natural selection should favor this as it means mammals don’t waste energy on making an enzyme with no advantage
30% of people still produce lactase, so they can still consume milk and dairy products as adults
mutations gave rise to alleles conferring lactose tolerance in adults
Gene flow among populations works to homogenize their allele frequencies, while drift and selection act within populations to make them diverge.
FST measures the extent of subdivision among populations
ranges from 0 (fully homogenized) to 1 (fully segregated)
originally used to measure gene flow between populations, now used to measure how natural selection acts on populations
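One common way to compute FST at a single two-allele locus is sketched below. It compares the average heterozygosity within populations (H_S) to the heterozygosity expected if they were pooled (H_T). The sketch assumes equally sized populations and Hardy-Weinberg proportions, and the function name is hypothetical.

```python
def fst(allele_frequencies):
    """FST = (H_T - H_S) / H_T for one biallelic locus across several populations."""
    n = len(allele_frequencies)
    p_bar = sum(allele_frequencies) / n
    h_t = 2 * p_bar * (1 - p_bar)                              # pooled heterozygosity
    if h_t == 0:                                               # allele fixed or absent everywhere
        return 0.0
    h_s = sum(2 * p * (1 - p) for p in allele_frequencies) / n  # mean within-population value
    return (h_t - h_s) / h_t


print(fst([0.5, 0.5]))  # 0.0, identical frequencies
print(fst([0.0, 1.0]))  # 1.0, fixed for different alleles
print(fst([0.2, 0.8]))  # intermediate subdivision
```

Running this for every locus in the genome gives the background distribution against which unusually high values stand out.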
FST outlier method:
used to detect loci (specific regions of the genome) that show unexpected genetic divergence, which may indicate selection acting on it
Tibetan Plateau:
partial pressure of oxygen in air drops as elevation increases
two strong outliers located next to EPAS1 and EGLN1 genes, known to affect oxygen physiology
Under purifying selection, harmful nonsynonymous mutations are purged from the population, so they accumulate more slowly than synonymous mutations. This leads to a lower dN compared to dS, resulting in a dN/dS ratio of less than 1.
nonsynonymous mutations are under negative (purifying) selection because they are being removed from the population faster than synonymous mutations, which do not affect fitness
When positive selection is acting, beneficial nonsynonymous mutations spread more quickly, leading to an increase in the rate of nonsynonymous mutations compared to synonymous mutations. This results in a dN/dS ratio greater than 1
When dN/dS > 1, it suggests that positive selection is acting on the gene, favoring the spread of beneficial nonsynonymous mutations, which accumulate faster than the synonymous mutations
because the mutation is beneficial, its frequency increases at faster rate
When dN/dS = 1, it suggests neutral evolution. This is because neutral mutations (those that do not affect fitness) accumulate in the genome at equal rates for both nonsynonymous and synonymous mutations, resulting in a dN/dS ratio of 1
null hypothesis
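A minimal sketch of the ratio and its reading. Raw substitution counts have to be divided by the number of sites where each kind of change could occur, since nonsynonymous sites far outnumber synonymous ones. The counts, site numbers, and the tolerance around 1 are all invented for illustration. Real analyses also correct for multiple substitutions at a site and test against the null hypothesis statistically.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitutions, each per available site."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds


def interpret(ratio, tolerance=0.1):
    """Rough reading of a dN/dS ratio; the tolerance is an arbitrary illustrative cutoff."""
    if ratio > 1 + tolerance:
        return "positive selection"
    if ratio < 1 - tolerance:
        return "purifying selection"
    return "consistent with neutral evolution"


# Hypothetical genes, each with 700 nonsynonymous and 300 synonymous sites.
print(interpret(dn_ds(5, 700, 20, 300)))
print(interpret(dn_ds(30, 700, 5, 300)))
```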
BRCA1
gene associated with breast cancer
when it is not cancer-causing, it is associated with several vital functions
including overseeing repairs to damaged DNA
researchers compared orthologs in many species
on some branches, dN/dS < 1
negative selection eliminated nonsynonymous mutations that disrupted the gene’s function
on a few branches, dN/dS > 1
positive selection
humans have 22 nonsynonymous substitutions to 3 synonymous substitutions
many of these changes result in breast cancer risk
researchers speculate that when cells divide and make new copies of their DNA, some viruses slip their own genetic material into our cellular machinery and make copies of themselves
mutations allow the gene to shut viruses out, but viruses may evolve new adaptations to evade the gene
The size of the bacterial genome is proportional to the number of genes in each species
increase genomes by gaining new genes
an accidental duplication can create an extra copy of genes
or horizontal gene transfer can give bacteria new genes
deletions can cause them to shrink