Two ways that sequence data are related to trees
Distance based approaches
Character based approaches
Both rely on an optimality criterion
Distance-Based Approaches
convert sequence data into a numerical measure of evolutionary distance
construct a distance matrix
use this matrix to build a tree
Character-Based Approaches
use the multiple sequence alignment directly
evaluate each site (character)
Evolutionary Distance
a numerical estimate of evolutionary change
increases with dissimilarity
often correlates with time since divergence
represented as branch lengths or patristic distances
Pairwise Distance
an estimate of the evolutionary distance between two sequences
p-distance
proportion of sites at which two sequences differ
Features:
normalised per site
based only on observed differences in extant sequences
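A minimal Python sketch of the p-distance, assuming two sequences that are already aligned, of equal length and free of gaps (gap handling is left out). The function name `p_distance` and the example sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes aligned, equal-length, gap-free sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Count only the differences we can observe between the two extant sequences.
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    # Normalise per site.
    return differences / len(seq_a)


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

The two example sequences differ at 2 of 10 sites, so the p-distance is 0.2.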
Multiple Hits
more than one substitution can occur at the same site
can occur in one lineage or in both lineages
substitutions may be superimposed
Consequences of multiple Hits
underestimation of evolutionary distance
incorrect rate estimates
increased homoplasy
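The underestimation caused by multiple hits can be seen in a small simulation. This is a hypothetical sketch, not a realistic evolutionary model: it assumes a random 1000-site ancestor, applies 600 substitutions at uniformly random sites in a single lineage (each replacing the current base with one of the other three), and then counts how many sites actually differ. The names `evolve` and `NUCLEOTIDES` are illustrative.

```python
import random

NUCLEOTIDES = "ACGT"


def evolve(seq: str, n_substitutions: int, rng: random.Random) -> str:
    """Apply n substitutions at uniformly random sites (hypothetical toy model)."""
    sites = list(seq)
    for _ in range(n_substitutions):
        i = rng.randrange(len(sites))
        # Replace the current base with one of the other three, chosen uniformly.
        sites[i] = rng.choice([b for b in NUCLEOTIDES if b != sites[i]])
    return "".join(sites)


rng = random.Random(42)  # fixed seed so the run is reproducible
ancestor = "".join(rng.choice(NUCLEOTIDES) for _ in range(1000))
descendant = evolve(ancestor, 600, rng)
observed = sum(a != b for a, b in zip(ancestor, descendant))
print(f"actual substitutions: 600, observed differences: {observed}")
```

Because some sites are hit more than once (and some hits restore the original base), the observed number of differences is lower than the 600 substitutions that actually occurred.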
Homoplasy
fixation of identical-by-state alleles in different lineages with independent mutational origins
can mislead phylogenetic inference by grouping taxa based on similarity rather than ancestry
When does saturation occur?
most sites have undergone one or more substitutions
additional substitutions are no longer detectable
Effects of saturation
sequences appear randomly scrambled
alignment becomes unreliable or impossible
correction for multiple hits becomes infeasible
phylogenetic signal is lost
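Under the Jukes-Cantor model, the expected p-distance after d substitutions per site is p = 3/4 (1 - e^(-4d/3)). The sketch below (function name `expected_p_jc` is hypothetical) tabulates this curve to illustrate saturation: as d grows, p levels off towards 0.75, the value expected for two unrelated random sequences with equal base frequencies, so further substitutions no longer change what we observe.

```python
import math


def expected_p_jc(d: float) -> float:
    """Expected p-distance after d substitutions per site under Jukes-Cantor."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# p rises almost linearly for small d, then flattens towards 0.75.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:4.1f}  expected p = {expected_p_jc(d):.3f}")
```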
Procedure for Distance-Matrix Methods
Calculate pairwise distances between all sequences
Construct a tree from the distance matrix
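The first step, computing all pairwise distances, can be sketched as follows. This assumes uncorrected p-distances on aligned, gap-free sequences; a corrected distance (such as JC69) could be substituted. The taxon names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites (assumes aligned, equal-length sequences)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of pairwise p-distances, with zeros on the diagonal."""
    names = list(seqs)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    # Each unordered pair is computed once and mirrored.
    for i, j in combinations(range(n), 2):
        d = p_distance(seqs[names[i]], seqs[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix


seqs = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTTCGTAA",
    "taxon_3": "TCGTACGAAC",
}
names, matrix = distance_matrix(seqs)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting matrix is the input to the second step, tree construction.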
What does it mean if distances are additive?
each distance equals the sum of the branch lengths on the path connecting the two taxa
the matrix perfectly summarises patristic distances
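One standard way to test additivity, not named in these notes, is the four-point condition: for every four taxa i, j, k, l, of the three sums d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k), the two largest must be equal. For a matrix that already satisfies the triangle inequality, this holds exactly when the matrix is additive. The sketch assumes a symmetric matrix given as a list of lists; the function name and example matrix are hypothetical (the matrix was built from a four-taxon tree with terminal branches A:1, B:2, C:3, D:1 and an internal branch of length 1).

```python
from itertools import combinations


def satisfies_four_point(d: list[list[float]], tol: float = 1e-9) -> bool:
    """True if every quartet of taxa satisfies the four-point condition."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        # The two largest sums must be equal (within a tolerance for floats).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


additive = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
print(satisfies_four_point(additive))
```

Observed distances from real sequences are rarely exactly additive, which is why the tolerance matters in practice.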
How do we correct for multiple hits?
estimate the number of unobserved substitutions
attempt to recover the true evolutionary distance
Why is correction uncertain?
we only observe end points
we lack direct knowledge of intermediate events
Assumptions of the Jukes-Cantor (JC69) Model
all four nucleotides occur at equal frequencies
all substitutions are equally likely
substitution rate is constant over time
only substitutions considered (no indels)
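Under these assumptions, the observed p-distance can be corrected for multiple hits with the JC69 formula d = -3/4 ln(1 - 4p/3), giving an estimate of substitutions per site. A minimal Python sketch (the function name `jc69_distance` is hypothetical):

```python
import math


def jc69_distance(p: float) -> float:
    """JC69-corrected distance (substitutions per site) from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the log argument is <= 0: the sequences are saturated.
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.2, 0.5, 0.7):
    print(f"p = {p:.2f}  JC69 d = {jc69_distance(p):.3f}")
```

For small p the corrected distance is close to p, which is why the model works well for highly similar sequences; as p approaches 0.75 the logarithm diverges and the correction becomes infeasible, matching the saturation behaviour described earlier.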
How is the JC69 model constrained?
each row of the rate matrix Q sums to zero
total number of character states remains constant
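A sketch of the JC69 rate matrix Q, assuming every off-diagonal rate equals alpha so each diagonal entry is -3 alpha (the row-sum-to-zero constraint), together with the standard closed-form transition matrix P(t), whose rows sum to one. Choosing alpha = 1/3 is an assumption that scales t to expected substitutions per site; the function names are hypothetical.

```python
import math


def jc69_rate_matrix(alpha: float) -> list[list[float]]:
    """Q: off-diagonal rate alpha, diagonal -3*alpha so each row sums to zero."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def jc69_transition_matrix(alpha: float, t: float) -> list[list[float]]:
    """P(t) = exp(Qt) in closed form for JC69."""
    e = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * e  # probability of ending in the same base
    diff = 0.25 - 0.25 * e  # probability of ending in one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


Q = jc69_rate_matrix(alpha=1 / 3)
P = jc69_transition_matrix(alpha=1 / 3, t=0.5)
print("row sums of Q (should be ~0):", [sum(row) for row in Q])
print("row sums of P (should be ~1):", [sum(row) for row in P])
```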
When is the JC model a good fit?
when sequences are highly similar
few substitutions have occurred
5 main approaches to phylogenetic inference
Distance methods
Maximum parsimony
Maximum likelihood
Bayesian inference
Hybrid approaches
UPGMA
assumes a molecular clock
almost always inappropriate
explicitly discouraged
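To show where the molecular clock assumption enters, here is a minimal UPGMA sketch, assuming a symmetric distance matrix given as a list of lists. It repeatedly merges the closest pair of clusters, places the new node at half their distance (so both subtrees end at the same depth, i.e. a clock), and averages distances weighted by cluster size. The function name and the (ultrametric) example matrix are hypothetical; output is a rooted Newick string.

```python
def upgma(names: list[str], d: list[list[float]]) -> str:
    """Hypothetical UPGMA sketch returning a rooted Newick string."""
    # Each cluster id maps to (Newick label, number of taxa, height of its node).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        label_i, size_i, h_i = clusters.pop(i)
        label_j, size_j, h_j = clusters.pop(j)
        height = dij / 2  # molecular clock: both children sit at equal depth
        label = f"({label_i}:{height - h_i:g},{label_j}:{height - h_j:g})"
        # Size-weighted average distance from the new cluster to every other cluster.
        new_dist = {
            k: (size_i * dist[tuple(sorted((i, k)))] + size_j * dist[tuple(sorted((j, k)))])
            / (size_i + size_j)
            for k in clusters
        }
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        for k, v in new_dist.items():
            dist[(k, next_id)] = v  # next_id is larger than every existing id
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    ((label, _, _),) = clusters.values()
    return label + ";"


names = ["A", "B", "C", "D"]
d = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
print(upgma(names, d))
```

When rates differ between lineages the data are not ultrametric, and the equal-depth placement forces an incorrect tree.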
Neighbour-Joining
does not assume equal rates
efficient and widely used
uses the distance matrix to minimise total tree length
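A minimal sketch of the standard (Saitou-Nei) neighbour-joining algorithm, assuming a symmetric distance matrix with at least three taxa. At each step it joins the pair minimising Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of row i; this greedy choice is how NJ approximates the minimum-total-length tree without assuming equal rates. The function name and example matrix are hypothetical; output is an unrooted Newick string with a three-way split at the top.

```python
def neighbour_joining(names: list[str], d: list[list[float]]) -> str:
    """Hypothetical NJ sketch returning an unrooted Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    # dist[a][b] between current nodes, keyed by their Newick labels.
    dist = {a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to their new parent u.
        la = dist[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        dist[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                dist[u][k] = dist[k][u] = (dist[a][k] + dist[b][k] - dist[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Three nodes left: solve for the three branch lengths to the central node.
    a, b, c = nodes
    la = (dist[a][b] + dist[a][c] - dist[b][c]) / 2
    lb = (dist[a][b] + dist[b][c] - dist[a][c]) / 2
    lc = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


d = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
print(neighbour_joining(["A", "B", "C", "D"], d))
```

On this additive example (built from a tree with terminal branches A:1, B:2, C:3, D:1 and an internal branch of 1) the sketch recovers the generating tree exactly. With real, non-additive distances NJ can return negative branch lengths, which this sketch does not correct.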