Phylogenetics Summary
Key Concepts
Phylogeny:
Derives from the Greek words phylon (tribe, race, taxon) and genesis (origin, emergence).
First used by Ernst Haeckel in 1866 to describe the evolutionary history of a group of organisms.
Represents branching relationships between species, illustrating the evolutionary connections and divergences.
Addresses the question: "With which species does this species share its most recent common ancestor?" The answer identifies a species' closest relatives and, more broadly, the ancestry among different species.
Trees:
Represent evolutionary history as a branching diagram; Charles Darwin included a tree figure in "On the Origin of Species" (1859) to illustrate common descent, fundamentally changing how we view species relationships.
Tree Terminology:
Clade: A monophyletic group consisting of an ancestor and all its descendants.
Internal nodes: Represent hypothetical ancestors, indicating points of divergence.
Terminal nodes: Operational taxonomic units (OTUs), such as species or genes being studied.
Root: Represents the common ancestor of all taxa in the tree.
Branch (internal & external): Represents the evolutionary lineage connecting nodes; branch length may indicate the amount of evolutionary change.
Types of Phylogenetic Trees:
Phylogram: Branch lengths quantify the amount of change, indicating the extent of genetic or morphological divergence.
Cladogram (often loosely called a dendrogram): Shows relative ancestry (branching order) only; branch lengths do not quantify the amount of change.
Trees can be seen as mobiles, depicting the same genealogy even when arranged differently, emphasizing that tree topology is key, not the specific arrangement.
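The "mobile" idea can be sketched in code: if a rooted tree is written as nested tuples, rotating the children at any node changes the drawing but not the topology. The nested-tuple encoding and the helper name `canonical` are assumptions made for this illustration only.

```python
def canonical(tree):
    """Return an order-independent form of a nested-tuple tree."""
    if isinstance(tree, str):          # terminal node (OTU)
        return tree
    # Canonicalize each child, then sort so that sibling order is irrelevant.
    return tuple(sorted((canonical(child) for child in tree), key=repr))

tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("rat", "mouse"), ("chimp", "human"))   # same mobile, branches rotated
tree_c = (("human", "mouse"), ("chimp", "rat"))   # genuinely different topology

same_ab = canonical(tree_a) == canonical(tree_b)  # True: identical genealogy
same_ac = canonical(tree_a) == canonical(tree_c)  # False: different groupings
print(same_ab, same_ac)
```

Comparing canonical forms rather than raw tuples is one simple way to test whether two drawings depict the same genealogy.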
Rooted vs. Unrooted Trees:
Rooted: Includes a root, indicating an evolutionary time axis, allowing for interpretation of the direction of evolutionary change.
Unrooted: Depicts similarity without a time axis, showing relationships but not evolutionary paths.
Polytomies:
Represent unresolved nodes indicating uncertainty or simultaneous divergence, often due to rapid speciation or insufficient data.
Phylogeneticists aim for dichotomous (bifurcating) relationships to clearly resolve evolutionary pathways.
Character Change:
Apomorphy: A derived trait, i.e., one that differs from the ancestral condition.
Plesiomorphy: An ancestral trait.
Autapomorphy: A unique derived trait found in only one taxon.
Synapomorphy: A shared derived trait that indicates phylogenetic relationships.
Homoplasy:
Parallel evolution: Same feature arising from the same ancestral condition in independent lineages.
Convergent evolution: Same feature arising from different ancestral conditions.
Secondary loss: Reversion to ancestral condition.
Taxonomic Classification:
Monophyletic group: Includes ALL descendants of the most recent common ancestor, forming a complete clade.
Non-monophyletic group: An artificial grouping that does not accurately reflect evolutionary history.
Paraphyletic taxon: Contains the most recent common ancestor but not all of that ancestor's descendants, so the group is incomplete (e.g., "reptiles" when birds are excluded).
Polyphyletic taxon: Does not contain the most recent common ancestor of all its members; it typically groups taxa by convergent traits rather than shared ancestry.
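A monophyly test can be sketched as a set comparison: a group is monophyletic exactly when its members are the complete leaf set of some subtree. The toy tree, the nested-tuple encoding, and the helper names below are assumptions for illustration. The check only separates monophyletic from non-monophyletic groups and does not distinguish paraphyly from polyphyly.

```python
def leaves(tree):
    """Set of terminal taxa below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """Leaf sets of every subtree, i.e. every clade in the rooted tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def is_monophyletic(tree, group):
    """True if `group` is exactly the set of descendants of one node."""
    return frozenset(group) in clades(tree)

# Toy rooted tree (an assumption for the example).
tree = ("mammal", ("lizard", ("crocodile", "bird")))

print(is_monophyletic(tree, {"crocodile", "bird"}))            # True
print(is_monophyletic(tree, {"lizard", "crocodile", "bird"}))  # True
print(is_monophyletic(tree, {"lizard", "crocodile"}))          # False: bird is left out
print(is_monophyletic(tree, {"bird", "mammal"}))               # False
```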
Inferring Molecular Phylogenies:
Convergence (homoplasy) in unrelated lineages can confound phylogenetic inference.
Problem with old branches: Accumulation of unique characters can obscure relationships.
Homoplasy:
Different taxa show identical characters due to homoplasy, not recent ancestry, leading to misleading relationships if not accounted for.
"Identical by state (IBS), not identical by descent (IBD)".
Quantifying Genetic Distance:
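Substitution models such as Jukes-Cantor (JC) or General Time Reversible (GTR, also called REV) compensate for homoplasy (multiple substitutions at the same site) when estimating true evolutionary distances from observed differences.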
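As a minimal sketch of such a correction, the Jukes-Cantor distance converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function names and the two short sequences are made up for the example.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

p = p_distance("ACGTACGTAC", "ACGTTCGTAA")   # 2 of 10 sites differ -> 0.2
d = jukes_cantor(p)                           # slightly larger than p
print(p, round(d, 4))
```

The corrected distance always exceeds the observed one, and the gap widens as p grows, because more of the hidden multiple hits have to be accounted for.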
Tree-Building Methods:
UPGMA: Unweighted Pair Group Method with Arithmetic Mean, a simple clustering method.
Neighbor-Joining (NJ): A fast distance-based method.
Maximum Parsimony (MP): Chooses the tree requiring the fewest evolutionary changes.
Maximum Likelihood (ML): Uses a statistical model to find the most likely tree given the data.
Bayesian methods: Use Bayesian statistics incorporating prior probabilities.
Clustering vs. Search Methods:
Clustering (e.g., NJ, UPGMA): Algorithm-based, producing a single best tree based on a fixed procedure.
Search (e.g., MP, ML, Bayesian): Uses an optimality criterion to choose among trees, evaluating multiple trees to find the best one.
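To illustrate what "a fixed procedure" means for a clustering method, here is a minimal UPGMA sketch: repeatedly merge the two closest clusters and set the distance from the new cluster to every other one to the size-weighted mean. The input format (a list of names plus a square distance matrix), the function name, and the distances are assumptions; branch lengths are omitted to keep the sketch short.

```python
from itertools import combinations

def upgma(names, matrix):
    """Return a nested-tuple topology built by UPGMA (no branch lengths)."""
    # clusters: id -> (subtree, number of leaves in the subtree)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)            # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Arithmetic mean over all leaf pairs = size-weighted average.
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves one of the merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
print(upgma(names, matrix))   # ('D', ('C', ('A', 'B')))
```

The procedure never compares alternative trees; it simply returns whatever the merge order produces, which is the contrast with search methods drawn above.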
Number of Possible Trees:
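Unrooted: $N_U(n) = (2n-5)!! = \frac{(2n-5)!}{2^{n-3}(n-3)!}$ bifurcating trees for $n \ge 3$ taxa, illustrating the combinatorial explosion as the number of taxa increases.
Rooted: $N_R(n) = (2n-3)!! = \frac{(2n-3)!}{2^{n-2}(n-2)!}$ bifurcating trees for $n \ge 2$ taxa, further emphasizing the computational challenges in phylogenetic analysis.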
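A quick way to see the explosion is to evaluate the standard counts for fully bifurcating trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n taxa, where k!! is the product of the odd integers up to k. The function names below are chosen for this sketch.

```python
def odd_double_factorial(k):
    """Product of the odd integers 1 * 3 * ... * k (1 if k <= 1)."""
    result = 1
    for m in range(3, k + 1, 2):
        result *= m
    return result

def count_unrooted(n):
    """Number of unrooted bifurcating trees for n >= 3 labelled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n - 5)

def count_rooted(n):
    """Number of rooted bifurcating trees for n >= 2 labelled taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n - 3)

for n in (4, 10):
    print(n, count_unrooted(n), count_rooted(n))
# 4 3 15
# 10 2027025 34459425
```

With only ten taxa there are already more than two million unrooted topologies, which is why exhaustive evaluation quickly becomes impractical for search methods.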
Maximum Parsimony (MP):
Finds a tree requiring the fewest evolutionary changes, a simple but sometimes misleading approach.
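For a single character, the minimum number of changes on a given rooted binary tree can be counted with Fitch's algorithm, sketched here under the assumption that trees are nested tuples and character states are given per taxon. A full MP analysis would sum this score over all characters and search over tree topologies, which the sketch does not do.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):                  # terminal node: observed state
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:                                 # children agree: no extra change
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # disagreement costs one change

states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

score_1 = fitch(tree_1, states)[1]   # 1 change
score_2 = fitch(tree_2, states)[1]   # 2 changes
print(score_1, score_2)
```

Under the parsimony criterion, tree_1 is preferred for this character because it explains the observed states with fewer changes.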
Long Branch Attraction:
Long branches cluster by chance due to homoplasy, an artifact that can distort phylogenetic relationships.
Improved sampling can mitigate this by breaking up long branches.
Maximum Likelihood (ML):
Requires a model of sequence evolution, a candidate tree (topology and branch lengths), and the observed data; the likelihood is the probability of the data given the tree and the model.
The tree that makes the observed data most probable is the ML estimate.
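How a likelihood is attached to one candidate tree can be sketched with Felsenstein's pruning algorithm under the Jukes-Cantor model. The tree encoding (each internal node is a tuple of `(child, branch_length)` pairs), the toy alignment, and the fixed branch length of 0.1 are assumptions for illustration. A real ML analysis would also optimize branch lengths and search over topologies.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t):
    """JC probability that base i is replaced by base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihoods P(tips below node | state x at node), x in BASES."""
    if isinstance(node, str):                       # tip: observed base is certain
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial(child, site)
        for x, bx in enumerate(BASES):
            result[x] *= sum(jc_prob(bx, by, t) * child_l[y]
                             for y, by in enumerate(BASES))
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, with equal base frequencies at the root."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        site = {name: seq[s] for name, seq in alignment.items()}
        total += math.log(sum(0.25 * l for l in partial(tree, site)))
    return total

alignment = {"A": "AAGGC", "B": "AAGGC", "C": "ATGCC", "D": "ATGCC"}
t = 0.1
ab, cd = (("A", t), ("B", t)), (("C", t), ("D", t))
ac, bd = (("A", t), ("C", t)), (("B", t), ("D", t))
ll_ab_cd = log_likelihood(((ab, t), (cd, t)), alignment)
ll_ac_bd = log_likelihood(((ac, t), (bd, t)), alignment)
print(ll_ab_cd > ll_ac_bd)   # True: grouping A with B fits these data better
```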
Choice of Outgroup:
Important for resolving ancestral parts of the tree, providing a reference point for determining the direction of evolutionary change.
An appropriate outgroup pulls ancestral ingroup taxa towards the lower part of the tree, helping to root the tree correctly.
Outgroups that are too closely or distantly related can cause issues, leading to inaccurate tree rooting.
Robustness of Clades:
Assess support for groupings using techniques like bootstrapping, which resamples the data to assess the confidence in each clade.
Bootstrap values of roughly 75-80% or higher are commonly taken to indicate strong support, suggesting the clade is well-supported by the data.
Sources of Uncertainty:
Few mutations, short divergence times, and short sequences can all lead to poorly resolved phylogenies.
Conflict in data due to model misspecifications, homoplasy, or discordance between loci can create uncertainty.
Bootstrapping:
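Tests for consistency by resampling the alignment columns (sites) with replacement, rebuilding the tree for each pseudo-replicate, and recording how often each clade reappears, which assesses the robustness of the tree topology.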
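The resampling step can be sketched as follows: draw alignment columns with replacement, rebuild a tree from each pseudo-replicate, and report the percentage of replicates that recover a clade of interest. The tree builder is passed in as a function. `closest_pair_clades` is a deliberately trivial, hypothetical stand-in that reports only the closest pair of sequences, so the example runs without a real tree-building program.

```python
import random
from itertools import combinations

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same length as the original)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def bootstrap_support(alignment, build_clades, clade, replicates=100, seed=0):
    """Percentage of bootstrap replicates whose tree contains `clade`."""
    rng = random.Random(seed)
    hits = sum(clade in build_clades(bootstrap_alignment(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates

def closest_pair_clades(alignment):
    """Toy stand-in for a tree builder: the closest pair is the only clade reported."""
    def diff(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    a, b = min(combinations(sorted(alignment), 2), key=lambda pair: diff(*pair))
    return {frozenset((a, b))}

alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC",
             "C": "ACTTACGAAC", "D": "TCGTTCGAAC"}
support = bootstrap_support(alignment, closest_pair_clades, frozenset("AB"))
print(f"Support for (A,B): {support:.0f}%")
```

Passing a fixed seed keeps the run reproducible. In practice `build_clades` would wrap whichever tree-building method is being evaluated.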
Creating Consensus Trees:
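In a strict consensus, branches that are not supported in all trees are collapsed; a majority-rule consensus instead keeps branches found in more than half of the trees. Either way, the consensus summarizes the common signal across multiple trees.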
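A consensus can be sketched as a count over clades: collect the clades of every input tree and keep those found in all trees (strict) or in more than half of them (majority rule). The nested-tuple trees and function names are assumptions for the example. The result is the set of retained clades rather than a drawn tree.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set of this node, set of all clades at or below it)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    all_leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        all_leaves |= child_leaves
        found |= child_clades
    found.add(all_leaves)
    return all_leaves, found

def consensus_clades(trees, strict=True):
    """Clades (of two or more taxa) kept in a strict or majority-rule consensus."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree)[1])
    needed = len(trees) if strict else len(trees) / 2
    return {c for c, n in counts.items()
            if len(c) > 1 and (n == needed if strict else n > needed)}

trees = [((("A", "B"), "C"), ("D", "E")),
         ((("A", "C"), "B"), ("D", "E")),
         (("A", "B"), ("C", ("D", "E")))]

strict = consensus_clades(trees)                   # only {D,E} and the full set
majority = consensus_clades(trees, strict=False)   # also {A,B} and {A,B,C}
print(sorted("".join(sorted(c)) for c in strict))
print(sorted("".join(sorted(c)) for c in majority))
```

Clades missing from the retained set correspond to the branches that are collapsed into polytomies in the consensus tree.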
Consistency Testing:
Testing the validity of traditional morphological characters with molecular phylogenies to ensure congruence between different data types.
To be consistent with a given phylogenetic tree, a character must map onto the tree with few changes in character state, indicating a reliable evolutionary signal.