Phylogenetics Summary

Key Concepts
  • Phylogeny:

    • Originates from the Greek words phylon (tribe, race, taxon) and genea (origin, emergence).

    • First used by Ernst Haeckel in 1866 to describe the evolutionary history of a group of organisms.

    • Represents branching relationships between species, illustrating the evolutionary connections and divergences.

    • Addresses the question: "Which species shares a most recent common ancestor with this species?" This helps in understanding the relationships and ancestry among different species.

  • Trees:

    • Represent evolutionary history as a branching diagram; Charles Darwin included a tree figure in "On the Origin of Species" (1859) to illustrate common descent, fundamentally changing how we view species relationships.

  • Tree Terminology:

    • Clade: A monophyletic group consisting of an ancestor and all its descendants.

    • Internal nodes: Represent hypothetical ancestors, indicating points of divergence.

    • Terminal nodes: Operational taxonomic units (OTUs), such as species or genes being studied.

    • Root: Represents the common ancestor of all taxa in the tree.

    • Branch (internal & external): Represents the evolutionary lineage connecting nodes; branch length may indicate the amount of evolutionary change.

  • Types of Phylogenetic Trees:

    • Phylogram: Branch lengths quantify the amount of change, indicating the extent of genetic or morphological divergence.

    • Dendrogram (more specifically, a cladogram): Shows only the branching order (relative ancestry); branch lengths do not quantify the amount of change.

    • Trees can be seen as mobiles, depicting the same genealogy even when arranged differently, emphasizing that tree topology is key, not the specific arrangement.
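
The "mobile" idea can be checked mechanically. The sketch below is illustrative only: it assumes a rooted tree is written as nested Python tuples with leaf names as strings (a representation chosen here, not taken from the notes) and compares trees after sorting the children of every node, so that rotating branches has no effect.

```python
def canonical(tree):
    """Return a rotation-independent form of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        # A leaf (terminal node) is represented by its label.
        return tree
    # Sort the canonical forms of the children so that child order no longer matters.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("rat", "mouse"), ("chimp", "human"))  # same mobile, branches rotated
tree_c = (("human", "mouse"), ("chimp", "rat"))  # a genuinely different topology

same_ab = canonical(tree_a) == canonical(tree_b)
same_ac = canonical(tree_a) == canonical(tree_c)
print(same_ab, same_ac)
```

Two drawings describe the same genealogy exactly when their canonical forms are equal.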

  • Rooted vs. Unrooted Trees:

    • Rooted: Includes a root, indicating an evolutionary time axis, allowing for interpretation of the direction of evolutionary change.

    • Unrooted: Depicts similarity without a time axis, showing relationships but not evolutionary paths.

  • Polytomies:

    • Represent unresolved nodes indicating uncertainty or simultaneous divergence, often due to rapid speciation or insufficient data.

    • Phylogeneticists aim for dichotomous (bifurcating) relationships to clearly resolve evolutionary pathways.

  • Character Change:

    • Apomorphy: A derived trait, i.e. one that differs from the ancestral condition.

    • Plesiomorphy: An ancestral trait.

    • Autapomorphy: A unique derived trait found in only one taxon.

    • Synapomorphy: A shared derived trait that indicates phylogenetic relationships.

    • Homoplasy:

      • Parallel evolution: Same feature arising from the same ancestral condition in independent lineages.

      • Convergent evolution: Same feature arising from different ancestral conditions.

      • Secondary loss: Reversion to ancestral condition.

  • Taxonomical Classification:

    • Monophyletic group: Includes the most recent common ancestor and ALL of its descendants, forming a complete clade.

    • Non-monophyletic group: An artificial grouping that does not accurately reflect evolutionary history.

    • Paraphyletic taxa: Contain the most recent common ancestor but not all of its descendants, so the group is an incomplete clade.

    • Polyphyletic taxa: Do not contain the most recent common ancestor of all the members, grouping taxa based on convergent traits rather than shared ancestry.
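
A group is monophyletic exactly when its members are the complete set of leaves below some node. The following is a minimal sketch of that test, assuming a nested-tuple tree and a deliberately simplified amniote tree; the function names and the example taxa are illustrative choices, not part of the notes. It does not distinguish paraphyletic from polyphyletic groups, which needs additional information.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node, i.e. every clade in the tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is exactly the leaf set of some node in `tree`."""
    return frozenset(group) in set(clades(tree))


# Simplified tree used only for illustration.
amniotes = ("mammal", (("lizard", "snake"), ("crocodile", "bird")))

archosaurs = is_monophyletic(amniotes, {"crocodile", "bird"})
reptiles_without_birds = is_monophyletic(amniotes, {"lizard", "snake", "crocodile"})
print(archosaurs, reptiles_without_birds)
```

In this toy tree, "reptiles without birds" fails the test because the group leaves out one descendant (bird) of its most recent common ancestor.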

  • Inferring Molecular Phylogenies:

    • Convergence (homoplasy) in unrelated lineages can confound phylogenetic inference.

    • Problem with old branches: Many lineage-specific changes accumulate along old (long) branches and can overwrite the shared characters that would reveal relationships.

  • Homoplasy:

    • Different taxa show identical characters due to homoplasy, not recent ancestry, leading to misleading relationships if not accounted for.

      • "Identical by state (IBS), not identical by descent (IBD)".

  • Quantifying Genetic Distance:

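    • Observed differences underestimate the true distance because repeated changes at the same site are hidden; substitution models such as Jukes-Cantor (JC) or the general time-reversible model (GTR, also called REV) correct for this to estimate true evolutionary distances.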
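
As a worked example of such a correction, the Jukes-Cantor distance is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The sketch below assumes two aligned, gap-free DNA sequences; the sequences and function names are made up for illustration.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor_distance(seq1, seq2):
    """Expected substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # At p >= 3/4 the sequences look random to JC and the formula is undefined.
        raise ValueError("sequences are saturated; JC distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical 20-site alignment with 4 observed differences.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGAACGTGCGTATGTACAT"

p = p_distance(seq_a, seq_b)
d = jukes_cantor_distance(seq_a, seq_b)
print(p, d)
```

The corrected distance d is always at least as large as p, and the gap widens as sequences diverge and hidden multiple hits become more common.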

  • Tree-Building Methods:

    • UPGMA: Unweighted Pair Group Method with Arithmetic Mean, a simple clustering method.

    • Neighbor-Joining (NJ): A fast distance-based method.

    • Maximum Parsimony (MP): Chooses the tree requiring the fewest evolutionary changes.

    • Maximum Likelihood (ML): Uses a statistical model to find the most likely tree given the data.

    • Bayesian methods: Combine the likelihood of the data with prior probabilities to estimate the posterior probability of trees.

  • Clustering vs. Search Methods:

    • Clustering (e.g., NJ, UPGMA): Algorithm-based, producing a single best tree based on a fixed procedure.

    • Search (e.g., MP, ML, Bayesian): Use an optimality criterion to choose among trees, evaluating multiple trees to find the best one.
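
To make the "fixed procedure" of a clustering method concrete, here is a minimal UPGMA sketch. It assumes distances are supplied as a dictionary keyed by frozenset pairs and returns only the topology as nested tuples (no branch lengths); the data layout and the toy distances are assumptions made for this example.

```python
def upgma(labels, dist):
    """Cluster taxa with UPGMA; dist maps frozenset({a, b}) to a distance."""
    clusters = {label: 1 for label in labels}  # cluster -> number of leaves in it
    d = dict(dist)                             # working copy of the distances
    while len(clusters) > 1:
        # Fixed procedure: always merge the currently closest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair, key=repr)          # sorted only to make output reproducible
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for other in clusters:
            # Distance to the new cluster is the size-weighted mean of the old ones.
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((merged, other))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        del d[pair]
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy distance matrix, invented for illustration.
raw = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
       ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
dist = {frozenset(pair): value for pair, value in raw.items()}

tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

The procedure never compares alternative trees: it commits to each merge as it goes, which is what separates clustering from search methods.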

  • Number of Possible Trees:

    • Unrooted: $U_n = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$ for $n \ge 3$ taxa, illustrating the combinatorial explosion as the number of taxa increases.

    • Rooted: $R_n = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ for $n \ge 2$ taxa, further emphasizing the computational challenges in phylogenetic analysis.
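
The growth of these counts is easy to see numerically. The sketch below evaluates the standard formulas for the number of fully bifurcating unrooted and rooted trees on n labelled taxa, using exact integer arithmetic; the function names are illustrative.

```python
from math import factorial


def n_unrooted(n):
    """Number of bifurcating unrooted trees for n >= 3 labelled taxa."""
    if n < 3:
        raise ValueError("an unrooted tree needs at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def n_rooted(n):
    """Number of bifurcating rooted trees for n >= 2 labelled taxa."""
    if n < 2:
        raise ValueError("a rooted tree needs at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (4, 5, 10, 20):
    print(n, n_unrooted(n), n_rooted(n))
```

Note that n_rooted(n) equals n_unrooted(n + 1): rooting a tree is equivalent to attaching one extra taxon to one of its branches.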

  • Maximum Parsimony (MP):

    • Finds the tree requiring the fewest evolutionary changes; the approach is simple but can be misled by homoplasy (for example, long branch attraction).
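
Counting the changes a tree requires can be sketched with the Fitch algorithm for a single character. This is a swapped-in standard technique, not necessarily the one the notes have in mind; it assumes a rooted, strictly binary tree written as nested tuples, and the character states below are invented.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and needs no change.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change at this node.
        return common, left_changes + right_changes
    # The children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


states = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

score_1 = fitch(tree_1, states)[1]
score_2 = fitch(tree_2, states)[1]
print(score_1, score_2)
```

For this character, the tree that groups A with B needs fewer changes and would be preferred; a full parsimony analysis sums such scores over all characters and compares many candidate trees.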

  • Long Branch Attraction:

    • Long branches cluster by chance due to homoplasy, an artifact that can distort phylogenetic relationships.

    • Improved sampling can mitigate this by breaking up long branches.

  • Maximum Likelihood (ML):

    • Requires a model of sequence evolution, a tree, and observed data to estimate the likelihood of the tree.

    • The ML estimate is the tree (topology and branch lengths) under which the observed data are most probable, given the model.
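
The smallest possible case shows the principle: a "tree" of two sequences has a single parameter, the branch length d between them. The sketch below assumes the Jukes-Cantor model and finds d by a plain grid search; real ML programs also search over topologies and use far more efficient algorithms. The sequences and function names are illustrative.

```python
import math


def jc_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by distance d under JC."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the equilibrium frequency of the first base under JC.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


def ml_distance(seq1, seq2, max_d=3.0, n_steps=3000):
    """Grid search for the distance that maximizes the likelihood."""
    candidates = [max_d * i / n_steps for i in range(1, n_steps + 1)]
    return max(candidates, key=lambda d: jc_log_likelihood(seq1, seq2, d))


# Hypothetical 20-site alignment with 4 observed differences.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGAACGTGCGTATGTACAT"

d_hat = ml_distance(seq_a, seq_b)
analytic = -0.75 * math.log(1.0 - (4.0 / 3.0) * 0.2)
print(d_hat, analytic)
```

Up to the resolution of the grid, the maximum of the likelihood coincides with the analytic Jukes-Cantor distance for the same data.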

  • Choice of Outgroup:

    • Important for resolving ancestral parts of the tree, providing a reference point for determining the direction of evolutionary change.

    • An appropriate outgroup pulls ancestral ingroup taxa towards the lower part of the tree, helping to root the tree correctly.

    • Outgroups that are too closely or too distantly related to the ingroup can cause problems, leading to inaccurate tree rooting.

  • Robustness of Clades:

    • Assess support for groupings using techniques like bootstrapping, which resamples the data to assess the confidence in each clade.

    • Bootstrap values of about 75-80% or higher are commonly taken to indicate that a clade is strongly supported by the data.

  • Sources of Uncertainty:

    • Few mutations, short divergence times, and short sequences can all lead to poorly resolved phylogenies.

    • Conflict in the data due to model misspecification, homoplasy, or discordance between loci can also create uncertainty.

  • Bootstrapping:

    • Tests consistency by resampling the alignment columns with replacement, rebuilding the tree from each pseudo-replicate, and recording how often each grouping in the tree topology is recovered.
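
The resampling step can be sketched as follows. To stay short, this example does not rebuild a full tree per replicate; as a simplified stand-in it asks how often two chosen taxa are the closest pair by raw distance. The alignment, the function names, and that shortcut are all assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def bootstrap_support(alignment, pair, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair of taxa."""
    rng = random.Random(seed)
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    hits = 0
    for _ in range(n_replicates):
        # Draw n_sites column indices with replacement (the bootstrap step).
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {name: [alignment[name][c] for c in cols] for name in names}
        # Stand-in for tree building: find the closest pair in this replicate.
        # Ties are resolved in favour of the first pair in iteration order.
        best = min(combinations(names, 2),
                   key=lambda ab: p_distance(resampled[ab[0]], resampled[ab[1]]))
        if set(best) == set(pair):
            hits += 1
    return hits / n_replicates


# Hypothetical 16-site alignment of four taxa.
alignment = {
    "A": "ACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGT",
    "C": "ACGAACCTACGAACTT",
    "D": "TCGAACCTTCGAACTA",
}

support_ab = bootstrap_support(alignment, ("A", "B"))
print(support_ab)
```

In a real analysis each replicate would be passed to the same tree-building method used for the original data, and support would be tallied per clade.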

  • Creating Consensus Trees:

    • In a strict consensus tree, branches that are not present in all input trees are collapsed into polytomies, summarizing the common signal across multiple trees.
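
One way to sketch this is to reduce each tree to its set of clades and keep only the clades that occur often enough. The code below assumes rooted nested-tuple trees over the same taxa and shows both a strict rule (present in all trees) and a majority rule (present in more than half); the trees and function names are invented for illustration.

```python
from collections import Counter


def collect_clades(tree, out):
    """Add the leaf set of every internal node to `out`; return this node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def clade_counts(trees):
    """Count in how many trees each clade appears."""
    counts = Counter()
    for tree in trees:
        found = set()
        collect_clades(tree, found)
        counts.update(found)
    return counts


def strict_consensus(trees):
    """Clades present in every input tree."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}


def majority_consensus(trees):
    """Clades present in more than half of the input trees."""
    return {c for c, n in clade_counts(trees).items() if n > len(trees) / 2}


trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
]

strict = strict_consensus(trees)
majority = majority_consensus(trees)
print(sorted(sorted(c) for c in strict))
print(sorted(sorted(c) for c in majority))
```

Here the clade {A, B} appears in only two of the three trees, so it survives under the majority rule but is collapsed into an A, B, C polytomy under the strict rule.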

  • Consistency Testing:

    • Testing the validity of traditional morphological characters with molecular phylogenies to ensure congruence between different data types.

    • To be consistent with a given phylogenetic tree, a character must map onto the tree with few changes in character state, indicating a reliable evolutionary signal.