Phylogenetic Trees: Construction, Data Types, Validation, and Concepts
Phylogenetic trees: construction, data, and interpretation
What is a phylogenetic tree? A diagram that represents evolutionary relationships among taxa derived from a common ancestor. Today’s focus: how to construct one and what data to use.
Derived characteristics and groups
- Three kinds of taxonomic groups (monophyletic, paraphyletic, polyphyletic):
- Monophyletic: includes the common ancestor and all of its descendants. These groups are ideal for reconstructing evolutionary histories because they reflect correct relationships.
- Paraphyletic: includes the common ancestor and some, but not all, descendants.
- Polyphyletic: does not include the most recent common ancestor of all members; descendants arise from multiple ancestral sources.
- Monophyly is preferred for building phylogenies because it captures true descent from a single ancestor.
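The three definitions above can be checked mechanically once a tree is written down. The sketch below is only an illustration: the tree, the node names (`root`, `n1`, `n2`), and the convention that a group is listed as a set of nodes that includes its ancestors are all assumptions made for this example, not part of the class material.

```python
# Hypothetical rooted tree, written as child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "n1": "root", "n2": "root",
    "A": "n1", "B": "n1",
    "C": "n2", "D": "n2",
}

def ancestors(node):
    """Path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def mrca(nodes):
    """Most recent common ancestor of a collection of nodes."""
    nodes = list(nodes)
    common = set(ancestors(nodes[0]))
    for n in nodes[1:]:
        common &= set(ancestors(n))
    # Walking up from any member, the first shared node is the most recent one.
    return next(a for a in ancestors(nodes[0]) if a in common)

def descendants(node):
    """The node itself plus everything descended from it."""
    return {n for n in PARENT if node in ancestors(n)}

def classify(group):
    """Apply the three definitions to a set of nodes (ancestors listed explicitly)."""
    group = set(group)
    anc = mrca(group)
    if anc not in group:
        return "polyphyletic"   # most recent common ancestor is left out
    if group == descendants(anc):
        return "monophyletic"   # ancestor plus all of its descendants
    return "paraphyletic"       # ancestor plus only some descendants

print(classify({"n1", "A", "B"}))                  # monophyletic
print(classify({"root", "n1", "A", "B", "C"}))     # paraphyletic
print(classify({"A", "C"}))                        # polyphyletic
```

Because the group must name its ancestor explicitly, a set of tips alone (such as `{"A", "B"}`) is reported as polyphyletic here; that is a simplification of this sketch rather than a biological claim.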
Data used to build a phylogenetic tree
- Character presence/absence data (morphological traits): character states can be gained or lost over the course of evolution.
- Important to consider both gains and losses when inferring relationships.
- Example characteristics used in class:
- Presence of more than three pairs of legs
- Presence or absence of wings
- Development type: pupal development vs direct development
- Presence of sucking mouthparts
- Mandibles for grinding food
Outgroup and rooting the tree
- Outgroup: a taxon that is closely related to the in-group but diverged before the in-group taxa split from one another; used to root the tree and infer ancestral vs derived states.
- In the example, the outgroup is shown on the left, branching off at the root before any of the in-group taxa diverge.
- Purpose of the outgroup: determine whether a given character state is ancestral (present in the root) or derived (arose later in the in-group).
- Practical use: if the outgroup lacks a trait (e.g., pupal development), that trait is unlikely to be ancestral in the in-group, helping to test hypotheses about the evolution of that trait.
- Root node: the most recent common ancestor of all taxa in the tree (in-group plus outgroup); including the outgroup fixes where the root is placed.
- In the example, jawless fish are used as the outgroup and help determine whether pupal development is ancestral or derived.
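Outgroup comparison can be written as a very small rule: whatever state the outgroup shows is treated as ancestral, and any other state in the in-group is treated as derived. The sketch below assumes a single binary character and invented data (1 = pupal stage present, 0 = absent); the function name `polarize` is hypothetical, and the rule is a heuristic that assumes the outgroup has retained the ancestral state.

```python
def polarize(states, outgroup):
    """Label each in-group taxon's state as ancestral or derived by outgroup comparison."""
    ancestral_state = states[outgroup]
    return {
        taxon: ("ancestral" if state == ancestral_state else "derived")
        for taxon, state in states.items()
        if taxon != outgroup
    }

# Hypothetical data: 1 = pupal stage present, 0 = absent.
pupal_stage = {"outgroup": 0, "A": 0, "B": 0, "C": 1, "D": 1}

labels = polarize(pupal_stage, "outgroup")
print(labels)
# {'A': 'ancestral', 'B': 'ancestral', 'C': 'derived', 'D': 'derived'}
```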
Example: testing pupal development as a character
- Observed table shows: C and D have a pupal stage; A and B do not.
- Three competing hypotheses for C and D:
1) C and D are more closely related to each other (they share pupal development).
2) C and D evolved pupal development independently (convergent evolution).
3) Pupal development was present in the common ancestor of all four (A, B, C, D) but was lost in A and B.
- Role of the outgroup in choosing among hypotheses: if the outgroup lacks pupal development, the hypothesis that pupal development was present in the common ancestor becomes less likely; the simplest explanation (parsimony) may favor C and D being closely related.
- Parsimony principle: the simplest explanation with the fewest evolutionary changes is preferred. Also called Occam’s Razor in this context.
- Formal definition (parsimony): select the tree that minimizes the total number of character-state changes across all branches:
$$T^{*} = \arg\min_{T} \sum_{e \in E(T)} c_e$$
where $E(T)$ is the set of edges (branches) of tree $T$ and $c_e$ is the number of state changes on edge $e$.
- Illustration using the examples: the first tree may require four changes from the root; alternative trees might require five or six changes (including losses), making them less parsimonious.
- Bottom line: choose the most parsimonious tree as the best-supported hypothesis given the data; often you check multiple characters to ensure consistency.
- Homework/practice: students are encouraged to review how to derive the table from baseline data and how to count changes to compare trees. The instructor notes that questions on the exam will likely be multiple choice or require choosing the best-fitting tree from options, not drawing from scratch.
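Counting changes by hand works for small examples; for practice it can help to see the same count done programmatically. The sketch below uses Fitch's small-parsimony algorithm, which is one standard way to count the minimum number of state changes on a fixed tree (it weights gains and losses equally). The taxa, the two characters, and the two candidate trees are invented for illustration and are not the table used in class.

```python
def fitch(tree, states):
    """Return (possible states at this node, minimum changes below it) for one character.

    tree   : a tip name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping tip name -> observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No state in common: at least one change must occur at this node.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, characters):
    """Total number of changes a tree requires across all characters."""
    return sum(fitch(tree, ch)[1] for ch in characters)

# Hypothetical characters (O is the outgroup): 1 = present, 0 = absent.
pupal_stage = {"O": 0, "A": 0, "B": 0, "C": 1, "D": 1}
wings       = {"O": 0, "A": 1, "B": 1, "C": 1, "D": 1}
characters = [pupal_stage, wings]

# Two candidate trees, rooted with the outgroup.
tree_cd  = ("O", (("A", "B"), ("C", "D")))   # C and D are sister taxa
tree_alt = ("O", (("A", "C"), ("B", "D")))   # C and D are not sister taxa

print(tree_length(tree_cd, characters))    # 2
print(tree_length(tree_alt, characters))   # 3
best = min([tree_cd, tree_alt], key=lambda t: tree_length(t, characters))
```

With these made-up data, grouping C with D needs one gain of the pupal stage, while the alternative needs two, so `best` is `tree_cd`.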
Other approaches to building phylogenies
- Morphological/visible traits: presence or absence of wings, legs, mouthparts, developmental patterns, etc.
- Molecular data: DNA/RNA sequence variation as additional character data.
- Combining data sources: integrate visible characters with molecular data to validate and refine trees.
- Alignment and character-state inference from molecular data:
- Align sequences for species (e.g., rat, mouse, guinea pig) with an outgroup (rabbit).
- Identify diagnostic substitutions (e.g., an A to G change at a region common to all in-group species descended from the ancestor).
- Blue changes (as highlighted in the class figure): the same substitutions shared by several lineages (e.g., C to T and C to A in the blue region), indicating shared ancestry among those in-group members.
- Orange changes: substitutions that are not shared consistently across lineages; they appear to be random variation and are not informative about relationships.
- Distance-based molecular approaches:
- Generate a distance matrix by counting base differences between species.
- Example: human, chimp, gorilla, gibbon.
- Fewer differences imply closer relatedness (e.g., human–chimp pair has only two differing bases in the example).
- Use the distance matrix to construct a tree by iteratively joining the pair with the smallest distance and recomputing distances.
- The principle: the lower the distance between two taxa, the higher the probability of a recent common ancestor.
- Molecular clock assumption (rate of evolution): the rate is assumed constant across lineages; if there are six differences between two taxa, you might assign three changes on each branch under a simple clock model, i.e., equal rates in both lineages.
- Caveats: the molecular clock is not always accurate; multiple factors can violate rate constancy, so it’s important to corroborate molecular trees with other data (morphology, fossils).
- Validation through cross-checking: verify that molecular-based trees align with character-based trees, and that both are consistent with fossil evidence.
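The distance-based steps described above (count differences, join the closest pair, recompute distances) can be sketched in a few lines. The version below uses UPGMA, one common clustering method that assumes a constant molecular clock; the notes do not name a specific method, so treat this choice as an assumption. The sequences are invented so that human and chimp differ at two sites, as in the class example.

```python
from itertools import combinations

def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(seqs):
    """Repeatedly join the closest pair of clusters (UPGMA).

    Returns (tree, heights): tree is a Newick-like string, and heights maps each
    joined cluster to half the distance between its two parts, i.e. the changes
    assigned to each branch if both lineages evolve at the same rate.
    """
    sizes = {name: 1 for name in seqs}
    dist = {frozenset(pair): float(hamming(seqs[pair[0]], seqs[pair[1]]))
            for pair in combinations(seqs, 2)}
    heights = {}
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)
        a, b = sorted(closest)
        merged = f"({a},{b})"
        heights[merged] = dist[closest] / 2
        # Distance from the new cluster to each remaining cluster:
        # the size-weighted average of the two old distances.
        for other in sizes:
            if other in (a, b):
                continue
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (
                (sizes[a] * d_a + sizes[b] * d_b) / (sizes[a] + sizes[b])
            )
        del dist[closest]
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)), heights

# Hypothetical aligned sequences (not real data).
seqs = {
    "human":   "AAAAAAAAAA",
    "chimp":   "AAAAAAAATT",
    "gorilla": "AAAACCAAGT",
    "gibbon":  "GGGACCGAGA",
}

tree, heights = upgma(seqs)
print(tree)                        # (((chimp,human),gorilla),gibbon)
print(heights["(chimp,human)"])    # 1.0
```

The height of 1.0 for the human and chimp pair is the clock idea in miniature: two differences are split as one change on each branch. When rates differ among lineages, this equal split is not justified, which is why the caveats above matter.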
Validation and calibration with fossils
- Fossil data can calibrate the timing of evolutionary events in a phylogeny.
- Limitations of fossils:
- Fossils mostly preserve hard parts (bones, shells); soft-bodied organisms are rarely fossilized.
- Incompleteness of the fossil record due to preservation bias.
- Stratigraphy and time scale:
- Fossils in lower rock layers are generally older; fossils in higher layers are more recent.
- Geologists use fossil observations to calibrate the timing of lineages on a phylogeny.
- Radiometric dating to date fossils:
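- Carbon-14 dating is a common method for relatively young organic remains (up to roughly 50,000 years old); older material is dated with longer-lived isotopes.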
- Process outline: cosmic radiation in the atmosphere creates carbon-14; plants take up CO2 containing C-14; animals feeding on plants incorporate it; after death, C-14 decays to nitrogen-14.
- Half-life of carbon-14: approximately 5,730 years.
- Decay relationships:
- $N(t) = N_0 \left(\tfrac{1}{2}\right)^{t/T_{1/2}} = N_0 e^{-kt}, \quad k = \frac{\ln 2}{T_{1/2}}$
- From the fraction of C-14 remaining, $N(t)/N_0$, you can estimate the time elapsed since death, which gives the age of the sample.
- Fossil data support phylogenies by providing absolute dates for nodes (common ancestors) and by context (environmental changes).
- Integrated approach: combine character-state data (morphology), molecular data, and fossil dating to build and validate phylogenies.
- Example applications:
- Dinosaurs and birds: Archaeopteryx as a sister group to living birds; the root and nodes are inferred from both morphological traits (e.g., digits, feathers) and fossil succession.
- Using fossil calibration to anchor the timing of the origin of major clades and to interpret environmental influences on evolution.
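The radiometric dating step in this section comes down to solving the exponential decay law $N(t) = N_0 e^{-kt}$, with $k = \ln 2 / T_{1/2}$, for $t$. A minimal sketch follows, assuming the commonly cited carbon-14 half-life of about 5,730 years; the function name `age_from_fraction` is made up for this example.

```python
import math

C14_HALF_LIFE_YEARS = 5730.0  # commonly cited value, assumed here

def age_from_fraction(fraction_remaining, half_life=C14_HALF_LIFE_YEARS):
    """Years since death, given the fraction N(t)/N0 of C-14 still present."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    k = math.log(2) / half_life          # decay constant
    return -math.log(fraction_remaining) / k

print(round(age_from_fraction(0.5)))    # 5730  (one half-life)
print(round(age_from_fraction(0.25)))   # 11460 (two half-lives)
```

Passing a different `half_life` gives the same calculation for another isotope, which is how samples too old for carbon-14 are handled in principle.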
Mass extinctions and evolutionary trajectories
- Mass extinctions can drastically alter evolutionary paths by removing dominant groups and reducing competition.
- Examples discussed:
- End-Cretaceous extinction (~66 million years ago): linked to an asteroid impact; it eliminated the non-avian dinosaurs and opened ecological space for mammals.
- Permian extinction (~252 million years ago): one of the most severe, with the large majority of marine species (commonly estimated at around 90%) going extinct; attributed to massive volcanism in Siberia (the Siberian Traps) that altered atmospheric composition and acidified the oceans.
- After mass extinctions, surviving lineages can diversify rapidly due to reduced competition.
- Implication: mass extinctions shape the structure of subsequent phylogenies and the timing of lineage diversification.
Practical exam expectations and tips
- You may be asked to identify paraphyletic groups from a diagram (e.g., group containing a common ancestor but not all descendants).
- You may be asked to identify the sister group of a lineage (e.g., mammals’ sister group in a given figure).
- You may be asked to determine whether two lineages share the same most recent common ancestor by tracing nodes back to a shared node.
- You may be asked to compare alternative trees and select the most parsimonious one using the lowest number of evolutionary changes.
- Emphasis on recognizing multiple lines of evidence (morphology, molecular data, fossil record) and how to corroborate a tree across datasets.
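For the sister-group question type, the rule is that a lineage's sister group is the other branch arising from the same node. The sketch below writes a tree as nested pairs and looks that branch up; the tree is a simplified illustration with assumed tip names, not the figure used in class.

```python
def leaves(tree):
    """Set of tip names under a node (a tip is a string, a node is a pair)."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def sister_group(tree, taxon):
    """Tips of the clade that shares its most recent common ancestor with `taxon`."""
    if isinstance(tree, str):
        return None
    left, right = tree
    if left == taxon:
        return leaves(right)
    if right == taxon:
        return leaves(left)
    # Not at this node: keep searching further down the tree.
    return sister_group(left, taxon) or sister_group(right, taxon)

# Simplified, illustrative tree (tip names are assumptions for this example).
TREE = ("amphibians", ("mammals", ("lizards", ("crocodiles", "birds"))))

print(sister_group(TREE, "mammals") == {"lizards", "crocodiles", "birds"})   # True
print(sister_group(TREE, "birds"))                                           # {'crocodiles'}
```

Note that the sister group of mammals in this tree is the whole clade on the other branch, not just the nearest tip drawn next to it, which is a common mistake when reading diagrams.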
Real-world relevance and applications
- Phylogenetic trees help track the spread and origin of pathogens (e.g., HIV/AIDS outbreak tracing using sequence data across patients).
- They enable reconstruction of the history of life on Earth and interpretation of how environmental changes shaped evolution.
- They underpin taxonomy, systematics, and comparative biology by grouping organisms with true evolutionary relationships.
Quick recap of the workflow you should understand
- Start with character-state data (presence/absence of traits) and/or molecular data (DNA/RNA sequences).
- Use an outgroup to root the tree and infer ancestral vs derived character states.
- Consider multiple characters to avoid bias from a single trait.
- Apply parsimony to pick the simplest explanation (fewest changes) for trait evolution across the tree.
- Validate and calibrate the tree with fossil data and, when possible, radiometric dating to place time estimates on nodes.
- Use a combination of data sources to obtain a robust phylogeny and understand evolutionary history.
Final notes on terminology recap
- Outgroup: taxa used to root the tree and infer ancestral states.
- Root node: the most recent common ancestor of all taxa in the tree (in-group plus outgroup).
- Parsimony: preference for the tree with the fewest evolutionary changes; also called Occam’s Razor in phylogenetics.
- Monophyletic: includes the ancestor and all descendants.
- Paraphyletic: includes the ancestor and some but not all descendants.
- Polyphyletic: excludes the most recent common ancestor of all members by grouping taxa from different ancestors.
Illustrative examples referenced in class
- Pupal development as a character: comparing C/D vs A/B to discuss hypotheses and outgroup influence.
- Hypothetical four-taxon example (A, B, C, D) used to illustrate parsimony and rooting.
- Dinosaur–bird fossil example: outgroup and sister-group concepts illustrated with Archaeopteryx and modern birds.
- Pathogen tracing example: sequencing data from patients to determine if infections originated from a common source (single source vs multiple sources).
Important caveats
- Fossil data are incomplete and biased toward organisms with hard parts; absence of a fossil does not prove absence of a lineage.
- Molecular clocks may not always run at constant rates; multiple lines of evidence are essential for robust conclusions.
- Different data types (morphology vs molecular) may yield congruent or conflicting trees; reconciliation is a key step in phylogenetics.
Core takeaway
- Reading and constructing phylogenetic trees involves integrating visible traits, molecular data, and fossil evidence, guided by the principle of parsimony and reinforced by fossil calibration to understand the timing and pattern of evolution.