Preactivity 7: Molecular Phylogenetics & Real-World Applications
Administrative Details
- Preactivity 7 due: Thursday, 11:15 AM (submit worksheet PDF/JPEG + Brightspace questions)
- Upcoming in-class session: Activity 7—additional real-world problem solving with phylogenies
Learning Objectives
- LO-1: Construct a data matrix from DNA sequence data taken from several taxa
- LO-2: Use a well-supported phylogenetic hypothesis to answer real-world biological questions by
- Mapping known information (e.g., host, geography) onto the tree
- Making evidence-based inferences about an unknown or novel taxon
Background & Conceptual Framework
- Phylogenetic accuracy matters: Practical decisions (crop protection, public health, conservation) rely on choosing the most accurate tree among competing hypotheses
- Previous units relied on morphological traits (phenotypic characters). This week shifts focus to molecular characters (DNA, RNA, proteins)
- Molecular data advantages
- Abundant: sequences, including full genomes, are available for thousands of taxa
- Sequencing technology is rapid and economical
- Genotype vs. phenotype refresher
- DNA sequence = genotype
- RNA & proteins = gene products and therefore part of phenotype
Using Molecular Sequences as Characters
- Alignment prerequisite
- Arrange homologous positions in columns so that every taxon has the same positional numbering
- Number each position (e.g., 1–17 for a protein fragment)
- Character vs. character state
- Character: a position (site) in the aligned sequence
- State: the nucleotide (DNA/RNA) or amino acid (protein) present at that position in a particular taxon
- Informative vs. uninformative sites
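- Only variable positions can help resolve relationships; under parsimony, a variable site groups taxa only when at least two states each occur in two or more taxa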
- Invariant positions do not contribute and can be excluded from the parsimony count
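Spotting variable sites can be sketched in a few lines of Python. This is a minimal illustration only: the function name `variable_sites` and the four short sequences are hypothetical and are not taken from the worksheet.

```python
def variable_sites(alignment):
    """Return 1-based positions where the aligned sequences are not all identical."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A site is variable when more than one state appears in its column
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]

# Hypothetical aligned sequences (not the worksheet data)
alignment = {
    "A": "ACGTAC",
    "B": "ACGTAC",
    "C": "ACTTAC",
    "D": "ACTTAG",
}
print(variable_sites(alignment))  # [3, 6]
```

Positions are reported 1-based so they match the numbering used when positions are labeled in an alignment.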
Protein Example (5 taxa, 17 positions)
- Variable sites: 11 & 16
- Position 11: 4× “M” (methionine), 1× “Q” (glutamine)
- Position 16: 2× “N” (asparagine), 3× “H” (histidine)
- All other 15 sites are invariant → uninformative
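The two variable columns in this example behave differently under the usual parsimony criterion, and a short sketch makes that concrete. The function name `classify_site` is hypothetical, and the order of states within each column string is an assumption (the notes give only the counts, not which taxon carries which state).

```python
from collections import Counter

def classify_site(column):
    """Classify one alignment column given as a string of states, one per taxon."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Parsimony-informative: at least two states, each present in two or more taxa
    shared = [state for state, n in counts.items() if n >= 2]
    if len(shared) >= 2:
        return "parsimony-informative"
    return "variable, singleton only"

print(classify_site("MMMMQ"))  # position 11 -> variable, singleton only
print(classify_site("NNHHH"))  # position 16 -> parsimony-informative
print(classify_site("LLLLL"))  # any unchanged column -> invariant
```

A singleton such as the lone Q at position 11 adds one change to every possible tree, so it cannot favor one hypothesis over another, whereas position 16 splits the taxa into two groups of two or more.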
DNA Example (5 taxa, 15 bp)
- Identify variable sites: positions 4, 9, 13
- Highlight differences relative to an arbitrary reference (e.g., Taxon A) to ease matrix construction
- Build the data matrix
- Rows = characters (3 variable positions)
- Columns = taxa (A–E)
- Binary coding (arbitrary but explicit choice of 0/1)
- Position 4: A (0) vs. G (1)
- Position 9: C (0) vs. T (1)
- Position 13: G (0) vs. A (1)
- Fill in the matrix with 0s and 1s for each taxon; the completed matrix is the input for parsimony analysis
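The matrix-building steps above can be sketched as follows. The five 15-bp sequences are hypothetical stand-ins constructed to vary only at positions 4, 9, and 13 with the states listed above; they are not the actual example sequences. The helper `binary_matrix` is likewise an illustrative name, and it assumes every variable site has exactly two states.

```python
def binary_matrix(alignment, reference):
    """Code each variable site as 0 (matches the reference taxon) or 1 (differs).

    Returns {position: {taxon: 0 or 1}} with 1-based positions, so rows are
    characters and columns are taxa. Assumes at most two states per site.
    """
    ref = alignment[reference]
    matrix = {}
    for i in range(len(ref)):
        states = {seq[i] for seq in alignment.values()}
        if len(states) == 1:
            continue  # invariant site: leave it out of the matrix
        if len(states) > 2:
            raise ValueError(f"position {i + 1} has more than two states")
        matrix[i + 1] = {t: int(seq[i] != ref[i]) for t, seq in alignment.items()}
    return matrix

# Hypothetical sequences (assumption: not the worksheet data)
alignment = {
    "A": "ATGACCTGCAAGGTT",
    "B": "ATGGCCTGCAAGGTT",
    "C": "ATGGCCTGTAAGGTT",
    "D": "ATGACCTGTAAGATT",
    "E": "ATGACCTGCAAGATT",
}
matrix = binary_matrix(alignment, reference="A")
for pos, row in matrix.items():
    print(pos, " ".join(str(row[t]) for t in alignment))
# 4 0 1 1 0 0
# 9 0 0 1 1 0
# 13 0 0 0 1 1
```

Coding relative to Taxon A means the reference row is all 0s, which mirrors the "highlight differences relative to a reference" step; choosing a different reference flips some 0/1 labels but does not change the number of changes on any tree.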
Parsimony & Quantitative Metrics
- Tree length (L): minimum number of character-state changes needed to explain the data matrix on a given tree
- Consistency Index (CI) quantifies homoplasy
CI = \frac{m}{s}
- m: minimum possible number of changes (sum over characters of [number of states – 1])
- s: observed total changes on the evaluated tree (tree length)
- Range: greater than 0, up to 1; CI = 1 means no homoplasy. For the same data matrix, a higher CI means a shorter (more parsimonious) tree
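Tree length and CI can be computed by hand, but a small sketch shows the bookkeeping. It counts changes with Fitch's method for unordered characters, which is one standard way to get the minimum number of changes on a fixed tree; the notes do not name a specific counting method, so treat that choice as an assumption. The matrix and both trees are hypothetical, and trees are written as nested 2-tuples of taxon names.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes) for one character.

    tree: a taxon name (leaf) or a 2-tuple of subtrees (rooted, binary).
    states: {taxon: state} for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on some state: no extra change here
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, matrix):
    """s: total changes over all characters on this tree."""
    return sum(fitch(tree, row)[1] for row in matrix.values())

def consistency_index(tree, matrix):
    """CI = m / s, with m = sum over characters of (number of states - 1)."""
    m = sum(len(set(row.values())) - 1 for row in matrix.values())
    return m / tree_length(tree, matrix)

# Hypothetical binary matrix: rows = characters, columns = taxa
matrix = {
    4:  {"A": 0, "B": 1, "C": 1, "D": 0, "E": 0},
    9:  {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0},
    13: {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},
}
tree_1 = ("A", (("B", "C"), ("D", "E")))
tree_2 = ((("A", "B"), "C"), ("D", "E"))
print(tree_length(tree_1, matrix), consistency_index(tree_1, matrix))  # 4 0.75
print(tree_length(tree_2, matrix), consistency_index(tree_2, matrix))  # 5 0.6
```

Here m = 3 for both trees because the matrix is the same, so the tree with the smaller s (tree_1) necessarily has the larger CI.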
Virus Case Study (BIO 201 Review)
- Data set: 6 taxa (A–F) with a given matrix
- Three hypotheses tested → parsimony mapping performed
- Winning tree: Hypothesis 2 with CI = 0.6
- Practical inference
- Taxon A (novel virus) → closest relative = Taxon B
- Known traits of B: hosted by fox in Mexico
- Conclusions
- Probable pre-human host: fox
- Probable geographical origin: Mexico
- Ethical/Practical angle: informs surveillance, vaccination priorities, and public-health messaging
Real-World Problem for Preactivity 7
- Scenario: Florida orange grower reports crop failure due to a fungus; similar fungi documented on four Pacific islands
- Islands & corresponding taxa: Saipan (B), Okinawa (C), Java (D), Guam (E)
- New Florida strain = Taxon A
- Goal for students
- Create DNA data matrix from provided sequences (Taxa A–E)
- Compare two phylogenetic hypotheses (Tree 1 vs. Tree 2)
- Map characters
- Compute tree length and CI for each
- Select the most parsimonious hypothesis, which is treated here as the best-supported (most accurate) one
- Infer origin of Taxon A
- Identify its closest relative on the chosen tree
- Report island of origin to the Florida Dept. of Agriculture → informs which shipments to restrict
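The final inference step, finding Taxon A's closest relative and reading off its island, can be sketched as below. The island labels come from the scenario above, but the tree is an arbitrary placeholder chosen only to exercise the code; it is not the worksheet's Tree 1 or Tree 2 and says nothing about the actual answer. The function names are hypothetical.

```python
def leaves(tree):
    """Return the set of taxon names under a node (leaf = str, internal = tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def sister_group(tree, taxon):
    """Return the taxa in the sister group of `taxon`, or None if it is absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == taxon:
            siblings = [c for j, c in enumerate(tree) if j != i]
            return set().union(*(leaves(c) for c in siblings))
    for child in tree:
        found = sister_group(child, taxon)
        if found is not None:
            return found
    return None

# Island for each known taxon, as listed in the scenario
islands = {"B": "Saipan", "C": "Okinawa", "D": "Java", "E": "Guam"}

# Placeholder tree (assumption: NOT one of the worksheet hypotheses)
chosen_tree = ((("A", "D"), "C"), ("B", "E"))

closest = sister_group(chosen_tree, "A")
print(sorted(closest))                      # ['D']
print(sorted(islands[t] for t in closest))  # ['Java']
```

If the sister group contains more than one taxon, the tree alone does not single out one island, and the report should list every island in that group.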
Step-by-Step Instructions (as per worksheet)
- Download PDF worksheet from Brightspace
- Either print or digitally annotate
- Tasks:
- Fill in DNA matrix (variable positions & binary coding)
- Map characters on both hypothesized trees
- Count changes; compute CI for each tree
- Decide which hypothesis is best (lowest tree length, which for the same matrix also gives the highest CI)
- Mark inferred origin of Taxon A
- Save annotated worksheet as PDF or JPEG and upload
- Complete accompanying Brightspace quiz (Preactivity 7 Questions)
Connections & Broader Significance
- From morphology to molecules: different data types can be analyzed within the same framework; molecular data now predominate because they are so widely available
- Genotype→phenotype link: although DNA sequence is genotype rather than phenotype, parsimony mapping and tree inference work exactly as they do for morphological traits
- Application spectrum
- Agriculture (crop disease tracking)
- Epidemiology (virus spillover events)
- Conservation genetics (source-population identification)
- Ethical & practical implications
- Correctly identifying origins prevents unnecessary trade restrictions or misdirected control efforts
- Phylogenetic misinterpretation can lead to economic loss or ineffective policy
Key Takeaways & Study Tips
- Memorize the workflow: Alignment → Variable sites → Data matrix → Tree mapping → Parsimony metrics → Inference
- Practice quickly spotting variable positions and coding them into 0/1 (or other scheme)
- Remember formulae and definitions (Tree length, CI)
- Keep real-world stakes in mind; they help cement why analytical rigor is essential