Preactivity 7: Molecular Phylogenetics & Real-World Applications

Administrative Details

  • Preactivity 7 due: Thursday, 11:15 AM (submit worksheet PDF/JPEG + Brightspace questions)
  • Upcoming in-class session: Activity 7—additional real-world problem solving with phylogenies

Learning Objectives

  • LO-1: Construct a data matrix from DNA sequence data taken from several taxa
  • LO-2: Use a well-supported phylogenetic hypothesis to answer real-world biological questions by
    • Mapping known information (e.g., host, geography) onto the tree
    • Making evidence-based inferences about an unknown or novel taxon

Background & Conceptual Framework

  • Phylogenetic accuracy matters: Practical decisions (crop protection, public health, conservation) rely on choosing the most accurate tree among competing hypotheses
  • Previous units relied on morphological traits (phenotypic characters). This week shifts focus to molecular characters (DNA, RNA, proteins)
  • Molecular data advantages
    • Most abundant type of character data: complete genomes are available for thousands of taxa
    • Rapid, economical sequencing technology
  • Genotype vs. phenotype refresher
    • DNA sequence = genotype
    • RNA & proteins = gene products and therefore part of phenotype

Using Molecular Sequences as Characters

  • Alignment prerequisite
    • Arrange homologous positions in columns so that every taxon has the same positional numbering
    • Number each position (e.g., 1–17 for a protein fragment)
  • Character vs. character state
    • Character: a position (site) in the aligned sequence
    • State: the nucleotide (DNA/RNA) or amino acid (protein) present at that position in a particular taxon
  • Informative vs. uninformative sites
    • Only variable positions help resolve relationships
    • Invariant positions do not contribute and can be excluded from the parsimony count
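Once sequences are aligned, finding the variable positions is a mechanical column-by-column check. The Python sketch below assumes the aligned sequences are stored as a dict mapping taxon name to sequence string. The function name `variable_sites` and the toy alignment are hypothetical illustrations, not worksheet data.

```python
# Sketch: list the variable (non-invariant) columns of an alignment.
# Assumes sequences are already aligned to equal length; positions are 1-based
# to match the numbering convention in the notes.

def variable_sites(alignment):
    """Return 1-based positions where at least two taxa have different states."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        states = {s[i] for s in seqs}   # distinct states in this column
        if len(states) > 1:             # more than one state -> variable site
            positions.append(i + 1)     # convert 0-based index to 1-based position
    return positions


# Hypothetical toy alignment (made up for illustration).
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACCT",
    "C": "ACCTACGT",
    "D": "ACCTACCT",
}
print(variable_sites(toy))  # [3, 7]
```

The same function works unchanged for protein alignments, because it only compares characters and never interprets them as nucleotides or amino acids.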

Protein Example (5 taxa, 17 positions)

  • Variable sites: 11 & 16
    • Position 11: 4× “M” (methionine), 1× “Q” (glutamine)
    • Position 16: 2× “N” (asparagine), 3× “H” (histidine)
  • All other 15 sites are invariant → uninformative

DNA Example (5 taxa, 15 bp)

  1. Identify variable sites: positions 4, 9, 13
  2. Highlight differences relative to an arbitrary reference (e.g., Taxon A) to ease matrix construction
  3. Build the data matrix
    • Rows = characters (3 variable positions)
    • Columns = taxa (A–E)
    • Binary coding (arbitrary but explicit choice of 0/1)
      • Position 4: A (0) vs. G (1)
      • Position 9: C (0) vs. T (1)
      • Position 13: G (0) vs. A (1)
  4. Fill in matrix with 0s & 1s for each taxon—this becomes input for parsimony analysis
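Steps 1–4 can be sketched in Python as follows. The sketch rests on two assumptions that are not from the worksheet: each variable site has exactly two states, and the reference taxon's state is coded 0 while the alternative state is coded 1. The helper `binary_matrix` and the toy sequences are hypothetical. The output keeps the layout used in these notes (rows = characters, columns = taxa).

```python
# Sketch: build a 0/1 data matrix from aligned DNA, coded relative to a reference taxon.
# Assumption: every variable site is biallelic (exactly two states).

def binary_matrix(alignment, reference):
    taxa = sorted(alignment)                  # column order: taxa A, B, C, ...
    seqs = [alignment[t] for t in taxa]
    ref_seq = alignment[reference]
    rows = {}                                 # position -> list of 0/1 codes
    for i in range(len(ref_seq)):
        column = [s[i] for s in seqs]
        states = set(column)
        if len(states) == 1:
            continue                          # invariant site: uninformative, skip
        if len(states) > 2:
            raise ValueError(f"position {i + 1} has more than two states")
        ref_state = ref_seq[i]                # reference state is coded 0
        rows[i + 1] = [0 if c == ref_state else 1 for c in column]
    return taxa, rows


# Hypothetical 6-bp alignment for five taxa (not the worksheet sequences).
toy = {
    "A": "AACGTC",
    "B": "AGCGTC",
    "C": "AGCATC",
    "D": "AACATT",
    "E": "AGCATT",
}
taxa, rows = binary_matrix(toy, "A")
print("pos", *taxa)
for pos, codes in rows.items():
    print(pos, *codes)
# pos A B C D E
# 2 0 1 1 0 1
# 4 0 0 1 1 1
# 6 0 0 0 1 1
```

Because the 0/1 assignment is arbitrary, choosing a different reference taxon flips some codes but leaves every tree length unchanged.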

Parsimony & Quantitative Metrics

  • Tree length (L): minimum number of character-state changes needed to explain the data on a given tree
  • Consistency Index (CI) quantifies homoplasy: CI = \frac{m}{s}
    • m: minimum possible number of changes (sum over characters of [number of states – 1])
    • s: observed total changes on the evaluated tree (tree length)
    • Range 0–1 (CI = 1 means no homoplasy); for the same data set, a higher CI means less homoplasy and therefore a shorter, more parsimonious tree
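In class, changes are counted by mapping characters onto the tree by hand. As a cross-check, the sketch below automates that count with Fitch's algorithm, a standard method that returns the minimum number of changes for each character on a fixed, fully bifurcating tree. This is a swapped-in technique, not necessarily the worksheet's procedure. Several assumptions apply: trees are written as nested 2-tuples of taxon names, each character is a dict mapping taxon to state, and the function names and toy data are hypothetical.

```python
# Sketch: tree length via Fitch parsimony, then CI = m / s.
# Trees are nested 2-tuples of taxon names (fully bifurcating);
# each character is a dict {taxon: state}.

def fitch(tree, states):
    """Return (candidate state set, minimum changes) for a subtree."""
    if isinstance(tree, str):              # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    shared = ls & rs
    if shared:                             # children can agree: no extra change
        return shared, lc + rc
    return ls | rs, lc + rc + 1            # children disagree: one change needed

def tree_length(tree, characters):
    """s (= L): total minimum changes over all characters on this tree."""
    return sum(fitch(tree, ch)[1] for ch in characters)

def consistency_index(tree, characters):
    m = sum(len(set(ch.values())) - 1 for ch in characters)  # (states - 1) summed
    s = tree_length(tree, characters)
    if s == 0:
        raise ValueError("no variable characters; CI is undefined")
    return m / s


# Hypothetical binary characters for taxa A-E and two hypothetical trees.
characters = [
    {"A": 0, "B": 1, "C": 1, "D": 0, "E": 1},
    {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1},
    {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},
]
tree1 = (("A", "B"), ("C", ("D", "E")))
tree2 = (("A", "D"), ("B", ("C", "E")))
for name, tree in [("Tree 1", tree1), ("Tree 2", tree2)]:
    print(name, tree_length(tree, characters), round(consistency_index(tree, characters), 2))
# Tree 1 4 0.75
# Tree 2 5 0.6
```

For these toy data, m = 3 on both trees, so the shorter tree (Tree 1) also has the higher CI. Re-rooting an unrooted tree does not change the Fitch count, so any rooting of the hypothesis can be used.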

Virus Case Study (BIO 201 Review)

  • Data set: 6 taxa (A–F) with a given data matrix
  • Three hypotheses tested → characters mapped onto each by parsimony
    • Best-supported (most parsimonious) tree: Hypothesis 2, CI = 0.6
  • Practical inference
    • Taxon A (novel virus) → closest relative = Taxon B
    • Known traits of B: host = fox; location = Mexico
    • Conclusions
      • Probable pre-human host: fox
      • Probable geographical origin: Mexico
  • Ethical/practical angle: informs surveillance, vaccination priorities, and public-health messaging
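The inference step ("closest relative → borrow its known traits") amounts to finding the sister group of the focal taxon on the chosen tree. The minimal sketch below assumes a nested 2-tuple tree. The helper names, the six-taxon topology, and the trait table are hypothetical; only B's host (fox) and location (Mexico) come from the case study. When the sister group contains several taxa, the inference is only clear-cut if they share the trait.

```python
# Sketch: find the sister group of a focal taxon and read off its known traits.
# Trees are nested 2-tuples of taxon names (fully bifurcating).

def leaves(tree):
    """All taxon names in a subtree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def sister_group(tree, focal):
    """Return the taxa in the clade sister to `focal`, or None if `focal` is absent."""
    if isinstance(tree, str):
        return None
    left, right = tree
    if left == focal:
        return leaves(right)
    if right == focal:
        return leaves(left)
    return sister_group(left, focal) or sister_group(right, focal)


tree = (("A", "B"), (("C", "D"), ("E", "F")))          # hypothetical topology
known_traits = {"B": {"host": "fox", "origin": "Mexico"}}
for relative in sister_group(tree, "A"):
    print(relative, known_traits.get(relative, "no data"))
# B {'host': 'fox', 'origin': 'Mexico'}
```

The same lookup applies to the orange-fungus problem: replace the trait table with each taxon's island and query Taxon A on whichever tree wins the parsimony comparison.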

Real-World Problem for Preactivity 7

  • Scenario: Florida orange grower reports crop failure due to a fungus; similar fungi documented on four Pacific islands
    • Islands & corresponding taxa: Saipan (B), Okinawa (C), Java (D), Guam (E)
    • New Florida strain = Taxon A
  • Goal for students
    1. Create a DNA data matrix from the provided sequences (Taxa A–E)
    2. Compare two phylogenetic hypotheses (Tree 1 vs. Tree 2)
      • Map characters onto each tree
      • Compute tree length and CI for each
      • Select the most parsimonious hypothesis (treated here as the most accurate)
    3. Infer the origin of Taxon A
      • Identify its closest relative on the chosen tree
      • Report that island of origin to the Florida Dept. of Agriculture → informs which shipments to restrict

Step-by-Step Instructions (as per worksheet)

  • Download PDF worksheet from Brightspace
  • Either print or digitally annotate
  • Tasks:
    1. Fill in DNA matrix (variable positions & binary coding)
    2. Map characters on both hypothesized trees
    3. Count changes; compute CI for each tree
    4. Decide which hypothesis is best (lowest tree length, highest CI)
    5. Mark inferred origin of Taxon A
  • Save annotated worksheet as PDF or JPEG and upload
  • Complete accompanying Brightspace quiz (Preactivity 7 Questions)
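The two criteria in task 4 (lowest tree length, highest CI) always pick the same tree when both trees are scored on the same data matrix, because m depends only on the data and not on the tree. Writing s_i = L_i for the length of tree i:

```latex
\mathrm{CI}_1 = \frac{m}{L_1}, \qquad \mathrm{CI}_2 = \frac{m}{L_2}, \qquad m > 0
\\[4pt]
L_1 < L_2 \iff \frac{m}{L_1} > \frac{m}{L_2} \iff \mathrm{CI}_1 > \mathrm{CI}_2
```

For example, with a hypothetical m = 3, tree lengths of 4 and 5 give CI values of 0.75 and 0.6, so the shorter tree also has the higher CI.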

Connections & Broader Significance

  • From morphology to molecules: reinforces that different data types can be integrated; molecular data now predominant due to availability
  • Genotype→phenotype link: while DNA is genotypic, the method of parsimony and tree inference is identical to that used for morphological traits
  • Application spectrum
    • Agriculture (crop disease tracking)
    • Epidemiology (virus spillover events)
    • Conservation genetics (source-population identification)
  • Ethical & practical implications
    • Correctly identifying origins prevents unnecessary trade restrictions or misdirected control efforts
    • Phylogenetic misinterpretation can lead to economic loss or ineffective policy

Key Takeaways & Study Tips

  • Memorize the workflow: Alignment → Variable sites → Data matrix → Tree mapping → Parsimony metrics → Inference
  • Practice quickly spotting variable positions and coding them into 0/1 (or other scheme)
  • Remember formulae and definitions (Tree length, CI)
  • Keep real-world stakes in mind; they help cement why analytical rigor is essential