Recorded Version - 8/27/2025 - Notes on Phylogeny and Genetic Methods

Phylogeny, Molecular Methods, and Key Genetic Tools

  • Overview: phylogeny = a classification system organized along known lineages that reflects patterns of relatedness and evolutionary history. It helps answer questions like who is related to whom and how lineages diverged over time.

  • Distinction from traditional Linnaean taxonomy:

    • Phylogeny aims to reflect evolutionary relationships (best guess given the data and methods).
    • Linnaean taxonomy (historical) classified organisms based on appearance and function (form and performance) without explicit intent to map evolutionary history.
    • Linnaeus’s work predates Darwin; it categorized by like-with-like, largely on morphology.
  • Relationship between taxonomy and evolution:

    • Many taxonomic groupings do align with evolutionary relationships, but groupings built without an explicit evolutionary framework do not always do so.
    • Taxonomy can provide insights into evolution, but it is not itself a guarantee of evolutionary relationships.
  • Early approaches to building phylogenies (morphology-based):

    • Before molecular genetics, phylogenies were reconstructed from skeletal anatomy and fossil data (e.g., archosaurs).
    • Example figure (from Sterling Nesbitt’s work) shows archosaurs distinguished by skeletal reconstructions. Anatomy informs phylogeny, but it is harder to interpret and less precise than genetic data.
    • Limitations: convergence, phenotypic plasticity, and misinterpretation of form can mislead classifications (e.g., grouping all flying animals together simply because they fly).
    • Takeaway: reliance on form and function alone can mislead about evolutionary relationships.
  • Modern genetic approaches to phylogeny (overview):

    • With molecular data, we can assemble phylogenies based on DNA sequences, which often provide clearer insight into relatedness than morphology alone.
    • The amount and type of data used depend on the question (broad deep-time relationships vs. close intra-species relationships).
    • Practical constraint: sequencing entire genomes for all questions is powerful but not always necessary or practical.
  • Genome scales and information content (context for choosing data):

    • Human genome (haploid) length: $3.1\times 10^{9}$ base pairs
    • Within-human genetic difference: < 1% across the genome, highlighting the richness of information in very small differences.
    • For comparison:
      • E. coli genome: $4.5\times 10^{6}$ bp
      • Drosophila genome: $1.69\times 10^{8}$ bp
      • Mouse genome: $2.5\times 10^{9}$ bp
    • Information density analogy: if each base pair is counted as one character, a human genome holds roughly as much text as 300 phone books. A 500-page phone book contains about $10^{7}$ characters, whereas a human genome has about $3\times 10^{9}$ base pairs, and $3\times 10^{9} / 10^{7} = 300$.
    • Practical implication: whole-genome sequencing is powerful but expensive; for many evolutionary questions, targeted sequencing of informative regions is more efficient.
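The phone-book comparison is simple division. A minimal Python sketch, using the approximate genome sizes quoted in these notes and assuming one base pair per printed character (the lecture's analogy, not a measure of information in bits); the names `GENOME_BP`, `phone_books`, and `relative_size` are hypothetical:

```python
# Approximate haploid genome sizes quoted in the notes, in base pairs.
GENOME_BP = {
    "E. coli": 4.5e6,
    "Drosophila": 1.69e8,
    "Mouse": 2.5e9,
    "Human": 3.1e9,
}

# Lecture analogy: a 500-page phone book holds ~1e7 characters,
# and one base pair is treated as one character.
CHARS_PER_PHONE_BOOK = 1e7


def phone_books(genome_bp: float, chars_per_book: float = CHARS_PER_PHONE_BOOK) -> float:
    """Number of phone books needed to print a genome at one character per base pair."""
    return genome_bp / chars_per_book


def relative_size(genome_bp: float, reference_bp: float) -> float:
    """How many times larger genome_bp is than reference_bp."""
    return genome_bp / reference_bp


if __name__ == "__main__":
    for name, bp in GENOME_BP.items():
        print(
            f"{name:10s} {bp:>10.3g} bp  ~{phone_books(bp):7.1f} phone books  "
            f"{relative_size(bp, GENOME_BP['E. coli']):6.0f}x E. coli"
        )
```

The same arithmetic shows why targeted sequencing is attractive: the human genome is several hundred times larger than the E. coli genome.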
  • Targeted genomic regions vs whole genomes: a practical approach

    • Organellar genomes (mitochondria and chloroplasts) are much smaller and more tractable: roughly 16,000 to 200,000 bp (about 16–200 kb).
    • Targeted regions can yield useful phylogenetic information without the cost of full genome sequencing.
    • For population or lineage questions, specific, informative regions can suffice to infer relatedness and history.
  • DNA markers and their purposes:

    • Microsatellites and single nucleotide polymorphisms (SNPs): targeted DNA regions that vary among individuals and populations.
    • SNPs on specific chromosomes or gene regions are used to distinguish lineages and recent ancestry (e.g., population structure, migration).
    • Mitochondrial DNA (mtDNA) and Y-chromosome DNA are especially useful for tracing maternal and paternal lineages, respectively.
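How the two marker types are scored can be sketched in a few lines of Python: SNPs as single-base differences between aligned sequences, and microsatellite alleles as the number of tandem copies of a short repeat motif. This is a minimal illustration assuming the sequences are already aligned; the function names and toy sequences are hypothetical, and real genotyping relies on dedicated alignment and variant-calling tools.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions where two equal-length aligned sequences differ (gap columns ignored)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a != "-" and b != "-"
    ]


def max_tandem_repeats(seq: str, motif: str) -> int:
    """Longest run of consecutive copies of `motif` in `seq` (e.g. a CA-repeat microsatellite)."""
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Extend the run while the next copy of the motif follows immediately.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences from two individuals at the same locus.
    print(snp_positions("ACGTACGT", "ACGAACGG"))     # [3, 7]
    print(max_tandem_repeats("GGCACACACATT", "CA"))  # 4
    print(max_tandem_repeats("GGCACATT", "CA"))      # 2
```

Two individuals carrying 4 and 2 CA repeats at the same locus would be recorded as different microsatellite alleles.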
  • Mitochondrial DNA and human ancestry (mitochondrial Eve concept):

    • mtDNA is inherited maternally (from mother to offspring); the paternal mtDNA does not contribute to the offspring’s mtDNA line.
    • Mitochondrial Eve refers to the most recent common matrilineal ancestor from whom all living humans’ mtDNA is descended.
    • Testing options historically included consumer services (Genographic Project, 23andMe) offering mtDNA and Y-DNA ancestry analyses (pricing example cited: around $100).
    • Practical implication: mtDNA and Y-DNA analyses can reveal population histories and migrations within humans, though they represent only portions of the genome and an incomplete picture of ancestry.
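Why some single matrilineal ancestor must exist can be illustrated with a small simulation. This is a sketch under an idealized Wright-Fisher model (constant population size, non-overlapping generations, each woman's mother drawn at random from the previous generation); the function name is hypothetical, and the model says nothing about when or where the real Mitochondrial Eve lived.

```python
import random
from typing import Optional


def generations_to_matrilineal_mrca(n_females: int, seed: Optional[int] = None) -> int:
    """Trace the maternal lines of every woman in a constant-size population backward
    until they all descend from one woman (a 'mitochondrial Eve' for this model).

    Assumes an idealized Wright-Fisher population of n_females per generation.
    """
    rng = random.Random(seed)
    lineages = set(range(n_females))  # distinct maternal ancestors still being traced
    generations = 0
    while len(lineages) > 1:
        # Each traced ancestor picks her mother; lines sharing a mother merge.
        lineages = {rng.randrange(n_females) for _ in lineages}
        generations += 1
    return generations


if __name__ == "__main__":
    trials = [generations_to_matrilineal_mrca(100, seed=s) for s in range(50)]
    print("mean generations back to a common maternal ancestor:", sum(trials) / len(trials))
```

Under these assumptions the maternal lines of a population of N women are expected to converge in roughly 2N generations. Real populations violate every assumption here, which is why actual estimates come from mtDNA sequence data rather than from a model like this.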
  • The Y chromosome and paternal lineages:

    • Y-DNA traces the male line (patrilineal heritage); since daughters do not inherit a Y chromosome, only males carry and pass this marker along.
    • Provides complementary information to mtDNA about population history and paternal ancestry.
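The complementary transmission rules for mtDNA and Y-DNA can be written as a toy pedigree model. This is an idealized sketch: the `Person` class, `make_child` function, and haplotype labels are hypothetical, and the model assumes mtDNA passes only from mother to child and the Y chromosome only from father to son.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    sex: str                           # "F" or "M"
    mt_haplotype: str                  # mtDNA marker, inherited from the mother in this model
    y_haplotype: Optional[str] = None  # Y-DNA marker, present only in males


def make_child(name: str, sex: str, mother: Person, father: Person) -> Person:
    """Uniparental transmission: mtDNA from the mother to every child,
    Y-DNA from the father to sons only."""
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")
    if mother.sex != "F" or father.sex != "M":
        raise ValueError("mother must be 'F' and father must be 'M'")
    return Person(
        name=name,
        sex=sex,
        mt_haplotype=mother.mt_haplotype,
        y_haplotype=father.y_haplotype if sex == "M" else None,
    )


if __name__ == "__main__":
    grandmother = Person("grandmother", "F", "mt-A")
    grandfather = Person("grandfather", "M", "mt-B", "Y-1")
    mother = make_child("mother", "F", grandmother, grandfather)
    father = Person("father", "M", "mt-C", "Y-2")
    son = make_child("son", "M", mother, father)
    daughter = make_child("daughter", "F", mother, father)
    print(son)       # mtDNA traces to the maternal grandmother, Y-DNA to the father
    print(daughter)  # same mtDNA as her brother, no Y-DNA
```

Note that the grandfather's Y haplotype ("Y-1") disappears from this line as soon as it passes through a daughter, which is why Y-DNA records only the unbroken father-to-son line.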
  • Broadly conserved genetic markers for deep phylogeny: the 16S rRNA gene

    • Need for markers that are shared across very distant lineages (across domains of life) requires sequences that change very slowly.
    • The 16S rRNA gene (16S rDNA) encodes the RNA component of the small ribosomal subunit in bacteria and archaea; eukaryotes carry the homologous 18S rRNA gene.
    • Why 16S is useful:
      • It evolves slowly, making it suitable for reconstructing phylogeny across broad evolutionary distances (bacteria, archaea, and, through the 18S homolog, eukaryotes in the context of the tree of life).
      • It is part of the ribosomal machinery essential to the cell, so changes are constrained; this makes it highly informative for deep evolutionary relationships.
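One way to see what "slowly evolving" means in practice is to score each column of an alignment by the fraction of sequences that share its most common base. A minimal sketch, assuming the sequences are already aligned; the function names and the two toy alignments are hypothetical stand-ins, not real 16S data:

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """For each column of a multiple sequence alignment, the fraction of sequences
    sharing the most common base (1.0 = perfectly conserved column)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        scores.append(counts.most_common(1)[0][1] / len(alignment))
    return scores


def conserved_fraction(alignment: list[str], threshold: float = 1.0) -> float:
    """Fraction of columns whose conservation score is at least `threshold`."""
    scores = column_conservation(alignment)
    return sum(s >= threshold for s in scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignments of the same four organisms at two loci.
    slow_locus = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTACGT"]
    fast_locus = ["ACGTACGT", "TCGAACCT", "AGGTTCGA", "ACCTACGG"]
    print("slow locus, fully conserved columns:", conserved_fraction(slow_locus))
    print("fast locus, fully conserved columns:", conserved_fraction(fast_locus))
```

A constrained gene like 16S is expected to look more like the first alignment: most columns are identical across very distant organisms, and the few variable columns carry the phylogenetic signal.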
  • Carl Woese and the three-domain system: a turning point in the tree of life

    • Woese, beginning in the mid-to-late 1970s, used 16S rRNA sequence comparisons to recognize archaea as a distinct group; this was later formalized (1990) as three domains: Bacteria, Archaea, and Eukarya.
    • This work revealed that Bacteria and Archaea are not as closely related as previously thought; archaea are as distinct from bacteria as either is from eukaryotes in terms of 16S similarity.
    • Result: a major shift away from the prior two-group (prokaryote vs. eukaryote) view to the three-domain view of life.
    • Initially, Woese’s findings were controversial and not immediately accepted, illustrating how paradigm-shifting discoveries can face skepticism.
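The general logic of sorting organisms by sequence similarity (compute pairwise differences, then group the most similar sequences first) can be sketched with UPGMA, a simple average-linkage clustering method. This is a swapped-in illustration, not Woese's actual procedure; the function names, the four toy sequences, and their "bac"/"arc" labels are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster sequences with UPGMA (average linkage) and return the tree topology
    as a nested-parentheses string (no branch lengths)."""
    clusters = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for other in clusters:
            # Average linkage: leaf-count-weighted mean of the merged clusters' distances.
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))] + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


if __name__ == "__main__":
    toy = {
        "bac1": "AAAAAAAAAA",
        "bac2": "AAAAAAAAAT",
        "arc1": "CCCCCAAAAA",
        "arc2": "CCCCCAAAAT",
    }
    print(upgma(toy))
```

The two toy "bac" sequences differ from each other at one site but from the "arc" sequences at five or six, so they cluster together first; real analyses use far longer alignments and model-based tree methods.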
  • The impact of 16S sequencing on microbiology

    • 16S sequencing enables the identification and cataloging of microbes in environmental samples (soil, sediment, water) without the need to culture organisms in the lab.
    • A striking outcome: the vast majority of microbes observed in environmental samples are not easily culturable with standard laboratory techniques; many are effectively invisible to traditional culture-based approaches.
    • This realization expanded our view of microbial diversity and distribution and shifted how we study microbial communities.
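Culture-independent identification ultimately comes down to comparing each sequenced read against reference sequences. A minimal sketch, assuming reads are already aligned to equal-length references; the function names, reference names, and sequences are hypothetical, and the 97% identity cutoff is a commonly used historical convention rather than a fixed rule:

```python
def percent_identity(read: str, ref: str) -> float:
    """Percent of aligned positions that match (assumes pre-aligned, equal-length strings)."""
    if len(read) != len(ref):
        raise ValueError("read and reference must be aligned to the same length")
    matches = sum(a == b for a, b in zip(read.upper(), ref.upper()))
    return 100.0 * matches / len(ref)


def classify_read(read: str, references: dict[str, str], threshold: float = 97.0) -> str:
    """Return the best-matching reference name, or 'unclassified' if no reference
    reaches `threshold` percent identity."""
    best_name, best_pid = "unclassified", -1.0
    for name, ref in references.items():
        pid = percent_identity(read, ref)
        if pid > best_pid:
            best_name, best_pid = name, pid
    return best_name if best_pid >= threshold else "unclassified"


if __name__ == "__main__":
    # Hypothetical 100-bp reference fragments and environmental reads.
    refs = {
        "ref_taxon_A": "A" * 50 + "C" * 50,
        "ref_taxon_B": "G" * 50 + "T" * 50,
    }
    read_close = "A" * 48 + "TT" + "C" * 50  # 98% identical to ref_taxon_A
    read_novel = "A" * 45 + "G" * 5 + "C" * 50  # only 95% identical to anything
    print(classify_read(read_close, refs))
    print(classify_read(read_novel, refs))
```

Reads that match no reference closely enough come back as `unclassified`; in a real survey these can flag lineages missing from reference collections, including uncultured ones.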
  • Practical and philosophical implications

    • Ethically and practically, ancestry testing (mtDNA and Y-DNA) raises questions about privacy, interpretation of heritage, and how results are communicated and used.
    • Philosophically, the shift from “organisms are classified by how they look” to “organisms are related by deep genetic history” reframes our understanding of life’s diversity and its origins.
    • Scientifically, using conserved markers (like 16S) for broad phylogenies while using variable markers (SNPs, microsatellites) for recent, fine-scale relationships provides a powerful, tiered approach to studying evolution and population history.
  • Notes on terminology and context from the lecture

    • The term "archaebacteria" is older terminology for what is now the domain Archaea; modern usage recognizes three domains: Bacteria, Archaea, and Eukarya.
    • The term noncoding DNA (often historically mislabeled as "junk" or "nonsense" DNA) describes regions that do not code for proteins but can still carry important regulatory or structural information; these regions tend to be variable and not as suitable for deep-time phylogenies as conserved markers like 16S.
    • A key principle: mutations in highly conserved genes (like 16S rRNA) accumulate slowly, enabling cross-domain comparisons, whereas mutations in more variable regions (noncoding regions, SNPs in specific loci) accumulate more rapidly and are useful for within-species and recent evolutionary questions.
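The contrast between slow and fast loci can be made quantitative with the Jukes-Cantor (1969) model, the simplest standard substitution model relating the observed proportion of differing sites to the actual number of substitutions. This is a swapped-in illustration; the function names are hypothetical, and the two substitution rates are round numbers chosen for illustration, not measured rates for 16S or any real locus.

```python
import math


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (Jukes-Cantor)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); at 0.75 the sites are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence(rate_per_site_per_year: float, years: float) -> float:
    """Substitutions per site between two lineages that split `years` ago (both branches evolve)."""
    return 2.0 * rate_per_site_per_year * years


if __name__ == "__main__":
    years = 1e9  # hypothetical divergence time: one billion years
    for label, rate in [("slow locus", 1e-10), ("fast locus", 1e-8)]:  # hypothetical rates
        d = divergence(rate, years)
        print(f"{label}: true d = {d:.2f}, observed p = {jc_expected_p(d):.3f}")
```

Under these assumed rates, the slow locus still shows a measurable, correctable difference after a billion years of separation, while the fast locus is saturated near the 0.75 ceiling, where further changes no longer add information about deep relationships.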
  • Summary takeaway

    • Phylogenies can be built from morphology or from genetic data; genetics offers a more precise window into evolutionary relationships, especially over deep time and across broad domains.
    • Molecular markers provide a toolkit for different scales of inquiry: whole-genome data and variable markers (SNPs, microsatellites) for fine-scale questions, and organellar genes and conserved markers like 16S for broader phylogenies, up to cross-domain comparisons.
    • Understanding the origin and diversity of life requires both deep-time information (conserved markers like 16S) and information on recent history (SNPs, mtDNA, Y-DNA), along with an awareness of the limitations and assumptions each method carries.
  • Quick reference to key numeric ideas from the talk

    • Human haploid genome: $3.1\times 10^{9}$ bp
    • Human-to-human difference: < 1% across the genome
    • E. coli genome: $4.5\times 10^{6}$ bp
    • Drosophila genome: $1.69\times 10^{8}$ bp
    • Mouse genome: $2.5\times 10^{9}$ bp
    • Organellar genomes: 16,000 to 200,000 bp (i.e., 16–200 kb)
    • Consumer mtDNA and Y-DNA ancestry testing (the kind of analysis behind Mitochondrial Eve lineage tracing): historically on the order of $100 per test
  • Note: The transcript ends mid-discussion about non-culturable microbiomes, illustrating that much of microbial diversity remains inaccessible by culture-based methods, which reinforced the shift toward sequence-based, culture-independent microbial ecology.