Recorded Version - 8/27/2025 - Notes on Phylogeny and Genetic Methods
Phylogeny, Molecular Methods, and Key Genetic Tools
Overview: phylogeny = a classification system organized along known lineages that reflects patterns of relatedness and evolutionary history. It helps answer questions like who is related to whom and how lineages diverged over time.
Distinction from traditional Linnaean taxonomy:
- Phylogeny aims to reflect evolutionary relationships (best guess given the data and methods).
- Linnaean taxonomy (historical) classified organisms based on appearance and function (form and performance) without explicit intent to map evolutionary history.
- Linnaeus’s work predates Darwin; it categorized by like-with-like, largely on morphology.
Relationship between taxonomy and evolution:
- Many taxonomic groupings align with evolutionary relationships, but groupings constructed without an explicit evolutionary framework do not always do so.
- Taxonomy can provide insights into evolution, but it is not itself a guarantee of evolutionary relationships.
Early approaches to building phylogenies (morphology-based):
- Before molecular genetics, phylogenies were reconstructed from skeletal anatomy and fossil data (e.g., archosaurs).
- Example figure (from Sterling Nesbitt's work) shows archosaurs distinguished by skeletal reconstructions; anatomy informs phylogeny but is harder to interpret and less precise than genetic data.
- Limitations: convergence, phenotypic plasticity, and misinterpretation of form can mislead classifications (e.g., grouping all flying animals together simply because they fly).
- Takeaway: reliance on form and function alone can mislead about evolutionary relationships.
Modern genetic approaches to phylogeny (overview):
- With molecular data, we can assemble phylogenies based on DNA sequences, which often provide clearer insight into relatedness than morphology alone.
- The amount and type of data used depend on the question (broad deep-time relationships vs. close intra-species relationships).
- Practical constraint: sequencing entire genomes for all questions is powerful but not always necessary or practical.
Genome scales and information content (context for choosing data):
- Human genome (haploid) length: approximately 3 × 10^9 bp
- Within-human genetic difference: < 1% across the genome, highlighting the richness of information in very small differences.
- For comparison:
  - E. coli genome: approximately 4.6 × 10^6 bp
  - Drosophila genome: approximately 1.4 × 10^8 bp
  - Mouse genome: approximately 2.7 × 10^9 bp
- Information density analogy: a fully sequenced human genome holds roughly as many characters as 300 phone books, counting one base pair as one character. A 500-page phone book contains ~10^7 characters, whereas a human genome has ≈ 3 × 10^9 base pairs.
- Practical implication: whole-genome sequencing is powerful but expensive; for many evolutionary questions, targeted sequencing of informative regions is more efficient.
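The phone-book analogy above is plain arithmetic, and a short Python sketch makes the scale explicit. The constants are round-number assumptions for illustration, not values taken from the lecture: a haploid human genome of about 3 × 10^9 bp and a 500-page phone book with roughly 20,000 characters per page.

```python
# Rough scale comparison: one base pair is treated as one character.
# All constants are round-number assumptions for illustration.
HUMAN_GENOME_BP = 3_000_000_000   # approx. haploid human genome length
PAGES_PER_BOOK = 500
CHARS_PER_PAGE = 20_000           # assumed density of a printed phone-book page

chars_per_book = PAGES_PER_BOOK * CHARS_PER_PAGE
books_per_genome = HUMAN_GENOME_BP / chars_per_book

# Upper bound on differing sites if two people differ at < 1% of positions.
max_differing_sites = HUMAN_GENOME_BP // 100

print(f"Characters per phone book: {chars_per_book:,}")
print(f"Phone books per genome:    {books_per_genome:.0f}")
print(f"Sites differing at 1%:     {max_differing_sites:,}")
```

Even a difference well under 1% still leaves millions of variable positions, which is why small, targeted regions can carry enough signal for many questions.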
Targeted genomic regions vs whole genomes: a practical approach
- Organellar genomes (mitochondria and chloroplasts) are much smaller and more tractable: roughly 1.6 × 10^4 to 2 × 10^5 bp (i.e., about 16–200 kb).
- Targeted regions can yield useful phylogenetic information without the cost of full genome sequencing.
- For population or lineage questions, specific, informative regions can suffice to infer relatedness and history.
DNA markers and their purposes:
- Microsatellites and single nucleotide polymorphisms (SNPs): targeted DNA regions that vary among individuals and populations.
- SNPs on specific chromosomes or gene regions are used to distinguish lineages and recent ancestry (e.g., population structure, migration).
- Mitochondrial DNA (mtDNA) and Y-chromosome DNA are especially useful for tracing maternal and paternal lineages, respectively.
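As a minimal sketch of how variable markers separate individuals, the Python below finds the variable columns in a small alignment and computes the proportion of differing sites (the p-distance) for each pair. The sequences and the names `ind_A`, `ind_B`, `ind_C` are made up for illustration; real marker data would first need to be aligned.

```python
from itertools import combinations

# Toy aligned sequences (made up for illustration; not real data).
aligned = {
    "ind_A": "ACGTACGTAC",
    "ind_B": "ACGTACGTAT",
    "ind_C": "ACGAACGTAT",
}

def variable_sites(seqs):
    """Return alignment columns where not all sequences share the same base."""
    columns = zip(*seqs.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]

def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

snp_positions = variable_sites(aligned)
distances = {
    (n1, n2): p_distance(aligned[n1], aligned[n2])
    for n1, n2 in combinations(aligned, 2)
}

print("Variable sites (0-based):", snp_positions)
for pair, dist in distances.items():
    print(pair, dist)
```

Only the variable columns carry information about who is closer to whom; the identical columns contribute nothing to the comparison.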
Mitochondrial DNA and human ancestry (mitochondrial Eve concept):
- mtDNA is inherited maternally (from mother to offspring); the paternal mtDNA does not contribute to the offspring’s mtDNA line.
- Mitochondrial Eve refers to the most recent common matrilineal ancestor from whom all living humans’ mtDNA is descended.
- Testing options historically included consumer services (Genographic Project, 23andMe) offering mtDNA and Y-DNA ancestry analyses (pricing example cited: around $100).
- Practical implication: mtDNA and Y-DNA analyses can reveal population histories and migrations within humans, though they represent only portions of the genome and an incomplete picture of ancestry.
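The maternal-inheritance rule can be sketched as a walk up the mother links of a pedigree. This is an illustration under stated assumptions: the pedigree, the names, and the helper functions `maternal_line` and `matrilineal_mrca` are hypothetical, and real analyses infer the lineage from mtDNA sequence differences rather than from a known family tree.

```python
# Hypothetical pedigree: child -> (mother, father). Names are made up.
pedigree = {
    "dana": ("carol", "bob"),
    "erin": ("carol", "bob"),
    "finn": ("gail", "hugo"),
    "carol": ("alice", "ivan"),
    "gail": ("alice", "jack"),
}

def maternal_line(person, pedigree):
    """Follow mother links upward; this is the path mtDNA takes."""
    line = [person]
    while person in pedigree:
        person = pedigree[person][0]  # index 0 is the mother
        line.append(person)
    return line

def matrilineal_mrca(people, pedigree):
    """Most recent ancestor shared by every person's maternal line, or None."""
    lines = [maternal_line(p, pedigree) for p in people]
    shared = set(lines[0]).intersection(*lines[1:])
    # Lines are ordered from recent to ancient, so the first shared
    # entry on any one line is the most recent common one.
    for ancestor in lines[0]:
        if ancestor in shared:
            return ancestor
    return None

print(maternal_line("dana", pedigree))
print(matrilineal_mrca(["dana", "erin", "finn"], pedigree))
```

Fathers never appear on a maternal line, which is why mtDNA captures only one of the many ancestral paths leading to a person.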
The Y chromosome and paternal lineages:
- Y-DNA traces the male line (patrilineal heritage); since daughters do not inherit a Y chromosome, only males carry and pass this marker along.
- Provides complementary information to mtDNA about population history and paternal ancestry.
Broadly conserved genetic markers for deep phylogeny: the 16S rRNA gene
- Comparing very distant lineages (across domains of life) requires markers that are shared by all of them and that change very slowly.
- The 16S rRNA gene (16S rDNA) encodes the RNA component of the small ribosomal subunit.
- Why 16S is useful:
  - It evolves slowly, making it suitable for reconstructing phylogeny across broad evolutionary distances (from bacteria to archaea to eukaryotes, which carry the homologous 18S rRNA, in the context of the tree of life).
  - It is part of the rRNA machinery essential to the cell, so changes are constrained; thus, it is highly informative for deep evolutionary relationships.
Carl Woese and the three-domain system: a turning point in the tree of life
- Woese (mid-1970s) used 16S sequences to classify life into three domains: Bacteria, Archaea, and Eukarya.
- This work revealed that Bacteria and Archaea are not as closely related as previously thought; archaea are as distinct from bacteria as either is from eukaryotes in terms of 16S similarity.
- Result: a major shift away from the prior two-kingdom view to the three-domain view of life.
- Initially, Woese’s findings were controversial and not immediately accepted, illustrating how paradigm-shifting discoveries can face skepticism.
The impact of 16S sequencing on microbiology
- 16S sequencing enables the identification and cataloging of microbes in environmental samples (soil, sediment, water) without the need to culture organisms in the lab.
- A striking outcome: the vast majority of microbes observed in environmental samples are not easily culturable with standard laboratory techniques; many are effectively invisible to traditional culture-based approaches.
- This realization expanded our view of microbial diversity and distribution and shifted how we study microbial communities.
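Culture-independent identification can be sketched as matching each sequenced fragment against reference sequences and flagging anything without a close match. The Python below is a toy illustration only: the reference sequences, the labels `taxon_X`, `taxon_Y`, `taxon_Z`, and the 0.9 identity threshold are all assumptions, and real workflows align reads against curated reference databases rather than comparing fixed-length strings.

```python
# Toy reference "database" of aligned 16S fragments (sequences are made up).
references = {
    "taxon_X": "ACGGTTAACCGGTA",
    "taxon_Y": "ACGGTAAACCGCTA",
    "taxon_Z": "TCGGATAACGGGTT",
}

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def classify(read, references, min_identity=0.9):
    """Return (label, best_identity); label is 'unclassified' below threshold."""
    best_label, best_score = max(
        ((name, identity(read, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )
    if best_score < min_identity:
        return "unclassified", best_score
    return best_label, best_score

# Three hypothetical environmental reads: exact match, near match, no match.
reads = ["ACGGTTAACCGGTA", "ACGGTAAACCGCTT", "GGGGTTTTCCCCAA"]
labels = [classify(r, references)[0] for r in reads]
print(labels)
```

Reads that land in the "unclassified" bin correspond to the organisms the notes describe as invisible to culture-based methods: detectable by sequence, but absent from anything previously grown and characterized.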
Practical and philosophical implications
- Ethically and practically, ancestry testing (mtDNA and Y-DNA) raises questions about privacy, interpretation of heritage, and how results are communicated and used.
- Philosophically, the shift from “organisms are classified by how they look” to “organisms are related by deep genetic history” reframes our understanding of life’s diversity and its origins.
- Scientifically, using conserved markers (like 16S) for broad phylogenies while using variable markers (SNPs, microsatellites) for recent, fine-scale relationships provides a powerful, tiered approach to studying evolution and population history.
Notes on terminology and context from the lecture
- The term "archaebacteria" is older terminology; modern usage typically refers to domains Bacteria, Archaea, and Eukarya.
- The term noncoding DNA (often historically mislabeled as "junk" or "nonsense" DNA) describes regions that do not code for proteins but can still carry important regulatory or structural information; these regions tend to be variable and not as suitable for deep-time phylogenies as conserved markers like 16S.
- A key principle: mutations in highly conserved genes (like 16S rRNA) accumulate slowly, enabling cross-domain comparisons, whereas mutations in more variable regions (noncoding regions, SNPs in specific loci) accumulate more rapidly and are useful for within-species and recent evolutionary questions.
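One way to see why fast-changing regions are poorly suited to deep time is to model how observed differences saturate. The sketch below uses the Jukes-Cantor (JC69) substitution model, which is an assumption brought in here for illustration and not something the lecture names; the two rates are hypothetical values chosen only to contrast a slow marker with a fast one.

```python
import math

def jc69_observed_difference(rate_per_site_per_myr, divergence_time_myr):
    """Expected proportion of differing sites between two lineages under JC69.

    d = 2 * rate * time is the expected number of substitutions per site
    separating the two lineages (both branches accumulate changes).
    """
    d = 2 * rate_per_site_per_myr * divergence_time_myr
    return 0.75 * (1 - math.exp(-4 * d / 3))

# Hypothetical rates (substitutions per site per million years).
SLOW_MARKER = 0.0001   # stands in for a conserved gene
FAST_MARKER = 0.01     # stands in for a variable noncoding region

for t in (1, 100, 3000):  # divergence times in million years
    slow = jc69_observed_difference(SLOW_MARKER, t)
    fast = jc69_observed_difference(FAST_MARKER, t)
    print(f"{t:>5} Myr  slow={slow:.4f}  fast={fast:.4f}")
```

Under this model the observed difference can never exceed 0.75, the level expected for two unrelated random sequences. A fast marker approaches that ceiling over long divergence times and stops distinguishing old splits from very old ones, while a slow marker is still far from saturation; over short times the roles reverse, because the slow marker has barely changed at all.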
Summary takeaway
- Phylogenies can be built from morphology or from genetic data; genetics offers a more precise window into evolutionary relationships, especially over deep time and across broad domains.
- Molecular markers provide a toolkit for different scales of inquiry: whole-genome data for fine-scale questions, and organellar genes or conserved markers like 16S for broader phylogeny, with 16S reaching across domains.
- Understanding the origin and diversity of life requires both historical (conserved markers like 16S) and contemporary (SNPs, mtDNA, Y-DNA) genetic information, along with an awareness of the limitations and assumptions each method carries.
Quick reference to key numeric ideas from the talk
- Human haploid genome: approximately 3 × 10^9 bp
- Human-to-human difference: < 1% across the genome
- E. coli genome: approximately 4.6 × 10^6 bp
- Drosophila genome: approximately 1.4 × 10^8 bp
- Mouse genome: approximately 2.7 × 10^9 bp
- Organellar genomes: roughly 1.6 × 10^4 to 2 × 10^5 bp (i.e., 16–200 kb)
- Consumer ancestry testing (mtDNA and Y-DNA analyses, as in the mitochondrial Eve discussion): historically on the order of $100
Note: The transcript ends mid-discussion about non-culturable microbiomes, illustrating that much of microbial diversity remains inaccessible by culture-based methods, which reinforced the shift toward sequence-based, culture-independent microbial ecology.