Microbial Classification and Identification

Introduction

At the end of this lecture, you should be able to:
- Define key terms used in microbial taxonomy.
- Describe the major characteristics used in bacterial taxonomy.
- Understand the methodology for studying molecular characteristics.
- Understand how classification schemes can be used to develop identification schemes.

Taxonomy: The Practice & Science of Classification

Taxonomy has three components:
1. Classification: Grouping microbes into taxa, based on their characteristics.
2. Identification: Determining to which taxon an isolate belongs by determining the distinguishing characteristics of taxonomic groups.
3. Nomenclature: Assigning names to taxonomic groups according to defined rules.
Taxonomy facilitates:
1. Organization of microbial diversity into an accessible, logical structure.
2. Arrangement into groups (taxa) with commonly understood names.
3. Identification of microorganisms (clinical, industrial, environmental).
4. Research predictions or hypotheses based on knowledge of similar organisms.

Classification of Living Things

All life is classified into 3 domains, then 23 divisions.
Viruses are considered “non-living” and do not appear on the classification diagram.
Carl Woese, Otto Kandler, and Mark Wheelis (1990) proposed the three-domain system: Archaea, Bacteria, and Eucarya.

Taxonomy: Classification - Methods/Systems

Carl Woese used sequence analysis (16S rRNA) to compare sequences and construct a phylogenetic tree, showing the relatedness of all living organisms and proposing archaea as a third domain.
This was a departure from the “kingdoms” previously used to classify life (eubacteria, archaea, animals, plants, fungi, protists, chromista), which were based on phenotype.
The use of molecular techniques and genetic analyses has revolutionized taxonomy, classification, and our understanding of the evolution of life.
Especially in microbiology, this involves the analysis of genes (DNA sequencing) and gene products (RNA, protein).

Taxonomy: An Organizational Framework

First component of taxonomy: Classification: grouping microbes into taxa, based on their characteristics.
Uses a combination of:
- Classical characteristics (includes phenetic/phenotypic).
- Molecular characteristics (biochemical, e.g., proteins; genetic, e.g., DNA, RNA sequences).
- Phylogenetic analyses: Analyzes relationships between isolates using all available information.

Classification Methods & Systems

A) Classical Characteristics

Phenetic (phenotypic) classification.
Uses:
1. Morphological characteristics.
2. Physiological characteristics.
3. Ecological characteristics.
Organisms are grouped by mutual similarities (i.e., they look the same).
The first system used.
Still the primary basis for plant and animal classification.
Requires consideration of multiple characteristics.
Does not necessarily reflect evolutionary relatedness.

Classification: Phenotype

Morphology:
- Easy to study (e.g., shape, size, color, staining, motility, spores).
- Somewhat useful for prokaryotes; more useful for plants, animals, and fungi.
Physiology, biochemistry, & metabolism:
- Type of metabolism, $O_2$ requirements, optimum growth temperature/pH, enzymes produced, ability to use diverse carbon compounds as sole sources of carbon and energy.
- Useful: directly related to the activity of many genes.
Ecological characteristics:
- Life-cycle, diseases caused, habitat occupied.
- Useful for some organisms, not for others.
(Assessment of genetic exchange):
- Important in many eukaryotes (ability to interbreed = species).
- Not useful for prokaryotes because they lack sexual reproduction.

Classification: Methods/Systems

C) Phylogenetic Analyses:

Organisms are grouped based on characteristics that reflect evolutionary relationships.
This approach is known as cladistics (from the Greek klados, ‘branch’).
Organisms are arranged into an evolutionary or phylogenetic tree.
The tree can be based on:
- Fossil records.
- Anatomy and morphology.
- Physiology and biochemistry.
- Molecular characteristics (biochemical and genetic).

Classification: Molecular Characteristics

1. Biochemical Characteristics

1. Proteins

Analyzed by mass spectrometry (MALDI-TOF).
Determines the profile of highly abundant proteins within a bacterial isolate.
The profile is compared to a database to determine identity.

2. Fatty acids

Analyzed by fatty acid methyl ester (FAME) analysis.
The fatty acid profile of an isolate is analyzed and compared to a database.

Classification: Molecular Characteristics

2. Genetic (Sequence) Characteristics

Each gene is defined by the sequence of base pairs (A-T, G-C).
- Average gene size for prokaryotes: ~950 base pairs.
- Average gene size for eukaryotes: ~1350 base pairs.
The base sequence of specific genes differs between species and, to a lesser extent, within species; gene variants are termed alleles.
The larger the difference in the sequences between two organisms, the more likely they belong to different taxonomic groups.
Sequencing is used to:
- Identify bacteria.
- Create phylogenetic trees.
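The idea that larger sequence differences suggest different taxonomic groups can be sketched with a simple percent-identity calculation. This is a minimal illustration only: it assumes the sequences are already aligned to the same length, and the three short sequences and the name percent_identity are made up for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of compared positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments of the same gene from three isolates.
isolate_1 = "ACGTTGCAAGCT"
isolate_2 = "ACGTTGCTAGCT"  # one difference from isolate_1
isolate_3 = "ACCTAGCTAGGT"  # four differences from isolate_1

print(round(percent_identity(isolate_1, isolate_2), 1))  # 91.7
print(round(percent_identity(isolate_1, isolate_3), 1))  # 66.7
```

Here isolate_2 would be judged closer to isolate_1 than isolate_3 is. Real comparisons use full-length genes and a proper alignment step first.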

Classification: Molecular Characteristics

2. Genetic (Sequence) Characteristics

What should we sequence?
- One or more individual genes - which ones?
- Whole genome?

A) 16S ribosomal RNA (rRNA) gene sequencing

Also known as small subunit (SSU) rRNA.
It is a highly conserved gene.

B) Whole genome sequencing

Used to be prohibitively expensive.
Now affordable enough for routine use.

Highly Conserved Genes (HCG)

Highly conserved genes:
- Are found in all organisms.
- Have the same function in all organisms.
- Have a critical (essential) role (e.g., ATP synthesis genes, the 16S rRNA gene).
The base sequence is very similar even across different organisms; major mutations are absent because:
- Genes/organisms cannot tolerate large mutations.
- These would be fatal for the cell.
Differences:
- Reflect evolutionary divergence.
- Can be used for classification and identification.
- Can be used for assessing relatedness.

Nucleic Acid Sequencing

(i) Classification and Identification

A consensus sequence gives the nucleotide most commonly present at each position of a nucleic acid, determined by comparing the sequences of many organisms.
Deviations from the consensus are characteristic of particular groups and are called signature sequences.
Nucleic acid sequence analysis is being increasingly used in species identification.
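A consensus sequence, and the positions where one sequence deviates from it, can be worked out as in the sketch below. It assumes the sequences are already aligned and of equal length. The function names and the five short sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most common nucleotide at each column of a set of aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def deviations(seq, cons):
    """List (position, consensus base, observed base) where seq differs; positions are 1-based."""
    return [(i + 1, c, s) for i, (s, c) in enumerate(zip(seq, cons)) if s != c]


# Hypothetical aligned sequences of the same region from five organisms.
aligned = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGTAC", "ACGTAC"]

cons = consensus(aligned)
print(cons)                          # ACGTAC
print(deviations("ACGAAC", cons))    # [(4, 'T', 'A')]
```

A deviation that turns up consistently in one group of organisms, and not in others, is the kind of feature described above as a signature sequence.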

Nucleic Acid Sequencing

(ii) Assessing Relatedness

Analysis of sequences enables the construction of phylogenetic trees.
Trees have nodes and branches.
- Node: A divergence event.
- Branch length: Represents the number of changes between two nodes.
Used to build a picture of relationships between organisms (phylogenetics).
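One simple way to turn sequence differences into a tree is average-linkage clustering (UPGMA), sketched below. This is an illustration, not necessarily the method used for the trees in this lecture. It assumes pre-aligned sequences, returns only the branching order (no branch lengths), and the four sequences and their names are made up.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions that differ between two sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a nested-tuple tree by repeatedly joining the two closest clusters."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical aligned sequences from four organisms.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGGAT",
    "D": "TCGAACGGTT",
}

print(upgma(seqs))  # (('A', 'B'), ('C', 'D'))
```

Each pair of parentheses corresponds to a node: A and B share one node, C and D another, and the outermost pair is the deepest divergence.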

Whole Genome Sequencing

Taxonomy is shifting to WGS for identification and classification.
16S rRNA gene sequencing is useful for identification to the genus level.
WGS may be required for the species level.
- Other techniques (e.g., MLST, SNP) are required for sub-species/strain analysis.
The whole-genome sequence of one organism is compared to that of a closely related organism:
- Look for homology (measured by ANI - average nucleotide identity).
- Two genomes of the same species typically share at least 95-96% ANI.
WGS data can also be used to quickly calculate G+C content.
ANI is replacing DNA-DNA hybridization.
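The ANI idea can be illustrated with a toy calculation: average the percent identity over matched genome fragments, then compare the result with the species threshold. This is a heavily simplified sketch. Real ANI tools first find and align the matching fragments, which is skipped here. The fragment pairs, function names, and the choice of 95% as the cut-off are assumptions for the example.

```python
def fragment_identity(a, b):
    """Percent identity between two pre-aligned fragments of equal length."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def toy_ani(aligned_fragment_pairs):
    """Mean percent identity across all matched fragment pairs."""
    identities = [fragment_identity(a, b) for a, b in aligned_fragment_pairs]
    return sum(identities) / len(identities)


def same_species(ani, threshold=95.0):
    """Apply the species cut-off (assumed here to be the lower end of 95-96%)."""
    return ani >= threshold


# Hypothetical matched fragments from two genomes.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("GGCATTACGA", "GGCATTACGT"),  # one difference
    ("TTGACCGTAA", "TTGACCGTAA"),  # identical
]

ani = toy_ani(pairs)
print(round(ani, 1), same_species(ani))  # 96.7 True
```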

Nucleic Acid Base Composition (%G+C)

Was the first, simplest nucleic acid technique.
$\%(G+C) = \frac{G+C}{G+C+A+T} \times 100$
Historically determined indirectly from DNA melting temperature (Tm), the temperature at which 50% of the double-stranded DNA becomes single-stranded DNA.
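When the sequence itself is available, the formula above can be applied directly by counting bases. A minimal sketch follows; the function name and the ten-base example sequence are made up.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C, computed as (G + C) / (G + C + A + T) * 100."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


print(gc_percent("ATGCGCGCTA"))  # 60.0
```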

G+C Content

G+C content varies widely between organisms (about 25–80%), but within a species it varies little and is stable.
Two unrelated bacteria can have similar G+C content but a different sequence; i.e., G+C is not a reflection of base sequence.
So how is % G+C data interpreted?
- If % G+C of two organisms differs by >10%: genomes are quite different; organisms are not closely related.
- BUT organisms with similar G+C may not be related (different base sequences).
- Similar G+C AND similar phenotype = likely to be related.
%G+C is easily calculated from WGS data.
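The interpretation rules above can be written as a small decision function. This is a sketch of the reasoning only. It reads "differs by >10%" as a difference of more than 10 percentage points, which is an assumption, and the function name and return labels are made up.

```python
def interpret_gc(gc_a, gc_b, similar_phenotype, max_difference=10.0):
    """Apply the %G+C rules of thumb to two organisms."""
    if abs(gc_a - gc_b) > max_difference:
        return "not closely related"
    if similar_phenotype:
        return "likely related"
    # Similar G+C alone proves nothing: the base sequences may still differ.
    return "inconclusive"


print(interpret_gc(35.0, 62.0, similar_phenotype=True))   # not closely related
print(interpret_gc(50.5, 51.0, similar_phenotype=True))   # likely related
print(interpret_gc(50.5, 51.0, similar_phenotype=False))  # inconclusive
```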

DNA-DNA Hybridization

Measures the degree of reassociation between single-stranded DNA from two organisms.
Tells you more about relatedness than G+C.
1. Heat to separate strands of both organisms.
2. Combine single-stranded DNA of both organisms.
3. Cool to renature double-stranded DNA; non-complementary bases will remain unpaired.
4. Determine the degree of hybridization (determine the melting temperature of the hybridized strands; a higher degree of hybridization requires a higher temperature to separate the strands).
DDH values can be calculated from WGS data.

Identification of Subspecies and Strains

Achieved by analyzing several genes.
Choose genes that evolve more quickly than 16S rRNA genes.
Examples:
- Multilocus sequence typing (MLST):
  - Compares sequences of several (at least 5) conserved housekeeping genes.
  - Many variations (alleles) of genes exist.
  - Two isolates with the exact same alleles for multiple genes indicate a very close relationship (or the same strain!).
- Single nucleotide polymorphisms (SNP):
  - Targets specific, conserved regions.
  - Specific genes, intergenic regions, or non-coding regions.
  - Single base pair differences show evolutionary change.
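The MLST comparison described above amounts to comparing allele numbers locus by locus. The sketch below shows that step only. The locus names, allele numbers, and function name are hypothetical and not taken from any real MLST scheme.

```python
def compare_profiles(profile_a, profile_b):
    """Return (number of loci with the same allele, number of loci compared)."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must cover the same loci")
    shared = sum(profile_a[locus] == profile_b[locus] for locus in profile_a)
    return shared, len(profile_a)


# Hypothetical allele profiles: locus name -> allele number.
isolate_x = {"locus_1": 3, "locus_2": 1, "locus_3": 7, "locus_4": 2, "locus_5": 4}
isolate_y = dict(isolate_x)                             # same allele at every locus
isolate_z = {**isolate_x, "locus_2": 5, "locus_5": 9}   # differs at two loci

print(compare_profiles(isolate_x, isolate_y))  # (5, 5)
print(compare_profiles(isolate_x, isolate_z))  # (3, 5)
```

Isolates x and y match at all five loci, suggesting a very close relationship or the same strain; isolate z is less closely related to x.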

Taxonomic Resolution of Molecular Techniques

Resolution ranges from strain to domain: genome sequencing has the highest resolution, and 16S rDNA sequencing has lower resolution.

Assessing Relatedness

New technologies can lead to better understanding of relationships between organisms and taxonomic changes (e.g., Pseudomonas).

3. Identification

Important for:
1. Understanding and cataloging microbial diversity in nature.
2. Giving a name (may be an existing name or a new name).
3. Disease: identifying the organism causing an infectious disease.
4. Industry: where particular species are used in product manufacture (e.g., cheeses, beers and wines, pharmaceuticals including antibiotics).
Involves determining which taxon an isolate belongs to, using a polyphasic approach:
- A. Phenetic criteria: must know distinguishing characteristics of taxonomic groups; can develop keys comprising differentiating phenotypic characters AND/OR
- B. Molecular characteristics: compare sequence of an unknown to sequences of known organisms held in international databases, e.g., GenBank.
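Comparing an unknown sequence against known sequences can be sketched as a best-match lookup. This toy version uses a small in-memory dictionary in place of a real database such as GenBank, and assumes all sequences are pre-aligned and of equal length. The taxon names, sequences, and function names are made up.

```python
def identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(query, references):
    """Return (name, percent identity) of the reference most similar to the query."""
    name = max(references, key=lambda n: identity(query, references[n]))
    return name, identity(query, references[name])


# Hypothetical reference sequences standing in for database entries.
references = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTTCGTACGAACGTACCT",
    "taxon_C": "TTGTACGAACGTTCGTACGA",
}
unknown = "ACGTACGTACGAACGTACGT"

print(best_match(unknown, references))  # ('taxon_A', 95.0)
```

In practice the search is run with dedicated tools against the full database, and the best hit still has to be judged against the identity expected for the taxonomic level in question.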

Phenotypic Tests

See earlier slides: morphology (e.g., Gram stain), physiology, biochemistry, and metabolism.
Many commercial tests and kits use key characteristics that distinguish between genera or species (e.g., biochemical tests such as API (Analytical Profile Index) strips).
Some of these tests:
- Are easy, quick, and cheap.
- Do not require specialized equipment.

Use of a Dichotomous Key

Dichotomous key example: separating genera of the Enterobacteriaceae family using phenotypic characteristics.
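A dichotomous key is a series of yes/no questions, so it maps naturally onto a small decision tree. The sketch below is heavily simplified and for illustration only. It is not the key from the slide, it uses just three tests, and real genera show exceptions to these typical reactions. The test names and the identify function are assumptions.

```python
# Each branch point is (test, outcome if positive, outcome if negative).
# An outcome is either another branch point or a genus name.
KEY = (
    "ferments lactose",
    ("produces indole", "Escherichia", "Klebsiella"),
    ("produces H2S", "Salmonella", "Shigella"),
)


def identify(key, results):
    """Follow the key using a dict of test name -> True/False until a genus is reached."""
    while not isinstance(key, str):
        test, if_positive, if_negative = key
        key = if_positive if results[test] else if_negative
    return key


print(identify(KEY, {"ferments lactose": True, "produces indole": True}))   # Escherichia
print(identify(KEY, {"ferments lactose": False, "produces H2S": True}))     # Salmonella
```

Only the tests on the path actually followed are needed, which is why each call supplies just two results.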

Nomenclature

Names assigned to taxonomic groups according to defined rules.
- International Code of Nomenclature of Bacteria (ICNB); equivalent systems exist for botanical nomenclature, which includes fungi (ICBN), and for viruses (International Committee on Taxonomy of Viruses, ICTV).
Pioneered by Swedish botanist Carolus Linnaeus (1707-1778), who introduced a system of binomial nomenclature in which each species is assigned a Latin scientific name.
- Genus name + species name
  - e.g., Homo sapiens (humans)
  - e.g., Escherichia coli (E. coli)
The traditional “polyphasic” approach uses all available chemotaxonomic, phenotypic, and genotypic data; it is difficult to apply to organisms identified by genetic sequence alone.

Taxonomic Ranks

Microbes are grouped into categories (or ranks) containing similar organisms.
Groups at each level share common properties with the group they belong to in higher ranks.
Rank names and hierarchy are common to both phylogenetic and phenetic classification schemes.
Strain: a genetic variant or subtype of a bacterial species (e.g., some strains of Staphylococcus aureus have antibiotic resistance genes or virulence factors, whereas others do not).
Important ranks in microbial taxonomy: Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species, Subspecies, Strain.

Taxonomy of Uncultured Microbes

Unique genetic sequences generated by metagenomics.
Terms for these sequences:
- ASV (amplicon sequence variant).
- OTU (operational taxonomic unit): sequences grouped by similarity (typically at a 97% identity threshold).
Phylotype: an organism identified solely by nucleic acid sequence (lacks sufficient data to confirm a species name but is definitely a real organism).
Candidatus: a “candidate species” (e.g., Candidatus Carsonella ruddii; the word “Candidatus” is italicized but the name that follows it is not); usually given where the candidate cannot be cultivated as a pure culture.
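Grouping sequences into OTUs at a 97% threshold can be sketched with a simple greedy approach: each sequence joins the first OTU whose founding sequence it matches at 97% or better, otherwise it starts a new OTU. This illustrates the idea only and is not the algorithm of any particular tool. It assumes pre-aligned sequences of equal length, and the sequences and function names are made up.

```python
def identity(a, b):
    """Fraction of aligned positions that match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, threshold=0.97):
    """Greedy clustering: the first member of each OTU acts as its reference sequence."""
    otus = []
    for seq in seqs:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])  # no existing OTU is similar enough
    return otus


# Hypothetical 40-base reads.
base = "ACGT" * 10
one_change = base[:5] + "A" + base[6:]   # 39/40 identical = 97.5%
two_changes = "TT" + base[2:]            # 38/40 identical = 95.0%

otus = cluster_otus([base, one_change, two_changes])
print(len(otus))  # 2
```

The read with one difference falls into the same OTU as the base read, while the read with two differences founds a second OTU. An ASV approach would instead keep all three distinct sequences separate.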

Resources

Prescott’s Microbiology (12th edition).
- Chapter 1: The evolution of microorganisms and microbiology.
- Chapter 17: Microbial DNA technologies.
- Chapter 26: Exploring microbes in ecosystems.
Pallen, MJ. Bacterial nomenclature in the era of genomics. doi: 10.1016/j.nmni.2021.100942
How to name a prokaryote?: Etymological considerations, proposals and practical advice in prokaryote nomenclature. https://academic.oup.com/femsre/article/23/2/231/524593
International Code of Nomenclature of Bacteria (Bacteriological Code) (1990) Revision (Lapage, S.P., Sneath, P.H.A., Lessel, E.F., Skerman, V.D.B.; Seeliger, H.P.R., Clarke, W.A., Eds.). American Society for Microbiology, Washington, DC. https://www.ncbi.nlm.nih.gov/books/NBK8817/