16S Ribosomal RNA Amplicon Sequencing

Overview of the 16S Ribosomal RNA (rRNA) Gene

  • Carl Woese and his colleagues were the first to describe bacterial ribosomal genes as "molecular clocks."

  • These genes are considered molecular clocks because of several uncommon features:
    - Universality across the bacterial domain.
    - Functional activity and essential cellular functions.
    - Extremely conserved structure and nucleotide sequence.

  • There are three types of ribosomal RNA in prokaryotic ribosomes, classified by their sedimentation rates:
    - 23S: Sequenced length of approximately 3300 nucleotides.
    - 16S: Sequenced length of approximately 1550 nucleotides.
    - 5S: Sequenced length of approximately 120 nucleotides.

  • The 16S gene is the standard for bacterial taxonomic classification because it is rapidly and easily sequenced while providing sufficient phylogenetic information.

  • Structure of the 16S rRNA gene:
    - It consists of 8 highly conserved regions.
    - It contains 9 hypervariable regions across the bacterial domain.
    - Conservation levels vary: more conserved regions correlate to higher-level taxonomy, while less conserved (variable) regions correlate to lower levels such as genus and species.

  • Taxonomic Identification:
    - Sequence similarity in the 16S rRNA gene is the gold standard for species-level identification.
    - A sequence divergence range of 0.5% to 1% is typically used to delineate the species taxonomic rank.
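
The two ideas above (conserved versus variable positions, and species delineation by sequence divergence) can be illustrated with a small calculation. This is a minimal sketch, assuming the sequences are already aligned to equal length; the sequences are invented toy data, not real 16S genes, and the 1% cutoff is simply the upper end of the range quoted in these notes.

```python
import math
from collections import Counter

def column_entropy(aligned_seqs):
    """Shannon entropy (bits) of each alignment column; 0 means fully conserved."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies

def percent_divergence(seq_a, seq_b):
    """Percent of mismatched positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip columns where either sequence has an alignment gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(1 for a, b in pairs if a != b)
    return 100.0 * mismatches / len(pairs)

# Toy alignment (invented): only column 4 differs between sequences.
toy_alignment = ["ACGTACGA", "ACGTTCGA", "ACGTGCGA", "ACGTCCGA"]
entropy = column_entropy(toy_alignment)
variable_columns = [i for i, h in enumerate(entropy) if h > 0]

# Toy 200-nt sequences (invented) with 1 and 3 substitutions.
reference = "ACGT" * 50
close_query = "T" + reference[1:]
far_query = "TTT" + reference[3:]
MAX_SPECIES_DIVERGENCE = 1.0  # percent; upper bound quoted in the notes

close_div = percent_divergence(reference, close_query)
far_div = percent_divergence(reference, far_query)
print(variable_columns, close_div, far_div)
```

Here the single variable column stands in for a hypervariable position, and the two queries fall on either side of the 1% cutoff (0.5% and 1.5% divergence). Real comparisons would first require an alignment step, which this sketch leaves out.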

Advantages of 16S rRNA Sequencing

  • The 16S rRNA gene is universally distributed among all bacteria.

  • Far more 16S rRNA sequences are available than sequences of any other bacterial gene, which makes comparison and analysis easier.

  • It provides a reliable metric for measuring phylogenetic relationships across different taxa.

  • Horizontal gene transfer (HGT) is not considered a significant problem for this gene, ensuring the phylogenetic signal remains linked to the organism's lineage.

  • The costs associated with performing 16S gene amplification and sequencing are currently very affordable.

Disadvantages and Limitations of 16S rRNA Sequencing

  • Copy numbers per genome can vary; while usually taxon-specific, variation among different strains of the same species is possible.

  • Polymerase Chain Reaction (PCR) amplification biases can occur during library preparation.

  • Gene diversity within a sample tends to over-inflate overall diversity estimates.

  • Resolution is often too low to differentiate between very closely related species.

  • Evolution of the field: As sequencing costs continue to drop, microbiome research is shifting away from 16S sequencing toward more comprehensive functional representations via whole-genome or shotgun metagenomics sequencing.
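
The copy-number limitation listed above can be made concrete with a small correction step: dividing each taxon's read count by its 16S copy number before computing relative abundance. This is a minimal sketch; the taxon names, read counts, and copy numbers are hypothetical values chosen for illustration.

```python
def normalize_by_copy_number(read_counts, copy_numbers):
    """Estimate relative cell abundance from 16S read counts.

    Each count is divided by the taxon's 16S copy number per genome,
    then the corrected values are rescaled to sum to 1.
    """
    corrected = {taxon: read_counts[taxon] / copy_numbers[taxon]
                 for taxon in read_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}

# Hypothetical sample: taxon_A dominates the reads only because it
# carries seven copies of the gene per genome.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}

abundance = normalize_by_copy_number(read_counts, copy_numbers)
print(abundance)
```

In this toy case taxon_A accounts for 70% of the reads but only 25% of the estimated cells. Because copy numbers can differ between strains of the same species, any such correction is itself an approximation.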

Workflow for 16S rRNA Sequencing

  • A complete workflow typically includes four main stages:
    - DNA isolation.
    - Library preparation.
    - Sequencing.
    - Data analysis.

  • Following DNA isolation, the DNA is selectively amplified using PCR with primers specifically targeting the 16S rRNA gene.
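
Selective amplification can be mimicked in software by searching a template for the forward primer and for the reverse complement of the reverse primer. This is a minimal sketch of such an "in-silico PCR", assuming exact matching with IUPAC degenerate bases; the function names, the short primers, and the template are invented placeholders, not real 16S primer sequences.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    return re.compile("".join(IUPAC[base] for base in primer))

def in_silico_amplicon(template, forward, reverse):
    """Return the template region spanned by both primers, or None."""
    fwd = primer_regex(forward).search(template)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the template strand.
    rev = primer_regex(reverse_complement(reverse)).search(template, fwd.end())
    if rev is None:
        return None
    return template[fwd.start():rev.end()]

# Toy primers and template (hypothetical).
forward_primer = "ACGM"   # M matches A or C
reverse_primer = "TTGC"
template = "GG" + "ACGA" + "TTTTTT" + reverse_complement(reverse_primer) + "CC"

amplicon = in_silico_amplicon(template, forward_primer, reverse_primer)
print(amplicon)
```

Real primers are longer and tolerate some mismatches, so a practical tool would use approximate matching rather than the exact search shown here.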

Sequencing Platforms and Primer Selection

  • Next-Generation Sequencing (NGS) constraints:
    - Common NGS platforms usually cover 100 to 600 base pairs (bp) per single read.
    - Because the full-length 16S rRNA gene is approximately 1500 bp, primers are often chosen to target only a portion of the gene.

  • Full-Length Sequencing:
    - The full-length gene is usually amplified using the primer pair 27F and 1492R.
    - The full-length amplicon is then sequenced by either Sanger DNA sequencing or Pacific Biosciences (PacBio) SMRT sequencing.

  • High-Throughput Sequencing:
    - Various high-throughput platforms sequence different lengths of DNA, requiring a suitable pair of PCR primers for each specific system.
    - Region V1 to V3: Identified as the most useful for distinguishing species within the clinically important and ubiquitous skin bacterial genus Staphylococcus. Consequently, this region is standard for skin microbiome studies.
    - Illumina MiSeq: When using this platform, the V3 and V4 regions are commonly amplified using limited-cycle PCR.
    - Other technologies used include 454 pyrosequencing (targeting specific regions or linked to barcodes) and large-scale clonal Sanger sequencing.
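
The read-length constraint described above comes down to simple arithmetic: a region can only be reconstructed if the reads span it. This is a minimal sketch, assuming paired-end sequencing in which the forward and reverse reads must overlap to be merged. The 300 bp read length, the roughly 460 bp V3 to V4 amplicon, and the 20 bp minimum overlap are illustrative assumptions, not values taken from these notes.

```python
def paired_end_overlap(amplicon_len, read_len):
    """Bases covered by both reads of a pair; negative means an uncovered gap."""
    return 2 * read_len - amplicon_len

def can_merge(amplicon_len, read_len, min_overlap=20):
    """True if the two reads overlap enough to be joined into one sequence."""
    return paired_end_overlap(amplicon_len, read_len) >= min_overlap

READ_LEN = 300          # assumed paired-end read length (bp)
V3_V4_AMPLICON = 460    # assumed amplicon length for the V3 to V4 region (bp)
FULL_LENGTH_GENE = 1500 # approximate full gene length from the notes (bp)

print(paired_end_overlap(V3_V4_AMPLICON, READ_LEN))    # 140
print(can_merge(V3_V4_AMPLICON, READ_LEN))             # True
print(paired_end_overlap(FULL_LENGTH_GENE, READ_LEN))  # -900
print(can_merge(FULL_LENGTH_GENE, READ_LEN))           # False
```

Under these assumptions a partial region fits comfortably within a read pair, while the full-length gene leaves a 900 bp gap, which is why full-length work relies on Sanger or long-read sequencing.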

Library Preparation and Data Analysis

  • Library Preparation Steps:
    - PCR products are purified, quantified, and pooled.
    - Illumina sequencing adapters and dual-index barcodes are added to the amplicon targets.
    - Using the full complement of Nextera XT indices, up to 96 libraries can be pooled together for a single sequencing run.

  • Data Analysis Steps:
    - Raw sequences are filtered and trimmed to maintain high quality.
    - High-quality sequences are clustered into Operational Taxonomic Units (OTUs).
    - OTU clustering is commonly based on a 97% identity threshold of the reads.
    - Determining OTUs allows for subsequent species annotation, OTU phylogeny, diversity analysis, and other downstream comparative studies.
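
The data analysis steps above can be sketched end to end on toy data: filter reads by quality, cluster them at 97% identity, and compute a diversity index from the cluster sizes. This is a simplified illustration rather than the algorithm of any particular tool. It assumes equal-length reads compared position by position, a mean Phred score cutoff of 20, and greedy centroid clustering; all reads and quality values are invented.

```python
import math

def passes_quality(quals, min_mean=20):
    """Keep a read if its mean Phred score reaches the (assumed) cutoff."""
    return sum(quals) / len(quals) >= min_mean

def identity(a, b):
    """Fraction of matching positions between two equal-length reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first centroid it matches, else start a new OTU."""
    otus = []  # list of (centroid, members)
    for read in reads:
        for centroid, members in otus:
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            otus.append((read, [read]))
    return otus

def shannon_index(sizes):
    """Shannon diversity (natural log) from OTU sizes."""
    total = sum(sizes)
    return -sum((n / total) * math.log(n / total) for n in sizes)

# Toy 100-nt reads (invented): 'near' differs from 'base' at 2 positions,
# 'far' at 5 positions.
base = "ACGT" * 25
near = "TG" + base[2:]
far = "TGCAT" + base[5:]

records = [
    (base, [30] * 100),
    (near, [30] * 100),
    (far, [30] * 100),
    (far, [32] * 100),
    (base, [10] * 100),  # low quality, removed by the filter
]

reads = [seq for seq, quals in records if passes_quality(quals)]
otus = greedy_otus(reads)
sizes = [len(members) for _, members in otus]
diversity = shannon_index(sizes)
print(sizes, diversity)
```

The read at 98% identity joins the first OTU, while the reads at 95% identity form a second one, giving two OTUs of two reads each. Production pipelines align reads before comparing them and usually process them in order of abundance, both of which are omitted here.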