Bacterial Genomics and Horizontal Gene Transfer
Bacterial Genomes
Bacterial genomes consist of one or more chromosomes and, in many species, plasmids, each contributing to the genetic makeup of the cell. Chromosomes are typically circular, though linear forms exist, and range in size from approximately 133 kilobases (kb) to 12 megabases (Mb). Plasmids, which can also be circular or linear, generally range from 1 kb to 2.5 Mb. A principal distinction is that plasmids do not usually carry genes essential for survival, but this criterion is not absolute: some plasmids carry genes that become vital under specific conditions.
Chromosomes are essential because they carry the core genes of the cell, including the ribosomal RNA (rRNA) genes required for protein synthesis, and they are replicated by well-defined chromosomal mechanisms. The classification becomes ambiguous for megaplasmids, which exceed 100 kb and approach the size of small chromosomes. Some of these elements, called "chromids," replicate and are maintained like plasmids but carry genes crucial for the bacterium's survival.
Genome Sizes
Genome sizes differ significantly across the domains of life, varying by seven to eight orders of magnitude. Prokaryotes show more uniformity in genome size compared to eukaryotes. However, considerable variation exists within groups such as animals, plants, protists, fungi, and diverse bacterial phyla.
The number of proteins that a genome encodes generally correlates with its size. This relationship is more consistent in prokaryotes than in eukaryotes. The coding capacity reflects the functional complexity and adaptability of the organism.
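Because prokaryotic genomes are densely packed with genes, protein count can be estimated roughly from genome size alone. The sketch below assumes two round figures that are not taken from these notes: about 88% of the genome is protein-coding, and an average gene is about 1,000 bp long. The function name and defaults are illustrative only.

```python
# Rule-of-thumb values for prokaryotes (assumptions, not measured for any specific genome):
# roughly 88% of the genome is protein-coding and an average gene is ~1,000 bp.
CODING_FRACTION = 0.88
MEAN_GENE_LENGTH_BP = 1_000


def estimate_protein_count(genome_size_bp: int,
                           coding_fraction: float = CODING_FRACTION,
                           mean_gene_length_bp: int = MEAN_GENE_LENGTH_BP) -> int:
    """Estimate the number of protein-coding genes from genome size alone."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    # Coding bases divided by the average length of one gene.
    return round(genome_size_bp * coding_fraction / mean_gene_length_bp)


if __name__ == "__main__":
    for size_bp in (500_000, 2_000_000, 4_600_000, 12_000_000):
        print(f"{size_bp:>12,} bp -> ~{estimate_protein_count(size_bp):,} proteins")
```

For a 4.6 Mb genome such as that of Escherichia coli K-12, the estimate (about 4,000) is in the same range as the roughly 4,300 annotated proteins; the linear rule works because gene density varies little among prokaryotes, which is exactly why the correlation is tighter than in eukaryotes.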
Factors Influencing Genome Size
Several selective pressures and random events influence the genome size of prokaryotes:
Genome Streamlining: Favors smaller genomes due to the selective advantage of faster replication. Smaller genomes require less energy and fewer resources to duplicate, providing a competitive edge.
Coding Potential: Selection for a greater number of genes leads to larger genomes, allowing the organism to adapt to more diverse environments and utilize a wider range of resources.
Chance: Deletion events occur more frequently than insertions, gradually reducing genome size over time unless counteracted by selection for coding potential.
Population Size: Larger populations can sustain larger genomes because genetic drift is weaker in them, so selection can retain genes that provide only a small or condition-specific benefit. In small populations, such genes are more easily lost through drift combined with the deletion bias.
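The population-size argument can be made quantitative with Kimura's diffusion approximation for the fixation probability of a new mutant in a haploid population, u = (1 − e^(−2s)) / (1 − e^(−2Ns)). This is a standard population-genetics result rather than something stated in these notes. Here a gene deletion is modeled as a mutant with negative selection coefficient s (the gene's small benefit, lost), and N stands in for the effective population size.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Probability that a single new mutant fixes in a haploid population of size n.

    Kimura's diffusion approximation: u = (1 - e^{-2s}) / (1 - e^{-2ns}).
    s < 0 models a slightly deleterious change, e.g. deleting a mildly useful gene.
    """
    if n < 1:
        raise ValueError("population size must be >= 1")
    if s == 0:
        return 1.0 / n  # neutral limit of the formula as s -> 0
    x = -2.0 * n * s
    if x > 700:
        # Strongly deleterious relative to drift: expm1(x) ~ e^x would overflow,
        # so use u ~ expm1(-2s) * e^{-x}, which underflows harmlessly towards 0.
        return math.expm1(-2.0 * s) * math.exp(-x)
    # expm1 keeps precision when s or n*s is small.
    return math.expm1(-2.0 * s) / math.expm1(x)


if __name__ == "__main__":
    s = -0.001  # hypothetical cost of losing one mildly useful gene
    for n in (100, 1_000, 10_000, 1_000_000):
        u = fixation_probability(s, n)
        print(f"N={n:>9}: u={u:.3e}  (neutral 1/N = {1 / n:.3e})")
```

When |Ns| is much smaller than 1 the deletion fixes almost as often as a neutral change (about 1/N), so small populations steadily shed such genes; when |Ns| is much larger than 1 the deletion is effectively purged, which is how large populations retain genes of small benefit.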
Bacterial Genomics - Replication
Bacterial DNA replication exhibits these key characteristics:
Semi-conservative: Each new DNA molecule consists of one original strand and one newly synthesized strand.
Bidirectional: Replication proceeds in both directions from the origin of replication.
Origin (ori) to terminus (ter) direction: Replication starts at the origin and proceeds towards the terminus.
Asymmetrically replicating strands: Leading and lagging strands are synthesized differently, affecting gene placement and order.
These replication properties influence the arrangement of genes on the chromosome. The bacterial chromosome can be conceptualized as a dynamic collection of genes adapted to the cell's lifestyle.
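The asymmetry between leading and lagging strands also leaves a compositional trace: in many bacteria the leading strand carries an excess of G over C, so GC skew, (G − C)/(G + C), changes sign at the origin and terminus. Summing the skew along the sequence gives a curve whose minimum approximates Ori and whose maximum approximates the terminus. This cumulative GC skew method is a common heuristic, not something these notes describe, and not every genome shows a clear signal; the function names and window size below are illustrative.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    seq = seq.upper()
    total = 0.0
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute nothing
            total += (g - c) / (g + c)
        values.append(total)
    return values


def predict_ori_ter(seq: str, window: int = 1000) -> tuple[int, int]:
    """Approximate Ori (skew minimum) and terminus (skew maximum) as window-end coordinates."""
    cum = cumulative_gc_skew(seq, window)
    if not cum:
        raise ValueError("empty sequence")
    idx_min = min(range(len(cum)), key=cum.__getitem__)
    idx_max = max(range(len(cum)), key=cum.__getitem__)
    ori = min(len(seq), (idx_min + 1) * window)
    ter = min(len(seq), (idx_max + 1) * window)
    return ori, ter
```

The prediction is only as fine as the window size; real analyses typically use windows of a few kilobases and confirm Ori with other evidence, such as the location of dnaA.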
Chromosome Architecture
Replication starts at the origin (Ori) and ends in the terminus region, which contains the dif site; the line from Ori to the terminus defines the replication axis.
Highly expressed genes often cluster around the Ori, benefiting from the increased copy number near the replication start site.
There is a gene-strand bias, with more genes located on the leading strand to minimize conflicts between replication and transcription machineries.
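Bacterial chromosomes are generally symmetric: the two replichores (the arms replicated by each fork between Ori and the terminus) are of similar length, so both forks finish at about the same time, facilitating efficient replication and segregation.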
Functionally related genes are often co-transcribed in operons, allowing coordinated gene expression.
Some chromosomal regions are more prone to recombination, insertion, and deletion events, contributing to genome evolution.
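The gene-dosage advantage of genes near Ori can be estimated with the Cooper-Helmstetter model, a standard description of bacterial replication not given in these notes: in an exponentially growing culture, a locus at fractional distance m from Ori (0 at Ori, 1 at the terminus) is present at about 2^((C/τ)(1 − m)) copies relative to the terminus, where C is the time needed to replicate the chromosome and τ is the doubling time. The C value of about 40 minutes used below is an assumed, typical figure for E. coli.

```python
def relative_copy_number(position: float, c_period_min: float, doubling_time_min: float) -> float:
    """Average copies of a locus relative to the terminus (Cooper-Helmstetter approximation).

    position: fractional distance from Ori along a replichore (0 = Ori, 1 = terminus).
    c_period_min: time to replicate the chromosome (C period), in minutes.
    doubling_time_min: cell doubling time (tau), in minutes.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 (Ori) and 1 (terminus)")
    if c_period_min <= 0 or doubling_time_min <= 0:
        raise ValueError("times must be positive")
    return 2 ** ((c_period_min / doubling_time_min) * (1.0 - position))


if __name__ == "__main__":
    C = 40.0  # assumed C period in minutes
    for tau in (20.0, 60.0, 120.0):
        print(f"doubling time {tau:>5} min: Ori/ter dosage = {relative_copy_number(0.0, C, tau):.2f}")
```

With a 20-minute doubling time the Ori/terminus dosage ratio is 4, but at 120 minutes it falls to about 1.26, which is why placing highly expressed genes near Ori matters mainly for fast-growing bacteria.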
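DNA polymerase synthesizes approximately 1000 base pairs per second, whereas RNA polymerase is at least an order of magnitude slower, so replication forks inevitably catch up with transcribing RNA polymerases. Genes on the leading strand are transcribed in the same direction as the fork moves, whereas genes on the lagging strand are transcribed head-on against it; head-on collisions are more disruptive, stalling replication and interrupting transcription. Essential genes in rapidly growing bacteria are therefore preferentially located on the leading strand, which protects both replication and their expression.
Rearrangements that prolong replication, such as inversions that make the two replichores unequal in length, are selected against because they slow cell division and reduce fitness.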
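Whether a gene is co-directional with replication (leading strand) or head-on (lagging strand) depends on both its annotated strand and the replichore it sits in. The sketch below classifies genes from coordinates, given assumed Ori and terminus positions on a circular chromosome; the `Gene` record and function names are hypothetical, and genes spanning the coordinate origin are not handled.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 0-based coordinate, start < end (genes spanning position 0 are not handled)
    end: int
    strand: str  # "+" or "-" relative to the reference sequence


def is_co_directional(gene: Gene, ori: int, ter: int, genome_length: int) -> bool:
    """True if the gene is transcribed in the direction of fork movement (leading strand)."""
    mid = (gene.start + gene.end) // 2
    dist_from_ori = (mid - ori) % genome_length
    ori_to_ter = (ter - ori) % genome_length
    if dist_from_ori < ori_to_ter:
        # Replichore where the fork moves towards increasing coordinates.
        return gene.strand == "+"
    # Other replichore: the fork moves towards decreasing coordinates.
    return gene.strand == "-"


def leading_strand_fraction(genes: list[Gene], ori: int, ter: int, genome_length: int) -> float:
    """Fraction of genes oriented co-directionally with replication."""
    if not genes:
        raise ValueError("no genes given")
    hits = sum(is_co_directional(g, ori, ter, genome_length) for g in genes)
    return hits / len(genes)
```

Comparing this fraction for essential versus non-essential genes in one genome is a simple way to check the strand bias described above.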
Evolution of Gene Order
The stability of gene order within a species is inversely correlated with the number of sequence repeats, which facilitate homologous recombination leading to genome rearrangements.
Gene Order in Prokaryotes
Gene order is generally poorly conserved across significant evolutionary distances but is well conserved over shorter distances, particularly within operons.
Conservation of gene order indicates common ancestry (homology) and is maintained by:
Neutralist perspective: Gene order persists simply because repeats or recombination machinery that could rearrange it are absent.
Selectionist perspective: Constraints on gene organization, such as the requirement for co-transcription to optimize metabolic pathways.
Conserved gene order can be utilized to:
Determine the evolutionary history of duplicated genes, tracing their origin and divergence.
Infer the function of unknown genes by associating them with known genes in conserved operons.
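Function inference from conserved gene order is often done by counting which neighboring gene pairs recur across genomes. The sketch below assumes each genome is already reduced to an ordered list of ortholog-family labels on a circular chromosome; the labels, including the unknown gene "orfX", are hypothetical, and pairs are counted regardless of orientation.

```python
from collections import Counter


def adjacent_pairs(order: list[str], circular: bool = True) -> set[frozenset[str]]:
    """Unordered pairs of neighboring gene families in one genome."""
    neighbours = list(zip(order, order[1:]))
    if circular and len(order) > 2:
        neighbours.append((order[-1], order[0]))  # close the circular chromosome
    return {frozenset(p) for p in neighbours if p[0] != p[1]}


def conserved_neighbours(genomes: dict[str, list[str]], min_genomes: int) -> dict[frozenset[str], int]:
    """Neighboring pairs found in at least `min_genomes` genomes, with their counts."""
    counts: Counter = Counter()
    for order in genomes.values():
        counts.update(adjacent_pairs(order))  # each pair counted once per genome
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


if __name__ == "__main__":
    toy = {
        "g1": ["famA", "famB", "famC", "orfX", "famD"],
        "g2": ["famB", "famC", "orfX", "famE"],
        "g3": ["orfX", "famC", "famB", "famF", "famG"],
    }
    for pair, n in conserved_neighbours(toy, min_genomes=3).items():
        print(sorted(pair), n)
```

In this toy input, "orfX" sits next to "famC" in every genome, which would suggest (not prove) that the two act in the same pathway or operon.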
Horizontal Gene Transfers (HGT)
Horizontal gene transfer (HGT) is the transmission of genetic material between organisms that are not in a parent-offspring relationship, facilitating rapid adaptation and evolution.
HGT Mechanisms
Common mechanisms of HGT include:
Conjugation: Direct transfer of DNA through physical contact via a pilus, often involving plasmids.
Transduction: Transfer of DNA mediated by bacteriophages, which can package host DNA and deliver it to other cells.
Transformation: Uptake of free DNA from the environment, which can then be incorporated into the recipient's genome.
Genomic Islands: Large mobile regions, often transferred by the mechanisms above, that carry genes enhancing the fitness of the host under specific conditions.
Less common mechanisms include:
Cell fusion
Gene transfer agents
Intracellular gene transfer
Genomic Islands (GIs)
Genomic islands are often carried by phages, plasmids, or integrative conjugative elements (ICEs), facilitating their mobility.
They typically differ in nucleotide composition (for example GC content and codon usage) from the rest of the host genome, indicating their foreign origin.
Genomic islands are typically flanked by direct repeats created at their integration sites; these repeats facilitate excision and further transfer.
Genomic islands often increase the fitness of the host under specific conditions, for example by conferring antibiotic resistance or new metabolic capabilities.
They can become immobilized in the host genome through inactivation of their transfer machinery.
Genomic islands contribute significantly to large DNA transfers between bacteria.
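Because islands often differ in composition from the host genome, a first-pass screen flags windows whose GC content deviates strongly from the genome-wide average. This is a simplified heuristic, not a specific published tool: dedicated predictors also use codon usage, flanking repeats, and integrase genes, and islands whose composition has drifted toward the host's will be missed. Window size, step, and the z-score cutoff are illustrative assumptions.

```python
import statistics


def gc_fraction(seq: str) -> float:
    """GC content among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def atypical_gc_windows(genome: str, window: int = 5000, step: int = 2500,
                        z_cutoff: float = 2.0) -> list[tuple[int, float, float]]:
    """Return (start, gc, z) for windows whose GC deviates from the mean of all windows."""
    starts = list(range(0, max(len(genome) - window, 0) + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    mean, sd = statistics.mean(gcs), statistics.pstdev(gcs)
    if sd == 0:
        return []  # perfectly uniform composition: nothing stands out
    flagged = []
    for s, gc in zip(starts, gcs):
        z = (gc - mean) / sd
        if abs(z) >= z_cutoff:
            flagged.append((s, gc, z))
    return flagged
```

Adjacent flagged windows would then be merged into candidate islands and inspected for flanking direct repeats and mobility genes.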
Genome Content - Core Genome
The core genome comprises genes shared among all or most members of a taxon, encoding essential functions and defining the species' fundamental characteristics. This genome is crucial for constructing species phylogenies.
Auxiliary Genome
The auxiliary or dispensable genome includes context-dependent genes involved in adaptation to local competition and environmental conditions. It contains mobile elements, genomic islands, and hypothetical proteins with no known homology or function.
The Pan-Genome Concept
The pan-genome is the entire set of genes found across all strains of a taxon, comprising both the core and auxiliary genomes. It provides a comprehensive view of the genetic diversity and adaptive potential of the species.
The pan-genome concept is essential for understanding the evolutionary history of a species, including the range of capabilities of different isolates and their relationships.
The Bacterial Pan-Genome
Extended Core: Universally conserved genes essential for basic cellular functions.
Character Genes: Genes involved in adaptation to a broad niche, providing a selective advantage in various environments.
Core Genome: Combination of extended core and character genes, representing the stable, essential genetic repertoire.
Accessory Genes: Genes present in some strains but not others, contributing to phenotypic diversity.
Unique Genes/ORFans: Genes with no known homologs or putative functions, representing novel adaptations or recently acquired genes.
In every newly sequenced bacterial genome, a significant number of genes remain with no known hits (“ORFans”), highlighting the vast unexplored genetic potential.
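Given gene-family presence per strain, the categories above reduce to set operations. The sketch assumes genes have already been clustered into families with shared labels (the labels below are hypothetical) and uses the strict definition of core (present in every strain); relaxed thresholds such as "most strains" would need an extra parameter.

```python
def partition_pangenome(gene_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into pan, core, accessory and unique (single-strain) sets."""
    if not gene_sets:
        raise ValueError("need at least one genome")
    pan = set().union(*gene_sets.values())
    core = set.intersection(*gene_sets.values())  # strict core: in every strain
    # Number of strains carrying each family.
    counts = {fam: sum(fam in genes for genes in gene_sets.values()) for fam in pan}
    unique = {fam for fam, n in counts.items() if n == 1 and len(gene_sets) > 1}
    accessory = pan - core - unique  # present in some strains but not all
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


if __name__ == "__main__":
    strains = {
        "s1": {"dnaA", "gyrB", "capX", "orf1"},
        "s2": {"dnaA", "gyrB", "capX"},
        "s3": {"dnaA", "gyrB", "capY"},
    }
    for name, fams in partition_pangenome(strains).items():
        print(name, sorted(fams))
```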
Example: Genome Evolution in Streptococcus agalactiae
S. agalactiae is a leading cause of severe infections in newborn infants and an emerging threat to the elderly. It has also been isolated from various animals, indicating its broad host range.
There are ten distinct capsular serotypes of Group B Streptococcus (GBS). The most frequent disease-causing isolates in the United States and Europe belong to five serotypes: Ia, Ib, II, III, and V.
The overall percent sequence identity between pairs of sequenced genomes ranges from 85% to 95%, reflecting a conserved core genome with variable regions contributing to serotype diversity.
Functions of shared and unique genes include essential housekeeping functions, cell envelope maintenance, regulatory functions, transport and binding proteins, mobile and extrachromosomal element functions, unknown functions, and hypothetical proteins.
Conclusions for S. agalactiae
The pan-genome is vast, with new genes identified with each newly sequenced strain, demonstrating ongoing adaptation and diversification.
The core genome represents only a small fraction of the pan-genome, emphasizing the significant genetic diversity within the species.
Classical typing of bacteria based on capsular polysaccharide composition does not fully reflect the genetic diversity of the species, underestimating the range of adaptive capabilities.
The structural motifs shared by all capsular serotypes are required for evasion of immune responses, and capsule type is selected independently of the other factors driving GBS diversity.
A universal vaccine is possible only by including products of dispensable genes: broad protection against diverse strains required a combination of four proteins, three of which are encoded by dispensable genes.
Horizontal gene transfer is not random: on average, there are two hotspots of integration in the chromosome where these events concentrate. Prophages tend to integrate far from the origin of replication (Ori). Because these integration sites are not crucial for bacterial survival, once an element has integrated and modified a site, other elements can integrate there as well, so the same spot accumulates further acquisitions.
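The observation that each new strain adds genes is usually shown as an accumulation curve: pan-genome and core-genome sizes as strains are added one at a time, averaged over random addition orders. The sketch below assumes gene-family sets per strain as input; the permutation count and seed are arbitrary, and fitting a model to the curve is left out.

```python
import random


def pangenome_curve(gene_sets: list[set[str]], permutations: int = 100,
                    seed: int = 0) -> list[tuple[float, float]]:
    """Mean (pan size, core size) after adding 1..n genomes, averaged over random orders."""
    if not gene_sets:
        raise ValueError("need at least one genome")
    rng = random.Random(seed)
    n = len(gene_sets)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(permutations):
        order = gene_sets[:]
        rng.shuffle(order)  # random order in which strains are "sequenced"
        pan: set[str] = set()
        core: set[str] | None = None
        for i, genes in enumerate(order):
            pan |= genes
            core = set(genes) if core is None else core & genes
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    return [(p / permutations, c / permutations) for p, c in zip(pan_totals, core_totals)]
```

If the pan curve keeps rising while the core curve levels off, as reported for S. agalactiae, the core is a small and shrinking fraction of the total gene pool.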
GC Content and Mutational Patterns
GC-content
GC content, the proportion of guanine (G) and cytosine (C) among all bases, is usually fairly uniform within a bacterial genome but varies widely between genomes, influencing genome stability and gene expression.
GC content reflects a balance between mutational pressure, which in bacteria is generally biased toward AT, and forces such as selection and biased gene conversion that can maintain or shift the GC ratio in response to environmental conditions and evolutionary pressures.
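If mutation were the only force, GC content would settle where AT→GC and GC→AT changes balance: with per-site rates u (AT→GC) and v (GC→AT), setting u(1 − GC) = v·GC gives GC* = u / (u + v). This classical mutation-equilibrium argument is added here for illustration; the rates in the example are hypothetical.

```python
def equilibrium_gc(at_to_gc_rate: float, gc_to_at_rate: float) -> float:
    """Equilibrium GC content under mutation alone: u / (u + v)."""
    if at_to_gc_rate < 0 or gc_to_at_rate < 0 or at_to_gc_rate + gc_to_at_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return at_to_gc_rate / (at_to_gc_rate + gc_to_at_rate)


if __name__ == "__main__":
    # Hypothetical AT bias: GC->AT mutations twice as frequent as AT->GC.
    print(f"equilibrium GC = {equilibrium_gc(1.0, 2.0):.3f}")
```

With GC→AT mutations twice as frequent as AT→GC, mutation alone would push GC content toward one third, so a genome far above that value needs another force, such as selection or biased gene conversion, to hold it there.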
Biased Gene Conversion
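Gene conversion is the non-reciprocal transfer of genetic information between homologous sequences during recombination: one sequence is copied onto the other. It is biased when one allele is preferentially retained over the other, for example when mismatches in recombination intermediates are repaired in favor of G or C.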
Gene conversion occurs between:
Duplicated DNA within a genome, homogenizing paralogous sequences.
Different alleles in diploid organisms, influencing allele frequencies.
Homologous DNA from an external source, facilitating HGT and adaptation.
In many fungi and animals, there is a conversion bias towards GC, potentially increasing the GC content of the genome over time.
Biased gene conversion can alter nucleotide composition if recombination and heterozygosity are present, driving genomic evolution.
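A standard way to see how a conversion bias shifts composition is to treat it like weak selection favoring GC alleles. If B is the population-scaled strength of the bias, AT→GC substitutions occur at u·f(B) and GC→AT at v·f(−B), with f(S) = S / (1 − e^(−S)); because f(B)/f(−B) = e^B, the equilibrium becomes GC* = 1 / (1 + (v/u)e^(−B)), which reduces to u / (u + v) when B = 0. This model and its parameterization are assumptions added for illustration; how B depends on recombination rate and population size differs between models.

```python
import math


def equilibrium_gc_with_bgc(u: float, v: float, b_scaled: float) -> float:
    """Equilibrium GC when GC alleles get a fixation advantage b_scaled from biased gene conversion.

    u: AT->GC mutation rate, v: GC->AT mutation rate,
    b_scaled: population-scaled conversion bias B (0 = no bias).
    Uses f(B) / f(-B) = exp(B), so GC* = u e^B / (u e^B + v).
    """
    if u <= 0 or v < 0:
        raise ValueError("need u > 0 and v >= 0")
    return 1.0 / (1.0 + (v / u) * math.exp(-b_scaled))


if __name__ == "__main__":
    # Hypothetical AT-biased mutation (v = 2u) with increasing conversion bias.
    for b in (0.0, 0.5, math.log(2), 2.0):
        print(f"B={b:.2f}: GC* = {equilibrium_gc_with_bgc(1.0, 2.0, b):.3f}")
```

Under these assumptions a modest bias of B = ln 2 ≈ 0.69 is enough to offset a twofold AT mutational bias and bring equilibrium GC to 50%, which is why biased gene conversion matters only where recombination is frequent enough to make B non-negligible.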
Biased Gene Conversion in Prokaryotes
Biased gene conversion is also present in prokaryotes. In certain bacteria its effect is as strong as in the human genome, suggesting a significant role in genome evolution and adaptation.