Microbial Genetics: The Selfish Gene

Introduction to Microbial Genetics

This section provides a brief refresher on microbial genetics, focusing on the "selfish gene" concept.

What is Genetics?

Genetics is primarily defined as the study of how hereditary material (DNA and RNA) is transferred.
This definition is functionally incomplete, as the field also encompasses:
- How genes are expressed.
- When genes are expressed.
- Why genes are expressed (their function).

Genome Subdivision

Genome: All of the genetic material in a cell.
- The genome is subdivided into chromosomes and plasmids.
- Chromosomes: Subdivided into genes (also known as coding regions) and non-coding regions.

DNA Structure

Composition: DNA is a polymer of repeating nucleotides, each consisting of a base attached to a sugar-phosphate backbone.
- There are 4 bases: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G).
- RNA uses Uracil (U) instead of Thymine (T).
Form: DNA generally exists as two strands bound together, twisted into a helix.
Base Pairing: A always binds with T, and C always binds with G.
Orientation: DNA strands have a 5' end (phosphate group) and a 3' end (hydroxyl group).
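
The pairing and orientation rules above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function name and the example sequence are made up for this sketch, and it assumes the two strands run in opposite directions (antiparallel), so the partner of a strand written 5' to 3' is its reverse complement.

```python
# Pairing rules from the notes: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'.

    Because the strands are antiparallel, the partner is the base-by-base
    complement read in the opposite direction (the reverse complement).
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3))


top = "ATGCGT"                      # hypothetical example sequence
bottom = partner_strand(top)

# Print the duplex with the bottom strand reversed so paired bases line up.
print(f"5'-{top}-3'")
print(f"3'-{bottom[::-1]}-5'")
```

Applying the function twice returns the original strand, which is another way of saying that knowing one strand fully determines the other.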

DNA Packaging

Most bacteria possess a single circular chromosome, while eukaryotes typically have multiple linear chromosomes.
Comparison: If linearized, the E. coli chromosome is over 1 mm in length, which is many times longer than the cell (approximately 2 µm in length).
Solution: To fit within the cell, DNA must be extensively coiled, or super-coiled.
- This supercoiling is a common theme in nature.
Human Genome Comparison: The human genome is about 3 billion nucleotides and would stretch roughly 1 meter in length if linearized.
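
The lengths quoted above can be checked with simple arithmetic. The sketch below assumes two standard textbook figures that are not stated in these notes: about 0.34 nm of length per base pair, and an E. coli genome of roughly 4.6 million base pairs. The variable and function names are illustrative only.

```python
# Values marked "assumed" are standard textbook figures, not from the notes.
RISE_PER_BASE_PAIR_NM = 0.34   # assumed: length per base pair of B-form DNA
ECOLI_GENOME_BP = 4.6e6        # assumed: approximate E. coli genome size
HUMAN_GENOME_BP = 3.0e9        # from the notes: about 3 billion nucleotides
ECOLI_CELL_LENGTH_UM = 2.0     # from the notes: cell is about 2 µm long


def linear_length_m(base_pairs):
    """Length in meters of fully stretched double-stranded DNA."""
    return base_pairs * RISE_PER_BASE_PAIR_NM * 1e-9


ecoli_length_mm = linear_length_m(ECOLI_GENOME_BP) * 1e3
human_length_m = linear_length_m(HUMAN_GENOME_BP)
fold_longer_than_cell = (ecoli_length_mm * 1e3) / ECOLI_CELL_LENGTH_UM

print(f"E. coli chromosome: {ecoli_length_mm:.2f} mm")
print(f"Human genome:       {human_length_m:.2f} m")
print(f"E. coli DNA is about {fold_longer_than_cell:.0f}x the cell length")
```

Under these assumptions the E. coli chromosome comes out at roughly 1.5 mm and the human genome at roughly 1 m, in line with the figures in the text.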

DNA Replication

Functions of DNA Copying

DNA is copied for two primary functions:
1. To encode for items the cell needs (e.g., proteins).
2. To replicate for cell division.
In DNA replication, one double-stranded parent DNA molecule is converted into two identical double-stranded daughter molecules.
This is possible because each strand of DNA complements the other; knowing one base allows determination of its partner.

Semiconservative Replication

DNA replication is semiconservative.
- The parent strands are separated.
- Each parent strand serves as a template for the synthesis of a new DNA strand.
- Consequently, each new DNA molecule contains half "old" (parental) DNA and half "new" DNA.
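
A small simulation can make the "half old, half new" outcome concrete. This is a simplified sketch: strands are stored so that position i of one strand pairs with position i of the other (strand direction is ignored), and the `Strand` type, function names, and example sequence are all invented for illustration.

```python
from typing import NamedTuple

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Strand(NamedTuple):
    bases: str   # written so position i pairs with position i of its partner
    origin: str  # "old" (parental) or "new" (freshly synthesized)


def copy_from_template(template):
    """Build the partner strand base by base from a template strand."""
    return Strand("".join(PAIRS[b] for b in template.bases), "new")


def replicate(duplex):
    """Separate the two parent strands and pair each with a new partner."""
    top, bottom = duplex
    daughter_1 = (top, copy_from_template(top))
    daughter_2 = (copy_from_template(bottom), bottom)
    return daughter_1, daughter_2


parent = (Strand("ATGCGT", "old"), Strand("TACGCA", "old"))
for daughter in replicate(parent):
    print([(s.bases, s.origin) for s in daughter])
```

Each daughter duplex carries the same sequence as the parent, with exactly one strand labeled "old" and one labeled "new".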

Steps of DNA Replication

Uncoiling: Supercoiled DNA is relaxed by topoisomerases, including DNA gyrase. This process starts at a region called the origin of replication.
Unwinding: The enzyme helicase unwinds the helix and separates the DNA strands in both directions.
Stabilization: Single-strand binding proteins (SSBs) stabilize the separated single-stranded DNA.
Primer Addition: Nucleotide addition requires a free 3' hydroxyl group. An enzyme called primase adds an RNA primer to the strand being replicated.
Nucleotide Binding: Free nucleotides bind to the single strands of DNA, complementing the parent strand (A with T, C with G).
Polymerization: DNA polymerase ensures the proper nucleotide is bound and connects it to the growing DNA strand.

The Lagging Strand

The RNA primer is complementary to the DNA template strand.
On one strand (the leading strand), there will be a free 3' hydroxyl group, allowing for continuous addition of nucleotides.
On the other strand (the lagging strand), there is no free 3' hydroxyl group in the direction of fork movement.
- Solution: Multiple RNA primers are added to the lagging strand.
- DNA is synthesized in short pieces, called Okazaki fragments.
- These fragments are then assembled together by DNA polymerase (replacing RNA primers with DNA) and DNA ligase (joining the fragments).
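
The discontinuous synthesis described above can be sketched as follows. This is a toy model under stated assumptions: the fork is taken to move left to right along a lagging-strand template written 5' to 3', the template is exposed in fixed-size windows, and RNA primers are not modeled. The function names and fragment length are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def lagging_strand_fragments(template_5_to_3, fragment_len):
    """Synthesize the lagging strand as separate Okazaki-like fragments.

    Each window of exposed template is copied 5'->3', which runs right to
    left on this template (away from the fork). Fragments are returned in
    the order they are made, each written 5'->3'.
    """
    fragments = []
    for start in range(0, len(template_5_to_3), fragment_len):
        window = template_5_to_3[start:start + fragment_len]
        fragments.append("".join(PAIRS[b] for b in reversed(window)))
    return fragments


def ligate(fragments):
    """Join fragments into one continuous strand written 5'->3'.

    The fragment made last sits at the 5' end of the finished strand,
    so fragments are joined in reverse order of synthesis.
    """
    return "".join(reversed(fragments))


template = "ATGCGTTA"               # hypothetical lagging-strand template
pieces = lagging_strand_fragments(template, fragment_len=3)
print(pieces)
print(ligate(pieces))
```

The ligated product is identical to what continuous synthesis would have produced; only the route taken to build it differs.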

Finishing Replication

Bidirectional Replication: DNA replication is bidirectional, meaning the process described occurs in both directions around the circular chromosome.
Replication Forks: Two replication forks move around the chromosome until they meet halfway.
Separation: Upon meeting, they stop replicating, and the two newly formed circular DNA molecules are separated by topoisomerase.
Proofreading: Throughout the entire process, proofreading activities (carried out by DNA polymerases themselves) check the new DNA to ensure proper base pair matching.

Initiation of Replication (Bacterial Chromosome)

Replication of the circular bacterial chromosome begins at a fixed region called oriC.
Initiation proteins bind to the oriC DNA sequences.
Other necessary replication proteins then assemble at this site, forming a "replication factory."

Elongation of Replication

Replication factories are attached to the cell membrane.
They consist of various enzymes that unwind, separate, and synthesize complementary strands.
As replication proceeds, unreplicated parental DNA is pushed through the replication factory at a replication fork, synthesizing a complementary strand.
Due to the antiparallel nature of the parental DNA strands, both leading and lagging strand synthesis occur at each replication fork.

Termination of Replication

Replication concludes when the replication factories reach a terminus region opposite the oriC on the chromosome.
At the terminus, termination proteins bind, and the replication factories disperse.
Each daughter chromosome consists of one old parental strand and one newly synthesized strand.

The Central Dogma of Genetics

The genome not only replicates but also encodes all necessary proteins for cellular function.
The process by which DNA information is used to encode proteins is known as the central dogma of genetics:
- DNA → RNA → Protein
- DNA is transcribed into RNA, which is then translated into protein.

Transcription

Overview

Transcription is the first step in going from genes to proteins.
It involves copying a specific gene (not the entire genome) into messenger RNA (mRNA).
An mRNA intermediate is necessary because protein production must be halted once sufficient protein has been made; the cell achieves this by destroying the mRNA, which leaves the gene itself untouched.

Similarities and Differences with DNA Replication

Similarities: The process is very similar to genome replication.
Exceptions:
- Only specific gene regions are copied (not the whole genome).
- RNA strands are produced, which do not permanently bind with the DNA; they exist as single strands.
- There is no proofreading mechanism during transcription.

Steps of Transcription

Initiation: RNA polymerase binds to the promoter (a regulatory region directly upstream of the gene) and initiates DNA unwinding.
Elongation (Part 1): RNA polymerase facilitates the assembly of free ribonucleotides into chains of RNA, using the DNA as a template. (On the template strand, A pairs with U, T with A, C with G, and G with C.)
Elongation (Part 2): The RNA polymerase continues along the gene, adding to the RNA chain. The RNA chain does not permanently pair with the DNA but exists as a single strand.
Termination Signal: RNA synthesis continues until the RNA polymerase encounters a specific region of DNA known as a terminator.
Termination & Release: Once the terminator is reached, the RNA polymerase stalls. A rho termination protein (in some cases) facilitates the dissociation of the RNA polymerase and the RNA strand from the DNA. The DNA then re-winds, and the RNA is free to proceed to the next step (translation).
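
The pairing rule used during elongation can be written as a short function. This sketch follows the convention used in these notes, where each template base is mapped directly to its RNA partner (for example CTT to GAA). Promoter recognition and termination are not modeled, and the function name and example template are illustrative.

```python
# Template DNA base -> RNA base added opposite it (U replaces T in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_dna):
    """Return the mRNA made from a DNA template strand.

    The template is assumed to be written 3'->5', so the mRNA that is
    returned reads 5'->3'.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in template_dna)


template = "TACCTTATT"              # hypothetical template strand
print(transcribe(template))         # mRNA copy of the gene region
```

Note that only the gene region passed in is copied; the rest of the genome is untouched, which mirrors the first difference from replication listed above.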

Translation: From RNA to Protein

The Challenge of Nucleic Acids to Amino Acids

The challenge is to convert a sequence written with only 4 nucleic acid bases into a sequence built from 20 different amino acids.
The solution involves codons, another type of RNA molecule (tRNA), and ribosomes.

Codons

The string of mRNA is divided into groups of 3 bases called codons.
Each codon specifies a particular amino acid (e.g., CCC = proline, CAC = histidine).
Since there are 4 bases and 3 positions in a codon, there are 4 × 4 × 4 = 64 possible codons.

Degeneracy of the Genetic Code

64 possible codons are more than the 20 amino acids needed.
This results in multiple codons specifying the same amino acid, a phenomenon called degeneracy.
61 of the 64 codons encode amino acids; the other 3 are stop (nonsense) codons, which signal the end of translation.
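
These counts can be verified by building the standard genetic code in Python. The sketch below relies on a common compact encoding of the standard code (one-letter amino acid symbols listed in U, C, A, G codon order, with `*` for stop); the variable names are chosen for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = {c: aa for c, aa in CODON_TABLE.items() if aa != "*"}
codons_per_amino_acid = Counter(sense_codons.values())

print(len(CODON_TABLE), "codons in total")
print(len(sense_codons), "sense codons,", len(stop_codons), "stop codons")
print(len(codons_per_amino_acid), "amino acids encoded")
print("Codons for leucine (L):", codons_per_amino_acid["L"])
```

Degeneracy shows up directly in `codons_per_amino_acid`: most amino acids are reached by more than one codon, while a few (such as methionine) have only one.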

Transfer RNA (tRNA)

tRNA molecules can bind both RNA and amino acids.
Each tRNA molecule has:
- A region called an anticodon, which can bind a specific mRNA codon.
- A region which can bind a specific amino acid. When an amino acid is bound, the tRNA is "charged."
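In principle there could be 61 different tRNA molecules, one for each of the 61 sense codons (those encoding amino acids); in practice most cells use fewer, because some tRNAs can read more than one codon for the same amino acid (wobble pairing).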

Ribosomes

Ribosomes are the structures where protein synthesis (translation) occurs.
Their structure facilitates protein formation.
Subunits: Ribosomes exist as two subunits:
- The 30S (small) subunit.
- The 50S (large) subunit.
- These come together to form a functional 70S ribosome.
- The "S" refers to Svedberg units, a measure of sedimentation rate during centrifugation.
Composition: Ribosomes are complex structures composed of dozens of proteins and ribosomal RNA (rRNA).
Functional Domains of the 50S (large) subunit:
- Aminoacyl site (A site): Site of entry for charged tRNAs.
- Peptidyl site (P site): Site of peptide bond formation; holds the tRNA with the growing polypeptide chain.
- Exit (E site): Site of exit for uncharged tRNAs.
16S rRNA: Located in the 30S (small) ribosomal subunit.
- Its sequence is complementary to a conserved sequence in the mRNA, located just upstream of the gene, called the ribosome binding site (also known as the Shine-Dalgarno sequence).
- Binding of the 16S rRNA to the ribosome binding site on the mRNA aligns the ribosome so that translation starts at the correct location (the start codon).
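
The alignment role of the ribosome binding site can be illustrated with a simple search. This is a rough sketch under assumptions not stated in the notes: it uses AGGAGG as the Shine-Dalgarno consensus motif (real sites vary) and an arbitrary window of 12 bases downstream in which to look for the start codon. The function name, window size, and example mRNA are hypothetical.

```python
SHINE_DALGARNO = "AGGAGG"   # assumed consensus motif; real sites vary


def find_start_codon(mrna, max_spacing=12):
    """Return the index of the AUG positioned by the ribosome binding site.

    Only an AUG that begins within max_spacing bases downstream of the
    motif is accepted. Returns -1 if no motif or no suitable AUG is found.
    """
    site = mrna.find(SHINE_DALGARNO)
    if site == -1:
        return -1
    search_from = site + len(SHINE_DALGARNO)
    return mrna.find("AUG", search_from, search_from + max_spacing + 3)


# The AUG at index 0 lies upstream of the binding site and is ignored.
mrna = "AUGCAGGAGGUUAACAUGCCCUAA"
print(find_start_codon(mrna))
```

The point of the example is that an AUG triplet alone is not enough; its position relative to the ribosome binding site determines where translation begins.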

Steps of Translation

Translation is divided into three main steps:

Initiation

The 30S (small) ribosomal subunit binds to the mRNA.
The ribosome is properly aligned on the mRNA by the binding of its 16S rRNA to the ribosome binding site (Shine-Dalgarno sequence).
This binding is assisted by initiation factors.
The first charged tRNA (almost always carrying N-formyl methionine in bacteria) binds to the start codon (typically AUG) on the mRNA within what will become the P site. This binding is mediated by the mRNA sequence and the complementary anticodon loop on the tRNA.
Initiation is completed when the 50S (large) ribosomal subunit binds to the 30S subunit, forming the functional 70S ribosome. The N-formyl methionine tRNA is now in the P site.

Elongation

The next charged tRNA (carrying the appropriate amino acid) binds to the vacant A site of the ribosome. Specificity is determined by the mRNA codon and tRNA anticodon.
Once the tRNA is bound in the A site, peptidyl transferase (an enzymatic activity of the large ribosomal subunit) catalyzes the formation of a peptide bond between the amino acid on the tRNA in the A site and the growing polypeptide chain attached to the tRNA in the P site.
The ribosome then moves forward along the mRNA by exactly one codon (translocation).
This shift moves the tRNA with the polypeptide chain from the A site to the P site.
The now uncharged tRNA from the P site is moved to the E site.
The uncharged tRNA in the E site is ejected from the ribosome complex.
The A site is now vacant, and the ribosome complex is ready to bind the next charged tRNA.
This process is repeated, adding amino acids one by one, until the entire protein is synthesized.

Termination

Termination begins when the ribosome encounters a stop codon (UAA, UAG, or UGA) in the A site.
There is no tRNA that corresponds to the stop codon, causing the ribosome to stall.
A release factor protein binds in the A site.
Binding of the release factor causes the polypeptide chain to be released from the tRNA in the P site via hydrolysis.
Finally, the ribosome destabilizes, and the mRNA, as well as the large and small ribosomal subunits, dissociate from each other.
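
The three stages can be condensed into a short translation routine. This is a simplified sketch: it starts at the first AUG it finds (ribosome binding site alignment is not modeled), reads one codon at a time, and stops at the first stop codon. Amino acids are shown as one-letter symbols, with M standing in for the initiating N-formyl methionine. The function name and example mRNA are illustrative.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("UCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate(mrna):
    """Translate an mRNA from its first AUG to the first stop codon."""
    start = mrna.find("AUG")                      # initiation
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):      # elongation, codon by codon
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                     # termination at a stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGCAUGCCCCACGAAUAAGC"))
```

For the example mRNA, the codons read are AUG, CCC, CAC, GAA, and then the stop codon UAA, giving a four-residue peptide.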

Genotype vs. Phenotype

Genotype: The genetic makeup of an organism.
- Anything an organism can make or do must be encoded in its genotype.
Phenotype: The observable or measurable traits an organism has (e.g., the ability to ferment lactose).
Genotypes encode phenotypes, but having a gene does not guarantee that its corresponding phenotype will always be expressed.

Mutation

Overview

Mutation: A change in the genetic makeup (even single bases) of an organism.
Changes in genes can lead to changes in proteins, potentially impacting their function.
Due to the degenerate nature of the genetic code, it is possible to have a genetic mutation that does not change the resulting amino acid sequence, and thus does not affect the phenotype (a silent mutation).

Types of Mutation

Mutation typically occurs through the change of 1 base (either by changing the existing base, inserting an additional base, or deleting an existing base).

Missense Mutation

Occurs when a change in one nucleotide results in a change of 1 amino acid in a protein.
Example:
- Original DNA: CTT → mRNA: GAA → Amino acid: Glutamate
- Mutated DNA: CTA → mRNA: GAU → Amino acid: Aspartate

Nonsense Mutation

Occurs when a base change results in a nonsense (stop) codon, prematurely stopping protein synthesis.
Example:
- Original DNA: ATG → mRNA: UAC → Amino acid: Tyrosine
- Mutated DNA: ATT → mRNA: UAA → Amino acid: Stop

Frameshift Mutation

Occurs when one nucleotide is either added or subtracted from the gene.
This fundamentally changes the reading frame, altering all subsequent codons.
Example with words:
- Original: THE BIG BAD RED DOG
- Addition: THE GBI GBA DRE DDO G (inserting 'G' after 'THE')
- Deletion: THE IGB ADR EDD OG (deleting the first 'B')
Example with codons:
- Original sequence: UUU UCU ACA CGA (Phe-Ser-Thr-Arg)
- Insert an additional base (e.g., G) after the second base of the first codon:
- UUG UUC UAC ACG A (Leu-Phe-Tyr-Thr, with one base left over)
The amino acids are completely changed, and function will most likely be lost.
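
The effect of a frameshift on the reading frame can be reproduced directly. The sketch below uses small helper functions invented for this example; the insertion and deletion positions are chosen so that the output matches the shifted readings shown in the examples above.

```python
def reading_frame(sequence, size=3):
    """Split a sequence into consecutive groups of `size` letters."""
    return " ".join(sequence[i:i + size] for i in range(0, len(sequence), size))


def insert_base(sequence, index, base):
    """Insert one letter before position `index`."""
    return sequence[:index] + base + sequence[index:]


def delete_base(sequence, index):
    """Remove the letter at position `index`."""
    return sequence[:index] + sequence[index + 1:]


sentence = "THEBIGBADREDDOG"
print(reading_frame(sentence))
print(reading_frame(insert_base(sentence, 3, "G")))
print(reading_frame(delete_base(sentence, 3)))

mrna = "UUUUCUACACGA"
print(reading_frame(mrna))
print(reading_frame(insert_base(mrna, 2, "G")))
```

In every case, a single added or removed letter changes each group that follows it, which is why frameshifts are usually so damaging.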

Summary of Mutation Types

Type of Mutation	Cause of Mutation	Functionality of Protein
Missense	Single base change (still coding)	Variable, depends on location
Nonsense	Single base change (non-coding = stop)	Normally non-functional
Frameshift	Addition or deletion of single base	Normally non-functional
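
For single base changes, the categories in this table can be told apart by comparing the amino acid before and after the change. The sketch below is illustrative only: it assumes the original codon is a sense codon, adds a "silent" outcome for changes that leave the amino acid unchanged (possible because the code is degenerate), and uses a function name invented for this example.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("UCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def classify_substitution(original_codon, mutated_codon):
    """Classify a single-codon change in an mRNA.

    Assumes original_codon encodes an amino acid (is not a stop codon).
    """
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutated_codon]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", "GAU"))  # glutamate -> aspartate
print(classify_substitution("UAC", "UAA"))  # tyrosine -> stop
print(classify_substitution("GAA", "GAG"))  # glutamate -> glutamate
```

Frameshifts are not covered by this function because they change the codon boundaries themselves rather than a single codon.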

Significance of Mutation

Mutations can rapidly change proteins, potentially rendering them non-functional.
However, sometimes a change can be beneficial.
Mutation is the driving force behind evolution; without it, life as we know it would not exist.

Beyond Mutation: Genetic Alteration

Genetic mutation is not the only way a bacterial genome can be altered.
Novel DNA can be incorporated into the cell through various mechanisms, often with problematic effects for human and animal hosts.

Genetic Inheritance (Vertical Transmission)

Vertical Transmission: The passing of genes from parent to offspring during replication.
In organisms that undergo binary fission, an exact copy of the genome is passed from parent to offspring.
Thus, the progeny are genetically identical clones of the parent.

Horizontal Gene Transfer (HGT)

Key Distinction

Vertical Transmission: Genes flow from parent to offspring during replication.
Horizontal Transmission: Genes flow from cell to cell independent of reproduction.

Overview

Genetic diversity helps protect a species against sudden environmental changes that could otherwise lead to extinction.
Since bacteria reproduce by binary fission and are clones, they need alternative ways to diversify their genomes.
Horizontal Gene Transfer (HGT): The process by which bacteria swap genetic material between individuals, independent of reproduction (like sharing genes with classmates).
HGT can occur between both closely and distantly related organisms.

Mechanisms of HGT

There are three primary mechanisms for horizontal gene transfer:

Transformation

Definition: The passing of DNA from one bacterium to another in the form of naked DNA.
In environments with high bacterial concentrations, exogenous DNA will be present in high quantities.
Bacterial cells readily take up this DNA, though most is degraded and used for nutrients.
Recombination: Some of the external DNA is not degraded and can undergo recombination with the cell's genomic DNA.
- This can confer novel abilities to the cell, such as genes for toxins, antimicrobial resistance factors, or receptors for host colonization.
Griffith's Experiment (1920s): Frederick Griffith demonstrated this phenomenon using Streptococcus pneumoniae.
- He observed two forms: smooth (encapsulated, virulent) and rough (non-encapsulated, avirulent).
- Experiment results:
  - Living encapsulated bacteria → Mouse died.
  - Living non-encapsulated bacteria → Mouse healthy.
  - Heat-killed encapsulated bacteria → Mouse healthy.
  - Living non-encapsulated + heat-killed encapsulated bacteria → Mouse died (encapsulated bacteria isolated from mouse).
- This showed that a "transforming principle" (later identified as DNA) from the dead virulent strain could transform live avirulent strains into virulent ones.
Efficiency: Transformation is an important but often inefficient method of HGT (less than 1% of cells transform naturally).
Competence: Cells often need to be "primed" for transformation by expressing specific proteins before DNA is present. The ability of bacteria to take up extracellular DNA is called competence.

Conjugation

Definition: The ability of a bacterium to transfer a copy of some or all of its genomic information directly to another bacterium, without exposing the genetic material to the environment.
Conjugation can transfer either plasmid DNA or chromosomal DNA.
Plasmids: Small, self-replicating, circular pieces of extra-chromosomal DNA.
- Can be transferred by both conjugation and transformation.
- Types of Plasmids (examples):
  - Conjugative plasmids: Carry genes necessary for conjugation itself.
  - Dissimilation plasmids: Carry genes for utilizing unusual sugars and carbohydrates.
  - Resistance plasmids (R-factors): Carry genes that encode antibiotic resistance.
  - Virulence plasmids: Carry genes that encode virulence traits.
- Plasmids can belong to multiple groups.
Plasmid Conjugation Process (F-factor example):
1. An F+ donor cell (carrying the F plasmid) produces a conjugation pilus.
2. The pilus binds to an F- recipient cell and retracts, bringing the cells close together.
3. A conjugation bridge forms between the two cells (a tunnel-like structure).
4. The F plasmid is replicated in the F+ cell by rolling circle replication. A single-stranded copy is immediately sent through the conjugation bridge to the F- recipient cell.
5. Once transferred, the complementary strand is synthesized in the recipient cell, resulting in both donor and recipient cells becoming F+.
Outcome: At the end of plasmid conjugation, both cells are F+ donor cells.
- These F+ cells can then go on to convert other F- cells, leading to rapid dissemination of plasmid-borne genes within a population.
Significance: Considered the major mechanism for antimicrobial resistance transfer among bacterial populations.
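
The speed of plasmid dissemination can be illustrated with a deliberately idealized model. The assumptions here are hypothetical and far simpler than real populations: cells do not divide or die, every F+ cell finds and converts exactly one F- cell per round, and conjugation never fails. The function name and starting numbers are invented for this sketch.

```python
def spread_f_plasmid(f_plus, f_minus, rounds):
    """Track (F+ count, F- count) over successive rounds of conjugation."""
    history = [(f_plus, f_minus)]
    for _ in range(rounds):
        # Each donor converts one recipient, limited by recipients remaining.
        converted = min(f_plus, f_minus)
        f_plus += converted
        f_minus -= converted
        history.append((f_plus, f_minus))
    return history


for round_number, counts in enumerate(spread_f_plasmid(1, 999, rounds=10)):
    print(round_number, counts)
```

Because every converted cell becomes a donor itself, the number of F+ cells doubles each round in this model until no F- cells are left.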

Transduction

Definition: Occurs when a bacteriophage (a virus that infects bacteria) accidentally transfers bacterial DNA into another host bacterium.
- It is accidental because bacteriophages normally only transfer their own DNA.
Process:
1. A bacteriophage infects a donor bacterial cell by injecting its DNA.
2. The phage uses the host cell machinery to replicate its own DNA and produce phage proteins.
3. To prevent bacterial interference, the bacterial chromosome is usually broken up into pieces.
4. During bacteriophage assembly, the capsid is built, and viral DNA is inserted. Occasionally, pieces of the fragmented bacterial chromosome are accidentally packaged into a bacteriophage capsid instead of viral DNA.
5. These new bacteriophages (some containing viral DNA, some containing bacterial DNA) are released from the lysed donor cell.
6. A bacteriophage containing bacterial DNA then infects a new recipient cell.
7. Since the injected DNA is bacterial (not viral), the recipient cell is not harmed by the phage.
8. The injected bacterial DNA may be able to recombine into the recipient cell's chromosome, transferring novel traits.
9. These newly acquired traits will then be transmitted vertically to future generations of bacteria during cell division.

Regulation of Bacterial Genes

Overview

Gene expression is tightly controlled; genes are not simply expressed at all times.
This control is primarily mediated by proteins.
Some genes are influenced by metabolites and typically encode regulatory proteins.
Other genes, usually encoding structural proteins, are under the control of repressors and inducers (which are themselves regulatory proteins).

Repressors and Inducers

Repressor Proteins: Any protein that decreases or prevents gene transcription.
- Typically does this by impeding RNA polymerase from binding or moving along the DNA.
Inducer Proteins: Any protein that promotes or increases gene transcription.
- Typically does this by removing repressors or enhancing RNA polymerase binding.

The Control Region

Repressors and inducers do not bind randomly but at a specific location called the control region.
The control region is located upstream of the coding gene and usually contains:
- A regulatory gene.
- A promoter region.
- An operator region.
Promoter region: Where RNA polymerase binds to the DNA to initiate transcription.
Operator region: Where repressors bind DNA to block transcription.
Regulatory gene: Located upstream of both the promoter and operator. It encodes the repressor or inducer proteins and is often under the control of metabolites.

Operons

It would be impractical for every gene to have its own control region, especially if multiple genes are needed for a particular function.
Operons solve this problem: they are clusters of several genes that serve one function, all under the control of a single regulatory region.

The Lac Operon

A classical example of multiple genes controlled by one regulatory region.
Structure: The lac operon contains:
- A regulatory gene (lacI).
- A promoter (P).
- An operator (O).
- 3 structural genes (lacZ, lacY, lacA) necessary for the utilization of lactose as an energy source (encoding β-galactosidase, permease, and transacetylase, respectively).

Repression Mechanism (Glucose Present, Lactose Absent/Low)

The cell prefers to use glucose over lactose for energy.
When glucose is present (or lactose is absent), the lac operon is generally "off."
The regulatory gene (lacI) constitutively produces an active repressor protein.
This active repressor protein binds to the operator (O) region.
Binding of the repressor to the operator physically blocks RNA polymerase from transcribing the structural genes, thus preventing the expression of lactose-utilizing enzymes.

Induction Mechanism (Glucose Absent, Lactose Present)

When glucose stores are depleted, and lactose is present, the cell switches to lactose as an energy source.
Some lactose is converted into allolactose within the cell.
Allolactose acts as an inducer; it binds to the repressor protein.
Binding of allolactose causes a structural change in the repressor protein, inactivating it.
The inactivated repressor can no longer bind to the operator region.
With the operator free, RNA polymerase can bind to the promoter and transcribe the structural genes (lacZ, lacY, lacA).
This leads to the production of enzymes needed for lactose catabolism.

Positive Regulation (Catabolite Activator Protein - CAP)

In addition to induction, the lac operon can also be positively regulated.
When glucose levels in the cell are low, a molecule called cyclic AMP (cAMP) builds up.
cAMP binds to another protein called catabolite activator protein (CAP), activating it.
The activated CAP-cAMP complex binds to a specific site near the promoter region of the lac operon.
This binding enhances the affinity of RNA polymerase for the promoter, significantly increasing the rate of transcription of the lac operon genes.
- This ensures high expression of lac genes only when glucose is scarce and lactose is available.
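
The combined effect of the repressor and CAP can be summarized as a small truth table. This is a simplified logical sketch, not a quantitative model: sugar levels are reduced to present or absent, and the "low" label for the case where both sugars are present is an assumption reflecting weak transcription without CAP assistance. The function name is invented for this example.

```python
def lac_operon_expression(glucose_present, lactose_present):
    """Return a rough expression level for the lac structural genes."""
    # Without lactose there is no allolactose, so the repressor stays active
    # and sits on the operator.
    repressor_on_operator = not lactose_present
    # cAMP builds up only when glucose is low, activating CAP.
    cap_camp_bound = not glucose_present

    if repressor_on_operator:
        return "off"
    return "high" if cap_camp_bound else "low"


for glucose in (True, False):
    for lactose in (True, False):
        level = lac_operon_expression(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {level}")
```

Only one of the four combinations, glucose absent and lactose present, gives high expression, which matches the summary in the last bullet above.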

Types of Operons

Inducible Operons: Generally "off" but can be turned "on" when needed (e.g., lac operon).
- Typically control catabolic genes (breaking down substances).
Repressible Operons: Generally "on" but can be turned "off" when not needed.
- Tend to control anabolic genes (building up substances).

Solitary Genes

Not every gene is part of an operon; some exist as solitary genes with their own control regions.
Key Point: All genes are under some form of control, which is vital for survival.
- Efficient bacteria do not express unneeded genes, conserving energy and outcompeting others.
Other methods of controlling protein production (at the mRNA and ribosomal stages) exist but are not discussed in detail here.