Control of Gene Expression in Bacteria and Eukaryotes

Overview of Gene Expression Control in Bacteria

The Operon and its Functional Discovery

  • Definition of an Operon: An operon is defined as a cluster of structural genes that are transcribed together as a single unit, sharing a common promoter. This organization allows the cell to produce a single mRNA molecule containing the coding regions for several related proteins. Usually, these proteins function within the same biochemical pathway.

    • Historical Context: The operon model was famously proposed by François Jacob and Jacques Monod in 1961. Their discovery was motivated by Jacob’s research on bacteriophage λ and Monod’s work on β-galactosidase in Escherichia coli. Along with André Lwoff, they received the Nobel Prize in Physiology or Medicine in 1965.

    • The Evolutionary Purpose of Operons: Computational biologists Oleg Igoshin and Christian Ray utilized mathematical models to investigate why operons exist in prokaryotes but are rare in eukaryotes. They hypothesized that the operon structure reduces "biochemical noise" (random fluctuations in transcription and translation levels). By coordinating the expression of multiple genes, the cell maintains optimal proportions of necessary proteins, which is critical for survival. They speculated that eukaryotic cells do not rely on operons because their larger volume buffers the effects of random molecular fluctuations.

Levels of Gene Regulation

Gene expression is a multi-step process from genotype to phenotype, and regulation can occur at various stages:

  1. Alteration of DNA or Chromatin Structure: Primarily in eukaryotes, modifications such as DNA methylation or changes in chromatin packaging determine DNA accessibility for transcription.

  2. Transcription: This is the primary level of control in bacteria. It is energetically efficient to stop protein synthesis early in the process.

  3. mRNA Processing: In eukaryotes, modifications like the 5' cap, 3' poly(A) tail, and intron splicing regulate translation and stability. This is largely absent in prokaryotes.

  4. mRNA Stability: The concentration of a protein is influenced by the rate of mRNA degradation. Rapid degradation results in lower protein levels.

  5. Translation: Factors such as the availability of amino acids, initiation factors, and enzymes influence the rate of protein synthesis from mRNA.

  6. Post-translational Modification: Proteins may be cleaved, trimmed, or chemically modified (e.g., phosphorylation) to become active or to be marked for degradation.

DNA-Binding Proteins and Motifs

Much of gene regulation is mediated by proteins that physically bind to DNA sequences.

  • Domains: These are functional regions within regulatory proteins, typically 60 to 90 amino acids in length. Only a small subset of amino acids (including asparagine, glutamine, glycine, lysine, and arginine) makes actual contact with the DNA.

  • Binding Nature: Most DNA-binding proteins bind dynamically to the major groove of the DNA double helix or the sugar-phosphate backbone through hydrogen bonds. This allows other molecules to compete for binding sites.

  • Characteristic Motifs:

    • Helix-turn-helix: Common in bacterial regulatory proteins; consists of two alpha helices connected by a turn.

    • Zinc-finger: Found in eukaryotic proteins; consists of a loop of amino acids with a zinc ion at the base.

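    • Leucine zipper: Features a helix with regularly spaced leucine residues and a basic arm. The leucines of two such helices interdigitate to hold two polypeptides together, while the basic arms contact the DNA; common in eukaryotic transcription factors.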

    • Helix-loop-helix: Found in eukaryotic proteins; two alpha helices separated by an amino acid loop.

    • Homeodomain: Three alpha helices; found in eukaryotic regulatory proteins.

The Logic of Operon Systems

Basic Structural Components

  • Structural Genes: Encode proteins involved in metabolism, biosynthesis, or cell structure.

  • Regulatory Genes: Separate sequences with their own promoters that encode regulatory proteins (activators or repressors). These proteins bind to the operator to control transcription.

  • Operator: A DNA sequence that overlaps the 3' end of the promoter (and sometimes the 5' end of the first structural gene). It serves as the binding site for the regulator protein.

  • Constitutive Genes: Genes that are expressed continually and are not regulated; they often encode essential "housekeeping" functions.

Four General Types of Transcriptional Control

  • Negative Inducible: The regulator is an active repressor that binds the operator and blocks transcription. An inducer must bind to the repressor to inactivate it, allowing transcription. This is common for degradative (catabolic) pathways.

  • Negative Repressible: The regulator is an inactive repressor. To block transcription, a corepressor must bind to the repressor to activate it. This is common for biosynthetic (anabolic) pathways where production should stop if the product is already abundant.

  • Positive Inducible: Transcription is normally off because the activator is inactive. It must be activated by an inducer to stimulate transcription.

  • Positive Repressible: The activator is normally bound to DNA, stimulating transcription. It must be inactivated by a molecule to stop transcription.
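
The four control types above reduce to two questions: is the regulator protein active, and does an active regulator block or stimulate transcription? The following is a minimal sketch of that logic; the function names and the string labels are illustrative assumptions, and "effector" is used here as a generic term for the inducer, corepressor, or inactivating molecule.

```python
def regulator_is_active(regulator: str, mode: str, effector_present: bool) -> bool:
    """Return True if the regulator protein can currently bind DNA.

    regulator: "negative" (a repressor) or "positive" (an activator).
    mode: "inducible" or "repressible".
    effector_present: True if the small molecule is bound to the regulator.
    """
    if regulator not in ("negative", "positive"):
        raise ValueError("regulator must be 'negative' or 'positive'")
    if mode not in ("inducible", "repressible"):
        raise ValueError("mode must be 'inducible' or 'repressible'")
    # Regulators that are active on their own: the repressor of a negative
    # inducible operon and the activator of a positive repressible operon.
    active_by_default = (regulator, mode) in {
        ("negative", "inducible"),
        ("positive", "repressible"),
    }
    # Binding of the effector flips the regulator's default state.
    return active_by_default != effector_present


def transcription_on(regulator: str, mode: str, effector_present: bool) -> bool:
    """Return True if the structural genes are transcribed."""
    active = regulator_is_active(regulator, mode, effector_present)
    # An active activator turns transcription on; an active repressor blocks it.
    return active if regulator == "positive" else not active
```

In this sketch both inducible types are off until the effector arrives and both repressible types are on until it arrives; what differs is which regulator state produces that outcome.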

The lac Operon of Escherichia coli

Lactose Metabolism

  • lacZ (β-galactosidase): This enzyme breaks lactose into glucose and galactose. It also converts lactose into allolactose.

  • lacY (Permease): This protein transports lactose into the bacterial cell.

  • lacA (Thiogalactoside transacetylase): Its precise role is unclear, but it may detoxify transport-related molecules.

Regulation Mechanism

  • Coordinate Induction: The simultaneous synthesis of the lacZ, lacY, and lacA products, stimulated by an inducer.

  • Negative Inducible Control: In the absence of lactose, the lacI repressor (a tetramer of four identical polypeptides) binds to the operator (lacO) and blocks RNA polymerase.

  • Induction by Allolactose: When lactose enters the cell, it is converted to allolactose by a few existing β-galactosidase molecules. Allolactose acts as the inducer: it binds to the repressor and causes an allosteric change that makes the repressor release the operator.

lac Mutations and Partial Diploids

Jacob and Monod used partial diploids (created by conjugation with an F-plasmid) to determine how mutations affect the system.

  • Structural-Gene Mutations (lacZ⁻, lacY⁻): Affect individual protein function but not the whole operon's regulation.

  • Regulator-Gene Mutations (lacI⁻): Constitutive mutations where the repressor is nonfunctional. These are trans-acting, meaning a functional lacI⁺ on a plasmid can regulate an operon on the bacterial chromosome.

  • Super-repressors (lacIˢ): These mutations prevent the inducer from binding to the repressor. They are dominant and keep the operon off even in the presence of lactose.

  • Operator Mutations (lacOᶜ): These are cis-acting. They alter the operator DNA so the repressor cannot bind. They only affect genes on the same DNA molecule.

  • Promoter Mutations (lacP⁻): Prevent RNA polymerase from binding; they are cis-acting and result in no transcription.
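
The cis/trans distinction in the list above can be checked by reasoning through a partial diploid one DNA copy at a time. The sketch below is a simplified qualitative model, not a reconstruction of Jacob and Monod's experiments: the class `LacCopy`, the function `makes_beta_gal`, and the allele codes ("+", "-", "s", "c") are illustrative assumptions. Repressor protein is treated as a shared pool (trans-acting), while the promoter and operator affect only their own copy (cis-acting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacCopy:
    """One copy of the lac region (chromosome or F plasmid); names are hypothetical."""
    lacI: str = "+"  # "+", "-" (no repressor), or "s" (super-repressor)
    lacP: str = "+"  # "+" or "-" (RNA polymerase cannot bind)
    lacO: str = "+"  # "+" or "c" (repressor cannot bind)
    lacZ: str = "+"  # "+" or "-" (nonfunctional beta-galactosidase)


def makes_beta_gal(copies: list[LacCopy], lactose: bool) -> bool:
    """Return True if functional beta-galactosidase is produced."""
    # Trans-acting step: repressor from any copy diffuses to every operator.
    alleles = {copy.lacI for copy in copies}
    # A super-repressor always binds; a wild-type repressor binds only
    # when no inducer (derived from lactose) is present.
    repressor_binds = "s" in alleles or ("+" in alleles and not lactose)

    # Cis-acting step: each copy is judged by its own promoter and operator.
    for copy in copies:
        operator_blocked = copy.lacO == "+" and repressor_binds
        transcribed = copy.lacP == "+" and not operator_blocked
        if transcribed and copy.lacZ == "+":
            return True
    return False
```

For example, the partial diploid lacI⁺ lacZ⁻ / lacI⁻ lacZ⁺ comes out inducible in this model, because the functional repressor from the first copy regulates the second copy in trans, whereas lacOᶜ lacZ⁺ / lacO⁺ lacZ⁻ comes out constitutive, because the operator mutation acts only on the copy that carries it.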

Positive Control and Catabolite Repression

  • Mechanism: Bacteria prefer glucose over lactose. If both are present, the lac operon is repressed through catabolite repression.

  • CAP and cAMP: The Catabolite Activator Protein (CAP) must bind to the promoter to facilitate RNA polymerase binding. However, CAP can only bind if it is complexed with 3',5'-cyclic adenosine monophosphate (cAMP).

  • Inverse Relationship: The concentration of cAMP is inversely related to the level of glucose.

    • High Glucose: Low cAMP, no cAMP-CAP complex, low lac transcription.

    • Low Glucose: High cAMP, active cAMP-CAP complex, high lac transcription (provided lactose is present).
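
Negative control by the repressor and positive control by cAMP-CAP combine into a simple two-input decision. The sketch below is a qualitative illustration under the assumptions that each sugar is either present or absent and that transcription falls into three coarse levels; the function name and the labels "off", "low", and "high" are illustrative and not taken from the source.

```python
def lac_transcription_level(glucose: bool, lactose: bool) -> str:
    """Return a coarse transcription level for the lac operon."""
    # Negative control: without lactose there is no allolactose,
    # so the repressor stays on the operator.
    repressor_on_operator = not lactose
    # Positive control: cAMP is high only when glucose is low,
    # and CAP binds the promoter only as a cAMP-CAP complex.
    camp_cap_bound = not glucose

    if repressor_on_operator:
        return "off"
    return "high" if camp_cap_bound else "low"


for glucose in (True, False):
    for lactose in (True, False):
        level = lac_transcription_level(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {level}")
```

Only the combination of lactose present and glucose absent gives high transcription in this sketch, which matches the two sub-bullets above.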

The trp Operon

  • Classification: Negative repressible operon.

  • Structure: Consists of five structural genes (trpE, trpD, trpC, trpB, and trpA) that encode enzymes to convert chorismate into tryptophan.

  • Regulator (trpR): Encodes an inactive repressor.

  • Transcription Lock: When tryptophan levels are high, tryptophan behaves as a corepressor, binding and activating the repressor. The active repressor binds the operator and shuts down synthesis.

Control of Gene Expression in Eukaryotes

Human and Chimpanzee Divergence

  • Genomic Similarity: Humans and chimpanzees share 96% of their DNA (1% difference in base pairs, 3% in indels).

  • Regulatory Evolution: King and Wilson (1975) proposed that phenotypic differences between the species are due to changes in regulatory sequences rather than structural genes.

  • Transcription Factors: Katja Nowick identified 90 transcription factors (including KRAB-ZFP proteins) that are expressed differently in human vs. chimpanzee brains, influencing complex cognitive traits.

Chromatin Structure Changes

  • DNase I Hypersensitivity: Active genes are more sensitive to DNase I because their chromatin is relaxed. These sites are often 1000 bp upstream of the start site.

  • Chromatin Remodeling Complexes: Complexes like SWI-SNF use energy from ATP hydrolysis to reposition nucleosomes, exposing DNA for transcription factors.

  • Histone Modification (The Histone Code):

    • Methylation: Histone methyltransferases add methyl groups (CH₃). H3K4me3 is a marker for active promoters.

    • Acetylation: Acetyltransferase enzymes add acetyl groups (CH₃CO) to lysine residues on histone tails, neutralizing their positive charge and weakening their grip on DNA. This generally stimulates transcription. Deacetylases remove these groups to repress transcription.

    • Case Study: Arabidopsis Flowering: The FLC gene suppresses flowering. The FLD gene encodes a deacetylase that removes acetyl groups from the FLC chromatin, shutting it off and allowing the plant to flower after cold exposure (vernalization).

  • DNA Methylation: Methylation of cytosine (forming 5-methylcytosine) at CpG islands near promoters is associated with long-term gene silencing.

Eukaryotic Transcriptional Regulation

  • Basal Transcription Apparatus: Comprises RNA polymerase, general transcription factors, and the Mediator complex; binds the core promoter.

  • Enhancers and Silencers: Regulatory elements that can act at great distances (e.g., 69,000 bp). The DNA between an enhancer and its promoter loops out so that the enhancer can contact the promoter. Enhancers can also be transcribed into eRNAs.

  • Insulators: Boundary elements that block the action of an enhancer on a specific promoter if the insulator is placed between them. They help organize Topologically Associated Domains (TADs) using proteins like CTCF and cohesin.

  • Transcriptional Stalling: RNA polymerase may initiate transcription but pause after 24 to 50 nucleotides. Factors like NELF promote stalling, while P-TEFb relieves it via phosphorylation.

  • Coordinated Regulation: Unlike the operons of bacteria, eukaryotic genes use shared response elements (e.g., MRE, GRE, TRE) in their promoters to respond to the same stimulus simultaneously.
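
Coordinated regulation through shared response elements amounts to a lookup: a stimulus acts through one element, and every gene carrying that element responds. The sketch below illustrates this with invented data. The gene names (`gene_A`, `gene_B`, `gene_C`) and their element sets are hypothetical, and the stimulus-to-element pairing (heavy metals for MRE, glucocorticoids for GRE, phorbol esters for TRE) is the standard textbook assignment rather than something stated in these notes.

```python
# Stimulus -> response element it acts through (textbook pairing, assumed here).
RESPONSE_ELEMENT_FOR = {
    "heavy metals": "MRE",
    "glucocorticoids": "GRE",
    "phorbol esters": "TRE",
}

# Hypothetical genes and the response elements found near their promoters.
PROMOTER_ELEMENTS = {
    "gene_A": {"MRE", "GRE", "TRE"},
    "gene_B": {"GRE"},
    "gene_C": {"MRE"},
}


def genes_responding_to(stimulus: str,
                        promoters: dict[str, set[str]] = PROMOTER_ELEMENTS) -> list[str]:
    """Return the genes (sorted by name) whose promoters carry the matching element."""
    element = RESPONSE_ELEMENT_FOR[stimulus]
    return sorted(gene for gene, elements in promoters.items() if element in elements)
```

The genes need not be adjacent on a chromosome: in this sketch `gene_A` and `gene_B` both respond to glucocorticoids only because each carries its own GRE, and `gene_A` responds to all three stimuli because it carries all three elements.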

RNA-Level Regulation and Interference

RNA Processing and Stability

  • Alternative Splicing: Allows a single pre-mRNA to produce different proteins. In Drosophila sex determination, the Sxl protein regulates the splicing of tra pre-mRNA, leading to female-specific development. In males, the absence of Sxl leads to a nonfunctional Tra protein and male development.

  • mRNA Degradation: Takes place in P bodies. The shortened poly(A) tail causes the removal of the 5' cap, and nucleases degrade the strand.

RNA Interference (RNAi)

  • Trigger: Double-stranded RNA is cleaved by Dicer into fragments of 21 to 25 nucleotides (siRNAs or miRNAs).

  • Complexes: These bind proteins to form the RNA-induced silencing complex (RISC) or RNA-induced transcriptional silencing complex (RITS).

  • Mechanisms:

    1. Cleavage of mRNA: Slicer protein cleaves mRNA complementary to the siRNA.

    2. Inhibition of Translation: miRNAs often block protein synthesis without degrading the mRNA.

    3. Transcriptional Silencing: RITS attracts methylating enzymes to the chromatin.

    4. Decay Triggering: Some miRNAs bind to AU-rich elements in the 3' UTR to trigger Slicer-independent degradation.

  • RNA Crosstalk: Competing endogenous RNAs (ceRNAs) like long noncoding RNAs (lncRNAs) or circular RNAs act as "molecular decoys" that soak up miRNAs, preventing them from silencing target mRNAs.
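
Target recognition in the cleavage mechanism depends on base pairing between the small RNA and the mRNA. As a minimal sketch, assuming perfect complementarity over the whole guide strand (real miRNA targeting tolerates mismatches, which this ignores), the site can be located by searching the mRNA for the reverse complement of the guide. The function names are illustrative.

```python
# RNA base-pairing rules: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_site(mrna: str, guide: str) -> int:
    """Return the 0-based start of the mRNA site that pairs perfectly with
    the guide strand, or -1 if there is no such site.

    Both sequences are written 5' to 3'. Because the two strands pair in
    antiparallel orientation, the site reads as the guide's reverse complement.
    """
    return mrna.find(reverse_complement(guide))
```

A guide built as the reverse complement of any 21-nucleotide stretch of an mRNA will be reported at the start of that stretch, while an unrelated guide returns -1.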

Translational and Post-translational Control

  • Initiation Factor Availability: In T lymphocytes, activation by an antigen increases the availability of initiation factors, boosting protein synthesis 7- to 10-fold without increasing transcription.

  • Post-translational Processing: Selective cleavage, trimming, and modifications (acetylation, phosphorylation, ubiquitination) control the final function and stability of proteins.

Questions & Discussion

  • Physical Proximity in Research: Jacob and Monod collaborated due to working on the same floor. In the modern era, while digital tools exist, physical proximity remains valuable for spontaneous inspiration and brainstorming (though digital collaboration allows global reach).

  • Constitutive Genes: These are genes expressed at a constant rate regardless of the environment, encoding essential enzymes.

  • Difference between Structural and Regulatory Genes: Structural genes encode metabolic or structural proteins; regulator genes produce factors (RNA or protein) that control the activity of other genes.

  • Effect of High Glucose on the lac Operon: High glucose leads to low cAMP, preventing CAP from binding to the promoter. Even if lactose is present, transcription is low because RNA polymerase cannot bind efficiently without cAMP-CAP.

  • The Difference between Humans and Chimps in Base Pairs: With a human genome of 3.2 × 10⁹ base pairs, a 4% difference (1% SNPs + 3% indels) equates to 1.28 × 10⁸ base pair differences.
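
The arithmetic in the last answer can be reproduced in a few lines. This sketch uses integer percentages to avoid floating-point rounding; the variable names are illustrative, and the inputs are the figures given in the notes.

```python
GENOME_SIZE_BP = 3_200_000_000  # human genome, 3.2 x 10^9 base pairs
SNP_PERCENT = 1                 # single-base differences
INDEL_PERCENT = 3               # insertions and deletions

total_percent = SNP_PERCENT + INDEL_PERCENT
differing_bp = GENOME_SIZE_BP * total_percent // 100

print(f"{total_percent}% of {GENOME_SIZE_BP:.1e} bp = {differing_bp:.2e} bp")
```

The result, 128,000,000 base pairs, is the 1.28 × 10⁸ figure quoted above.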