Transcription and the Mechanisms of Gene Expression
Overview of Gene Expression and Central Dogma
- Gene expression is the biological process of converting genetic information stored in DNA into functional proteins.
- This process is divided into two primary stages:
- Transcription: The copying of a gene's DNA sequence into messenger RNA (mRNA). This is the focus of Chapter 3.
- Translation: The decoding of mRNA into a chain of amino acids, which ultimately forms a protein. This is the focus of Chapter 4.
- The cell is the basic unit of living tissue. Within human cells, the nucleus contains the genome, split among 23 pairs of chromosomes.
- Each chromosome consists of a long strand of DNA tightly packaged around proteins known as histones. Specific sections of this DNA are called genes, which contain instructions for making proteins.
The Process of Transcription
- Transcription is the first stage of gene expression.
- When a gene is "switched on," an enzyme called RNA polymerase attaches to the start of the gene.
- RNA polymerase moves along the DNA, unzipping the double helix and reading one of the two strands (the template strand).
- It uses free ribonucleotides (nucleoside triphosphates) in the nucleus to assemble a strand of mRNA.
- The DNA code determines the order of the bases added to the mRNA. In RNA, the base Thymine (T) is replaced by Uracil (U), which is a close chemical cousin.
- Transcription Unit: A region of DNA extending from the promoter to the terminator.
- Coding Strand vs. Template Strand:
- Template Strand: The strand used by RNA polymerase to make the complementary mRNA.
- Coding Strand: The DNA strand whose sequence matches the resulting mRNA (with U replacing T).
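The relationship between the template strand, the coding strand, and the mRNA can be sketched in a few lines of Python. This is a minimal illustration, not a model of the enzyme: the function names and the example sequence are hypothetical, and the template is assumed to be written 3′ to 5′ so that the output reads 5′ to 3′.

```python
# Base-pairing rules: DNA template base -> RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base-pairing rules between the two DNA strands.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the coding strand (5'->3') paired with the given template."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3to5.upper())


template = "TACGGCATT"            # hypothetical template, written 3'->5'
mrna = transcribe(template)       # "AUGCCGUAA"
coding = coding_strand(template)  # "ATGCCGTAA"

# The mRNA matches the coding strand, with U in place of T.
print(mrna == coding.replace("T", "U"))  # True
```

Reading the template 3′ to 5′ keeps the sketch to a simple per-base lookup, because the RNA chain is built in the 5′ to 3′ direction.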
Experimental Confirmation of messenger RNA
- Scientists sought to determine how information moved from the nucleus (DNA site) to the cytoplasm (protein synthesis site).
- An experiment was performed using E. coli bacteria:
- Bacteria were grown for several generations on media containing heavy isotopes: Nitrogen-15 (15N) and Carbon-13 (13C).
- All cellular components, including ribosomes, became "heavy labeled."
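- The bacteria were then infected with the T2 bacteriophage virus and transferred to light medium containing a radioactive RNA precursor, so that RNA made after infection was radio-labeled and any newly made ribosomes would be light.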
- The phage destroyed the bacterial DNA and substituted its own genetic information to direct viral protein synthesis.
- Hypothesis: If ribosomes carried specific gene information, new virus-specific ribosomes would need to be made. If ribosomes were passive sites, old bacterial ribosomes could be used to make viral proteins.
- Results: Radio-labeled phage RNA was found associated with all the old bacterial ribosomes. This confirmed that ribosomes do not carry the genetic information themselves; rather, an unstable class of RNA (messenger RNA) carries the code from DNA to the ribosome.
Molecular Components of Transcription
- Transcription requires an enzyme (RNA polymerase) to catalyze the formation of phosphodiester bonds.
- Chemistry of Synthesis:
- The 3′ hydroxyl (OH) group of the growing RNA chain attacks the alpha phosphate of an incoming nucleoside triphosphate, releasing pyrophosphate.
- Requirements: A DNA template, and the nucleotides ATP, CTP, GTP, and UTP.
- Direction: The RNA chain grows in the 5′ to 3′ direction.
- Unlike DNA polymerase, RNA polymerase does not require a primer to initiate synthesis.
- Prokaryotic RNA Polymerase Subunits:
- Core Enzyme: Composed of two α subunits, one β subunit, one β′ subunit, and one ω subunit.
- Holoenzyme: The core enzyme plus the σ (sigma) factor. Only the holoenzyme can initiate transcription.
- α Subunits: Encoded by rpoA; help recognize the UP element in strong promoters.
- β and β′ Subunits: Encoded by rpoB and rpoC; involved in DNA binding and phosphodiester bond formation.
- σ Factor: Encoded by rpoD; directs the enzyme to the promoter and provides specificity. It dissociates from the core enzyme after initiation and can be recycled.
- Nucleotide Numbering:
- The first nucleotide at the transcription start site is numbered +1.
- Downstream sequences move in the positive direction (+2, +3, etc.).
- Upstream sequences (before the start site) are numbered negatively (−1, −2, etc.).
- Prokaryotic Promoters:
- Consist of two critical consensus sequences: the −10 sequence (Pribnow box, consensus TATAAT) and the −35 sequence (consensus TTGACA).
- Strong promoters may include an "UP element" located between −40 and −60.
- Eukaryotic Promoters:
- Primarily involves RNA Polymerase II for mRNA.
- A key consensus sequence is the TATA box located at approximately −25.
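The numbering convention and the idea of a consensus sequence can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the helper names, the toy sequence, and the simple "count matching bases" score, which is far cruder than real promoter prediction. The consensus strings TATAAT (−10) and TTGACA (−35) are the standard E. coli σ70 consensus sequences.

```python
def position_label(index: int, tss_index: int) -> int:
    """Map a 0-based string index to transcription numbering (there is no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_box(seq: str, consensus: str, tss_index: int) -> tuple:
    """Return (label of first base, matches) for the best upstream window."""
    best_index, best_score = 0, -1
    for i in range(tss_index - len(consensus) + 1):
        window = seq[i:i + len(consensus)]
        score = sum(a == b for a, b in zip(window, consensus))
        if score > best_score:  # keep the first best-scoring window
            best_index, best_score = i, score
    return position_label(best_index, tss_index), best_score


# Hypothetical coding-strand sequence with a 17 bp spacer between the boxes.
upstream = "GCGCG" + "TTGACA" + "GC" * 8 + "G" + "TATAAT" + "GCGCGC"
seq = upstream + "A" + "GGCC"   # the "A" is the +1 start site
tss_index = len(upstream)

print(position_label(tss_index, tss_index))      # 1
print(position_label(tss_index - 1, tss_index))  # -1
print(find_box(seq, "TTGACA", tss_index))        # (-35, 6)
print(find_box(seq, "TATAAT", tss_index))        # (-12, 6)
```

The −10 box is reported at −12 because the label refers to the first base of the hexamer, which in this toy sequence spans −12 to −7.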
Stages of Transcription: Initiation, Elongation, and Termination
- Initiation:
- The σ factor directs the holoenzyme to the promoter to form a "closed promoter complex."
- The holoenzyme melts a short region of DNA (10 to 17 base pairs) to create an "open promoter complex"; the melted region is the transcription bubble.
- The first base in the RNA is typically a purine (A or G).
- Elongation:
- Once the first few phosphodiester bonds are formed, the σ factor dissociates.
- The core enzyme continues adding nucleotides sequentially.
- The transcription bubble consists of approximately 14 unpaired bases; about 9 of them on the template strand are paired with the newly made RNA in an RNA-DNA hybrid.
- Physical Topology: Opening the helix introduces positive supercoils ahead of the polymerase and negative supercoils behind it. Topoisomerases are required to relax these supercoils.
- RNA polymerase may pause or backtrack to proofread the newly synthesized RNA.
- Transcription Bubble Detection: Dimethyl sulfate (DMS) transfers a methyl group to Adenine in open regions. S1 nuclease then cleaves the un-base-paired DNA, which can be visualized on a gel to identify the bubble's location (e.g., between −9 and +3).
- Termination:
- Intrinsic (Rho-independent) Terminator: Involves a GC-rich inverted repeat that forms a hairpin loop, followed by a string of 7 to 9 Uracil (U) bases. This structure causes the polymerase to fall off.
- Rho-dependent Terminator: Lacks the poly-U string. A protein factor called Rho follows the RNA polymerase and catches it when it pauses at a hairpin, unwinding the RNA-DNA hybrid to release the transcript.
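The sequence signature of an intrinsic terminator (a GC-rich inverted repeat followed by a run of U) lends itself to a simple pattern search. The sketch below is a rough illustration under stated assumptions: the function names, the fixed stem length, the loop size range, and the GC threshold are all hypothetical choices, and real terminator prediction also weighs hairpin stability.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna))


def find_intrinsic_terminators(rna, stem_len=6, loop_sizes=range(3, 9),
                               min_u=7, min_gc=0.6):
    """Return (stem start index, loop size) for each hairpin + U-run found."""
    hits = []
    for i in range(len(rna) - stem_len + 1):
        left = rna[i:i + stem_len]
        if sum(base in "GC" for base in left) / stem_len < min_gc:
            continue  # stem is not GC-rich enough
        for loop in loop_sizes:
            j = i + stem_len + loop           # start of the right stem arm
            right = rna[j:j + stem_len]
            if right != reverse_complement(left):
                continue                      # arms cannot base-pair
            tail = rna[j + stem_len:j + stem_len + min_u]
            if tail == "U" * min_u:           # U-run directly after the stem
                hits.append((i, loop))
    return hits


# Hypothetical transcript: stem GCCGCC, loop UUCG, stem GGCGGC, then 8 U.
rna = "AUGAAC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUUU" + "AG"
print(find_intrinsic_terminators(rna))  # [(6, 4)]
```

A Rho-dependent terminator would not be found by this search, because it lacks the poly-U string that the last check requires.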
Regulation of Bacterial Transcription
- Gene expression is energetically expensive, so cells regulate which genes are active. E. coli has more than 3,000 genes, but not all are transcribed simultaneously.
- Strategies for Regulation:
- Alternative Sigma Factors: Phages such as SPO1 use specific proteins (gp28, gp33, gp34) to redirect host RNA polymerase to viral genes. Similarly, E. coli uses σ32 for the heat shock response.
- Anti-sigma Factors: The Rsd protein binds to σ70 during environmental stress, blocking its activity.
- RNA Polymerase Switching: Bacteriophage T7 encodes its own RNA polymerase to transcribe its late-phase genes.
- Anti-termination: Proteins like the N and Q proteins in lambda phage prevent premature termination, allowing read-through into subsequent genes.
- Operons: Groups of contiguous, coordinate-controlled genes transcribed as a single "polycistronic" message.
- Transcription Attenuation: Premature termination based on leader sequences (e.g., the trp operon).
- Riboswitches: RNA regions in the 5′ untranslated region (UTR) that change conformation upon binding a ligand (e.g., FMN binding in the ribD operon).
The Operon Model and Positive/Negative Control
- Negative Control (Induction): A repressor binds the operator to inhibit transcription until an inducer is present (e.g., Lac operon).
- Negative Control (Repression): An inactive repressor becomes active only in the presence of a co-repressor (e.g., Tryptophan operon).
- Positive Control: An activator (like the CAP-cAMP complex) must bind near the promoter for full transcription activity.
- The Lac Operon:
- In the absence of lactose, the lacI gene produces a repressor that binds the operator, blocking transcription.
- In the presence of lactose, the inducer binds the repressor, causing it to release the operator.
- Catabolite Repression: If glucose is present, cAMP levels are low. If glucose is absent, cAMP levels rise, forming the CAP-cAMP complex, which acts as an activator of the lac operon.
- The lac operon is fully active only when glucose is absent and lactose is present.
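The combined negative and positive control of the lac operon behaves like a small truth table, which can be written out as a Python sketch. The function name and the three output labels are hypothetical, and the model is deliberately simplified: it treats lactose and glucose as simply present or absent, and it assumes a low basal level of transcription when lactose is present but the CAP-cAMP complex is not bound.

```python
def lac_operon_state(lactose: bool, glucose: bool) -> str:
    """Return a coarse transcription level for the lac operon."""
    # Negative control: without the inducer, the repressor stays on the operator.
    repressor_bound = not lactose
    # Positive control: cAMP rises only when glucose is absent, so the
    # CAP-cAMP activator forms only in that case.
    cap_camp_bound = not glucose

    if repressor_bound:
        return "off"
    return "high" if cap_camp_bound else "low"


for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination of lactose present and glucose absent returns "high", which matches the rule stated above.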
Eukaryotic Transcription Machinery
- Eukaryotes use three distinct RNA polymerases:
- RNA Polymerase I: Located in the nucleolus; synthesizes large ribosomal RNA (rRNA) precursors (28S, 18S, 5.8S).
- RNA Polymerase II: Located in the nucleoplasm; synthesizes mRNA precursors.
- RNA Polymerase III: Located in the nucleoplasm; synthesizes tRNA precursors, 5S rRNA, and some small nuclear RNAs (e.g., U6).
- Transcription Factors (TF):
- General Transcription Factors (GTFs): Required for basal level transcription. For Pol II, the assembly sequence is: TFIID (containing TATA-binding protein, or TBP) + TFIIA → TFIIB → TFIIF/Polymerase II → TFIIE → TFIIH.
- Specific Transcription Factors (Activators): Bind to enhancers and contain DNA-binding motifs.
- DNA-Binding Motifs:
- Zinc-containing modules: e.g., Zinc finger (one zinc ion coordinated by two Cys and two His) or the GAL4 binuclear cluster (six Cys coordinating two zinc ions).
- Homeodomain: Contains 60 amino acids and a helix-turn-helix motif.
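- bZIP and bHLH: Bind DNA as dimers through a basic region; bZIP proteins dimerize via a leucine zipper, and bHLH proteins dimerize via a helix-loop-helix domain.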
Epigenetic Regulation in Eukaryotes
- DNA is packaged into nucleosomes: 147 base pairs of DNA wrapped in about 1.75 turns around an octamer of core histones (two copies each of H2A, H2B, H3, and H4).
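- Histone Modification: Covalent changes (acetylation, methylation, phosphorylation) to positively charged histone tails alter their interaction with negatively charged DNA, making DNA more or less accessible to RNA polymerase. Acetylation, for example, neutralizes the positive charge on lysines and generally loosens the interaction.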
- Chromatin Remodeling: Complexes use energy to move nucleosomes so RNA polymerase can access genes.
- Overcoming the Nucleosomal Barrier: During elongation, polymerase moves through nucleosomes via nucleosome mobilization (octamer transfer) or H2A/H2B dimer depletion.
- Heterochromatin vs. Euchromatin:
- Euchromatin: Open, accessible, and transcriptionally active.
- Heterochromatin: Tightly bundled, repressive, and methylated (often at CpG islands).
Post-Transcriptional mRNA Processing
- Eukaryotic genes contain exons (coding) and introns (intervening sequences).
- Three primary modifications occur after transcription:
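- 5′ Cap:
- Step 1: RNA triphosphatase removes the terminal (γ) phosphate from the 5′ end of the transcript.
- Step 2: RNA guanylyltransferase adds a GMP molecule (derived from GTP) through a 5′-5′ triphosphate linkage.
- Step 3: Guanine-N7 methyltransferase adds a methyl group to the N7 position of the added guanine.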
- Function: Protects RNA and serves as a binding site for ribosomes.
- Polyadenylation (3′ tail):
- Cleavage occurs after a CA dinucleotide between the AAUAAA hexamer and a downstream GU-rich region.
- Poly(A) polymerase adds approximately 200 Adenosine (A) residues to the new 3′ end created by cleavage.
- Splicing:
- Carried out by the spliceosome (comprising snRNPs U1, U2, U4, U5, and U6).
- Introns typically start with GU (at the 5′ splice site) and end with AG (at the 3′ splice site).
- Mechanism: Two-step transesterification forms a lariat structure from the intron, which is then released and degraded.
- Alternative splicing allows a single gene to produce multiple protein variants.
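The GU...AG rule and the idea of alternative splicing can be shown with a small Python sketch. It is a toy under stated assumptions: the function name, the sequences, and the exon coordinates are hypothetical, exon positions are supplied by hand rather than predicted, and the spliceosome's other recognition signals (such as the branch point) are ignored.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the given exons (half-open index pairs, in order) into mature mRNA.

    Each removed intron is checked against the GU...AG rule.
    """
    mature = []
    for k, (start, end) in enumerate(exons):
        mature.append(pre_mrna[start:end])
        if k + 1 < len(exons):
            intron = pre_mrna[end:exons[k + 1][0]]
            if not (intron.startswith("GU") and intron.endswith("AG")):
                raise ValueError(f"intron {intron!r} violates the GU...AG rule")
    return "".join(mature)


# Hypothetical pre-mRNA built from three exons and two introns.
e1, i1, e2, i2, e3 = "AUGGCC", "GUAAGUCCUAACAG", "AAACCC", "GUGAGUUUUCAG", "GGGUAA"
pre_mrna = e1 + i1 + e2 + i2 + e3

# Exon coordinates derived from the segment lengths.
s2 = len(e1) + len(i1)
s3 = s2 + len(e2) + len(i2)
exons = [(0, len(e1)), (s2, s2 + len(e2)), (s3, s3 + len(e3))]

print(splice(pre_mrna, exons))                  # AUGGCCAAACCCGGGUAA
print(splice(pre_mrna, [exons[0], exons[2]]))   # AUGGCCGGGUAA (exon 2 skipped)
```

Skipping the middle exon still satisfies the check, because the removed stretch begins with the GU of the first intron and ends with the AG of the second. This is one way a single gene can yield more than one mature mRNA.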
Non-Coding RNAs and RNA Interference (RNAi)
- Approximately 98% of the transcriptional output of the human genome is non-coding RNA.
- Small Non-coding RNAs:
- microRNA (miRNA): Endogenous; cut from larger hairpin precursors by the enzyme Dicer into approximately 22 nucleotides. They often bind imperfectly to targets to inhibit translation.
- small interfering RNA (siRNA): From exogenous sources (viruses or synthetic). Cut by Dicer into 21 to 25 nucleotides. They match targets perfectly, leading to mRNA cleavage.
- Mechanism of Action:
- Dicer cuts the RNA, which is then passed to the RNA-induced silencing complex (RISC).
- The Argonaute protein in the RISC complex uses the antisense (guide) strand to find the target mRNA.
- RITS (RNA-induced transcriptional silencing) complexes can also recruit chromatin-modifying enzymes to methylate DNA and histones, resulting in transcriptional silencing.
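The difference between siRNA-like and miRNA-like targeting comes down to how completely the guide strand pairs with its target, which can be sketched in Python. All names here are hypothetical, the 21-nucleotide guide is an invented example, and the cutoff of five mismatches is an arbitrary assumption chosen only to separate the three cases. The sketch counts Watson-Crick pairs only and ignores G-U wobble pairs and seed-region rules.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(PAIRS[base] for base in reversed(rna))


def count_mismatches(guide: str, target_site: str) -> int:
    """Count unpaired positions when guide and target pair antiparallel."""
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have equal length")
    return sum(PAIRS[g] != t for g, t in zip(guide, reversed(target_site)))


def predicted_outcome(guide: str, target_site: str, max_mismatches: int = 5) -> str:
    """Classify the silencing mode from the degree of complementarity."""
    mismatches = count_mismatches(guide, target_site)
    if mismatches == 0:
        return "mRNA cleavage"             # siRNA-like perfect match
    if mismatches <= max_mismatches:
        return "translational inhibition"  # miRNA-like imperfect match
    return "no silencing predicted"


guide = "UGAGGUAGUAGGUUGUAUAGU"            # hypothetical 21-nt guide strand
perfect_site = reverse_complement(guide)

# Introduce three mismatches by swapping bases at positions 8, 9 and 10.
SWAP = {"A": "C", "C": "A", "G": "U", "U": "G"}
imperfect_site = "".join(
    SWAP[base] if i in (8, 9, 10) else base for i, base in enumerate(perfect_site)
)

print(predicted_outcome(guide, perfect_site))    # mRNA cleavage
print(predicted_outcome(guide, imperfect_site))  # translational inhibition
```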