Life and Modifications of Messenger RNA: A Comprehensive Study Guide

  • Post-transcriptional modifications are conceptualised as occurring after transcription, but in reality most of them occur co-transcriptionally.
  • RNA Polymerase II (Pol II) contains a Carboxy-Terminal Domain (CTD) whose phosphorylation state changes throughout elongation.
  • Proteins responsible for processing the RNA are associated with the CTD, allowing processing to occur in almost real-time as the RNA emerges from the polymerase.
  • By the time Pol II finishes transcription and disassembles, the RNA is typically fully processed.
  • The term "post-transcriptional" is maintained in literature for conceptual clarity and due to historical convention.

The Molecular Definition of a Gene

  • A gene is a genomic region (DNA or RNA) that codes for a functional product.
  • Historical perspective: In the mid-20th century, the concept was "one gene, one enzyme/protein."
  • Contemporary understanding:
      - A single gene can encode multiple proteins through alternative processing (for example, alternative splicing).
      - Many genes encode non-coding RNAs (ncRNAs) with specific functions.
      - Functionality is the key criterion; if a product's function cannot be evaluated, categorising it as a gene is difficult.
  • Genes are often interrupted (discontinuous) and can overlap on the genome.
  • In a specific DNA region, multiple genes can exist: some coding for proteins, others for ncRNAs, overlapping or nested within one another.

RNA Structural Dynamics

  • RNA is not a simple linear string; it behaves like headphone cables or Christmas tree lights.
  • It tends to fold and coil spontaneously upon itself.
  • Structure influences protein interaction; regions emerging early from the polymerase are more accessible before high-order folding occurs.

The 5' Capping Process

  • Capping involves placing a "cap" or "lid" on the 5' end of the messenger RNA (mRNA) to protect it and facilitate downstream processes.
  • Mechanism: Three enzymatic reactions, usually performed by proteins associated with the CTD of Pol II.
      1. Triphosphatase: Removes one phosphate from the triphosphate group at the 5' end of the nascent RNA.
      2. Guanylyltransferase: Adds a guanosine residue (donated by GTP) in an inverted orientation (5'-5' triphosphate linkage).
      3. Methyltransferase: Adds a methyl group to the guanine at the nitrogen 7 (N7) position, giving m^7G.
  • Biological Utility of the Cap:
      - Prevents degradation by exonucleases (enzymes that degrade free ends of nucleic acids) via steric hindrance.
      - Methylation at position 7 (N7) protects the guanosine from glycosylases.
  • Variations:
      - Cap 0: Standard cap found in most organisms (7-methylguanosine).
      - Cap 1, 2, 3, 4: Additional methylations on the first several nucleotides of the transcript, common in specific organisms.
  • Genomic Organisation: In metazoans and plants, the triphosphatase and guanylyltransferase activities are often on one polypeptide, while the methyltransferase is on another.
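
The three capping reactions can be followed as successive changes to the 5' end of the transcript. The sketch below is only bookkeeping, not a chemical model: the string labels (ppp, pp, Gppp, m7Gppp) and the function names are illustrative assumptions.

```python
# Toy bookkeeping of the 5' end during capping.
# The string labels and function names are illustrative assumptions.

def triphosphatase(five_prime: str) -> str:
    """Step 1: remove one phosphate from the 5' triphosphate (ppp -> pp)."""
    if five_prime != "ppp":
        raise ValueError("expected a nascent 5' triphosphate end")
    return "pp"


def guanylyltransferase(five_prime: str) -> str:
    """Step 2: add a guanosine from GTP in a 5'-5' linkage (pp -> Gppp)."""
    if five_prime != "pp":
        raise ValueError("expected a 5' diphosphate end")
    return "Gppp"


def methyltransferase(five_prime: str) -> str:
    """Step 3: methylate the guanine at position 7 (Gppp -> m7Gppp)."""
    if five_prime != "Gppp":
        raise ValueError("expected an unmethylated cap")
    return "m7Gppp"


def cap(five_prime: str = "ppp") -> str:
    """Apply the three reactions in order."""
    return methyltransferase(guanylyltransferase(triphosphatase(five_prime)))


print(cap() + "N...")  # capped 5' end followed by the first transcript nucleotide
```

Each function refuses an end that is not in the expected state, which mirrors the fact that the reactions only make sense in this order.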

The Cap Binding Complex (CBC)

  • The CBC consists of two primary proteins: CBP80 (80 kDa) and CBP20 (20 kDa).
  • The CBC binds to the 7-methylguanosine structure.
  • It serves as a scaffold for factors involved in:
      - Splicing.
      - Transcription.
      - Biogenesis of non-coding RNAs.
      - Nuclear-cytoplasmic transport.
      - First rounds of translation.
      - Nonsense-Mediated Decay (NMD).

Splicing: The Removal of Introns

  • Splicing is the process of removing non-coding regions (introns) and joining coding regions (exons).
  • Analogy: Film splicing, where a machine cuts out a segment of film and joins the remaining ends with tape.
  • Prevalence: Approximately 90% of eukaryotic genes undergo splicing.
  • Gene Statistics:
      - Average gene length: ~5000 base pairs (bp), with extremes from 1000 to 10,000+ bp.
      - Exon length: Typically ~500 bp (ranging from 25 to 2000+ bp).
      - Intron length: Typically ~2000 bp, but can extend to 10,000+ bp.
      - The majority of the transcribed genome consists of non-coding introns.
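
As a rough illustration of what "removing introns and joining exons" means at the sequence level, here is a minimal Python sketch. The function names, the coordinate convention (0-based, end-exclusive) and the toy sequence are assumptions made for this example; real splicing is directed by sequence signals, not by supplied coordinates.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) pairs, 0-based and end-exclusive."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        if start < cursor or start >= end or end > len(pre_mrna):
            raise ValueError("introns must be non-overlapping and inside the transcript")
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


def intron_fraction(pre_mrna: str, introns: list[tuple[int, int]]) -> float:
    """Fraction of the primary transcript that is intronic."""
    return sum(end - start for start, end in introns) / len(pre_mrna)


# Hypothetical transcript: three 4-nt exons separated by two 8-nt introns.
pre = "AAAA" + "GUCCCCAG" + "UUUU" + "GUGGGGAG" + "CCCC"
introns = [(4, 12), (16, 24)]

print(splice(pre, introns))           # AAAAUUUUCCCC
print(intron_fraction(pre, introns))  # 16 of 28 nucleotides are intronic
```

Even in this tiny example more than half of the transcript is discarded, which is the same pattern the gene statistics above describe.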

The Chemical and Structural Mechanism of Splicing

  • Recognition Sites:
      - 5' Donor Site: The intron usually begins with the dinucleotide GU.
      - 3' Acceptor Site: The intron usually ends with the dinucleotide AG, preceded by a polypyrimidine tract.
      - Branch Point: An Adenine (A) residue located approximately 20 to 50 nucleotides upstream of the 3' site.
  • Chemical Reaction:
      - Splicing involves two transesterification reactions (ΔG ≈ 0).
      - First reaction: The 2'-OH of the branch point Adenine attacks the 5' splice site, breaking the 5'-3' phosphodiester bond and creating a 2'-5' bond.
      - This forms a branched structure known as a "lariat" (cowboy lasso analogy).
      - Second reaction: The freed 3'-OH of the upstream exon attacks the 3' splice site at the 5' end of the downstream exon, joining the two exons and releasing the lariat.
  • Irreversibility: Although chemically reversible, the process is made one-way by mechanical rearrangements of the machinery that move the reaction ends away from each other after the attack.
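
The recognition signals listed above can be checked on a candidate intron with a few string operations. This is a minimal sketch under stated assumptions: the function name and the returned dictionary keys are hypothetical, the branch point is approximated as "any A roughly 20 to 50 nucleotides upstream of the 3' end", and the polypyrimidine tract is ignored.

```python
def intron_signals(intron: str) -> dict:
    """Check an intron sequence (written 5' to 3') for the core splicing signals."""
    rna = intron.upper().replace("T", "U")  # accept DNA or RNA spelling
    # Window lying roughly 20 to 50 nucleotides upstream of the 3' splice site.
    window = rna[-50:-20]
    return {
        "donor_GU": rna.startswith("GU"),
        "acceptor_AG": rna.endswith("AG"),
        "branch_A_candidates": window.count("A"),
    }


# Hypothetical 45-nt intron with a single A placed 33 nt from its 3' end.
intron = "GU" + "C" * 10 + "A" + "C" * 30 + "AG"
print(intron_signals(intron))
# {'donor_GU': True, 'acceptor_AG': True, 'branch_A_candidates': 1}
```

Because splice sites are consensus sequences rather than rigid codes, a check like this can only flag candidates; it cannot decide whether a site is actually used.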

The Spliceosome Machinery

  • Composed of Small Nuclear Ribonucleoproteins (snRNPs, pronounced "snurps").
  • Basal Machinery:
      - U1: Recognizes the 5' donor site via RNA-RNA base pairing.
      - U2: Recognizes the branch point site.
      - U2AF (U2 Auxiliary Factor): Binds the polypyrimidine tract and the 3' acceptor site and helps recruit U2.
      - U4, U5, U6: Form a tri-snRNP complex that joins the assembly to catalyse the reaction.
  • Note: U3 is not involved in mRNA splicing despite its name.
  • snRNP Biogenesis: Specific RNAs are transcribed in the nucleus, exported to the cytoplasm for assembly with proteins (aided by chaperones), then imported back to the nucleus to Cajal bodies and finally to nuclear speckles.

Splicing Variations and Specialized Mechanisms

  • Major vs. Minor Spliceosome:
      - Major (U2-type): The standard machinery (U1, U2, U4, U5, U6) recognizing GU-AG sites.
      - Minor (U12-type): Recognizes different consensus sequences (AU-AC at the RNA level; conventionally called AT-AC introns) using different snRNPs (U11 for U1, U12 for U2, U4atac and U6atac for U4/U6). U5 is shared.
      - The minor spliceosome may regulate processing rates for complex structural folding.
  • Trans-splicing:
      - Occurs in organisms such as trypanosomatids (the parasites that cause Chagas disease and sleeping sickness).
      - These organisms transcribe polycistronic units (like operons) but require monocistronic mRNAs for translation.
      - Specialized machinery joins a "Spliced Leader" (mini-exon) to the 5' end of each coding region in the polycistronic transcript.
      - Forms a "Y"-shaped intermediate instead of a lariat.
  • Backsplicing and Circular RNAs:
      - Occurs when a downstream 5' splice site joins an upstream 3' splice site.
      - Results in a circular RNA molecule.
      - Historically viewed as "noise," now known to have regulatory functions.
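
Backsplicing can be pictured as joining the end of a downstream exon back to the start of an upstream one. The following Python sketch is illustrative only: the function names, the exon list and the 3-nt junction window are assumptions, and a circle is represented as a linear string that should be read as wrapping around.

```python
def backsplice(exons: list[str], first: int, last: int) -> str:
    """Join exons[first..last] into a circle, written linearly from exon `first`.

    The backsplice junction links the 3' end of exons[last] (its downstream
    5' splice site) to the 5' end of exons[first] (its upstream 3' splice site).
    """
    if not 0 <= first <= last < len(exons):
        raise ValueError("invalid exon range")
    return "".join(exons[first:last + 1])


def same_circle(a: str, b: str) -> bool:
    """True if two linear spellings describe the same circular sequence."""
    return len(a) == len(b) and a in b + b


exons = ["AAAC", "GGGU", "CCCA", "UUUG"]  # hypothetical exon sequences
circ = backsplice(exons, 1, 2)            # circle made of the 2nd and 3rd exons
junction = circ[-3:] + circ[:3]           # sequence spanning the backsplice junction

print(circ)      # GGGUCCCA
print(junction)  # CCAGGG
print(same_circle(circ, "CCCAGGGU"))  # True: same circle, different start point
```

The junction sequence does not occur in the linear mRNA built from the same exons, so a junction of this kind identifies the circular product.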

Alternative Splicing Regulation

  • Alternative Splicing: Generating multiple mature mRNAs from a single primary transcript.
  • Factors:
      - SR Proteins: Rich in Serine (S) and Arginine (R). Usually bind to Exonic Splicing Enhancers (ESEs) to recruit basal machinery to weak sites.
      - hnRNPs: Heterogeneous Nuclear Ribonucleoproteins. Often act as silencers, blocking the machinery via steric hindrance.
  • Sequence Consensus:
      - Splice sites are "consensus sequences," not rigid codes.
      - Strong sites match the consensus well and bind machinery with high affinity.
      - Weak sites differ from consensus and require auxiliary SR proteins for recognition.
  • Kinetic/Processivity Model ("First-come, first-served"):
      - RNA Pol II speed and chromatin state affect splicing.
      - If Pol II moves fast, a strong downstream site may be transcribed before a weak upstream site is used, leading to exon skipping.
      - If Pol II pauses (due to condensed chromatin), the machinery has time to assemble on a weak upstream site before the downstream competitor appears.
  • Structural Secondary Model: RNA folding can hide or expose SR protein binding sites or splice sites.
  • Biological Impact: The Dscam gene in Drosophila can produce about 38,000 distinct protein isoforms through combinatorial alternative splicing.
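
The Dscam figure follows from simple combinatorics: one alternative is chosen from each variable exon cluster, so the cluster sizes multiply. The cluster sizes used below (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17) are the commonly cited values and are not stated in this guide, so treat them as an outside assumption.

```python
from math import prod

# Number of mutually exclusive alternatives in each variable exon cluster
# of Drosophila Dscam (commonly cited values; an assumption beyond this guide).
dscam_clusters = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is picked per cluster, so the counts multiply.
isoforms = prod(dscam_clusters.values())
print(isoforms)  # 38016
```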

Cleavage and Polyadenylation

  • Cleavage is distinct from the termination of transcription.
  • The machinery identifies a consensus sequence (often AAUAAA) and a downstream GU-rich region.
  • Proteins involved:
      - CPSF (Cleavage and Polyadenylation Specificity Factor): Binds the consensus.
      - CstF (Cleavage Stimulation Factor): Binds the GU-rich region.
      - Cleavage Factors: Endonucleases that cut the RNA between the two sites.
      - Poly-A Polymerase (PAP): Adds a tail of approximately 300 Adenine residues to the new 3' end.
  • Function of the Poly-A Tail:
      - Protection from 3' exonucleases.
      - Circularization: Poly-A Binding Protein (PABP) interacts with Initiation Factor eIF4G/F, which binds the Cap Binding Complex (CBC).
      - This loops the mRNA, bringing the terminating ribosome close to the start site for efficient re-initiation.
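
The cleavage and tailing steps can be sketched as string operations. This is a simplified illustration, not a model of the real machinery: the function name and the `offset` parameter (how far downstream of the signal the cut falls) are hypothetical, the GU-rich region is not checked, and only the first signal in the transcript is used. The default tail length of 300 comes from the figure given above.

```python
def cleave_and_polyadenylate(pre_mrna, signal="AAUAAA", offset=20, tail_length=300):
    """Return (mature_mrna, downstream_fragment), or None if no signal is found."""
    rna = pre_mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    pos = rna.find(signal)                    # first polyadenylation signal
    if pos == -1:
        return None
    # Cut `offset` nucleotides after the signal (hypothetical fixed distance).
    cut = min(len(rna), pos + len(signal) + offset)
    mature = rna[:cut] + "A" * tail_length    # PAP adds the poly-A tail
    downstream = rna[cut:]                    # uncapped fragment left behind
    return mature, downstream


# Hypothetical transcript: signal, 20-nt spacer, GU-rich region, extra sequence.
pre = "GGGC" + "AAUAAA" + "C" * 20 + "GUGUGU" + "CCCC"
mature, downstream = cleave_and_polyadenylate(pre, tail_length=10)
print(mature)      # ends with ten A residues
print(downstream)  # GUGUGUCCCC
```

The second return value corresponds to the RNA that Pol II keeps producing after the cut, which has a free, uncapped 5' end.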

Transcription Termination: The Torpedo Model

  • After the mRNA is cleaved for polyadenylation, the polymerase continues transcribing a waste RNA fragment with a free 5' end.
  • The lack of a cap on this fragment allows the exonuclease XRN2 to bind.
  • XRN2 degrades the RNA faster than the polymerase moves, eventually reaching the polymerase and dislodging it from the DNA (the "Torpedo" effect).
  • Non-polyadenylated RNAs (like snRNAs) use the Integrator complex to mediate cleavage and termination.
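
The torpedo model is essentially a pursuit problem: XRN2 closes the gap at the difference between its speed and that of Pol II. The sketch below works through that arithmetic. The function name is hypothetical and the rates and head start in the example are made-up placeholders, not measured values.

```python
def torpedo_catch_up(head_start_nt, pol_rate, xrn2_rate):
    """Return (seconds until XRN2 reaches Pol II, nucleotides Pol II adds meanwhile).

    head_start_nt: length of uncapped RNA between the cleavage site and Pol II
    pol_rate, xrn2_rate: speeds in nucleotides per second
    """
    if xrn2_rate <= pol_rate:
        raise ValueError("XRN2 must be faster than Pol II to catch it")
    seconds = head_start_nt / (xrn2_rate - pol_rate)  # gap closes at the speed difference
    return seconds, pol_rate * seconds


# Placeholder numbers chosen only to make the arithmetic easy to follow.
seconds, extra_nt = torpedo_catch_up(head_start_nt=1000, pol_rate=30, xrn2_rate=50)
print(seconds, extra_nt)  # 50.0 1500.0
```

The calculation also shows why termination happens some distance past the cleavage site: Pol II keeps transcribing while it is being chased.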

Nuclear Export and the Exon Junction Complex (EJC)

  • The Exon Junction Complex (EJC) is a protein "mark" left by the spliceosome approximately 20-24 nucleotides upstream of every joined exon-exon junction.
  • The TREX complex (Transcription-Export complex) recognizes both the Cap Binding Complex (CBC) and the first EJC.
  • This ensures the RNA is exported from the nucleus 5' end first.
  • RNAs without a cap or EJCs (like some ncRNAs) remain in the nucleus.

Cytoplasmic Quality Control: Nonsense-Mediated Decay (NMD)

  • Pioneer Round of Translation: The first time a ribosome scans an mRNA, it acts like "shelling corn" (desgranando un elote), stripping off EJCs and other nuclear proteins.
  • If the ribosome encounters a STOP codon but an EJC remains downstream (further toward the 3' end), the cell identifies this as a Premature Termination Codon (PTC).
  • Mechanism:
      - The stalling ribosome forms the SURF complex (SMG1, Upf1, eRF1, eRF3).
      - If the SURF complex contacts a downstream EJC, it triggers phosphorylation of Upf1 and the recruitment of SMG proteins.
      - SMG proteins are degradative: they act as, or recruit, decapping enzymes, deadenylases, and endonucleases that destroy the aberrant mRNA.
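
The PTC rule described above ("a stop codon with an EJC still downstream") can be written as a short check. This is a minimal sketch under stated assumptions: the function names are hypothetical, positions are 0-based coordinates on the spliced mRNA, and each EJC is placed a fixed 24 nucleotides upstream of its junction (the upper end of the 20-24 range given earlier in this guide).

```python
def ejc_positions(exon_lengths, offset=24):
    """EJC positions on the spliced mRNA, `offset` nt upstream of each junction."""
    positions = []
    junction = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        junction += length
        positions.append(junction - offset)
    return positions


def is_premature_stop(exon_lengths, stop_position, offset=24):
    """True if at least one EJC lies downstream of the stop codon."""
    return any(pos > stop_position for pos in ejc_positions(exon_lengths, offset))


exons = [100, 100, 100]               # hypothetical exon lengths in nucleotides
print(ejc_positions(exons))           # [76, 176]
print(is_premature_stop(exons, 250))  # False: stop in the last exon, no EJC remains
print(is_premature_stop(exons, 120))  # True: the EJC at 176 is still downstream
```

A normal stop codon usually sits in the last exon, so the ribosome has already removed every EJC by the time it terminates; that is the case the first call represents.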

Competition and Regulation by ncRNAs (ceRNA Hypothesis)

  • microRNAs (miRNAs): Small RNAs that bind to the 3'-UTR of mRNAs via the RISC complex to inhibit translation or trigger cleavage.
  • Competing Endogenous RNA (ceRNA) / "Musical Chairs" Analogy:
      - Various RNAs (Long non-coding RNAs, Circular RNAs, or other mRNAs) compete for a finite pool of specific microRNAs.
      - If a highly expressed lncRNA acts as a "sponge" and sequesters all the miRNAs, the target mRNA is freed from regulation and its protein levels increase.
      - Circular RNAs are particularly effective sponges due to their stability (lack of ends prevents exonuclease decay).
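
The "musical chairs" idea can be made concrete with a toy calculation. The model below is a deliberate oversimplification and entirely an assumption of this example: every binding site, whether on the target mRNA or on the sponge, is treated as equally attractive, and a finite number of miRNA copies is spread evenly over all sites. The function name and the numbers are hypothetical.

```python
def target_occupancy(mirna_copies, target_sites, sponge_sites):
    """Fraction of target-mRNA sites bound by miRNA in an equal-affinity toy model."""
    total_sites = target_sites + sponge_sites
    if total_sites == 0:
        return 0.0
    # miRNAs are shared evenly across all sites; occupancy cannot exceed 100%.
    return min(1.0, mirna_copies / total_sites)


print(target_occupancy(100, 100, 0))    # 1.0: no sponge, every target site is bound
print(target_occupancy(100, 100, 900))  # 0.1: the sponge soaks up most of the miRNA
```

In this toy picture, raising the number of sponge sites lowers the share of miRNA left for the target, which is the de-repression the ceRNA hypothesis describes.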