Comprehensive Notes on Gene Expression: Transcription, Translation, and Regulation

Fundamentals of Gene Expression and Protein Function

  • The Central Dogma of Molecular Biology: The flow of genetic information follows the pathway DNA → RNA → Protein.

  • Definition of Gene Expression: This term encompasses the processes of transcription, translation, and the complex regulation thereof that determine how genes manifest as physical and functional phenotypes.

  • The Role of Proteins: Proteins are the primary functional units of the cell, responsible for nearly all cellular work, biochemistry, physiology, and behavior. Their roles include:

    • Enzymes: Building and regulating other macromolecules such as carbohydrates, lipids, hormones, and vitamins.

    • Transport: Facilitating the movement of substances across membranes.

    • Structure: Providing shape and mechanical support.

    • Mechanical Work: Driving contraction and locomotion.

    • Communication: Managing cellular signaling and the immune system.

  • The Genome as an Instruction Book: The genome functions as a manual for:

    • Building proteins.

    • Regulating the specific functions of those proteins.

    • Regulating the timing (temporal expression) of protein production.

  • Genomic Composition:

    • Protein-Coding Genes: There are approximately 20,000 protein-coding genes, which represent only 1.5% of the total genome.

    • RNA Genes: There are between 40,000 and 50,000 RNA genes (representing an even smaller percentage of the genome because of their smaller size).

    • Regulatory DNA: In eukaryotes, elaborate control of gene expression requires 10 to 20 times more DNA than the actual genes themselves.

Layers of Gene Expression Regulation

  • Transcriptional Regulation: Controls the initiation and rate of RNA synthesis.

    • Regulatory Elements: DNA sequences such as promoters, enhancers, silencers, and insulators.

    • Regulatory Proteins: Transcription factors and co-activators.

    • Epigenetics: Chemical modifications of DNA and histones (chromatin remodeling) that affect DNA accessibility.

  • Post-transcriptional Regulation: Controls the processing and stability of the RNA transcript.

    • Includes RNA processing, splicing, RNA export from the nucleus, RNA stability, and the speed of translation.

  • Post-translational Modification: Controls the activity and lifespan of the final protein.

    • Includes modifications after synthesis, subcellular localization, and protein stability.

General Structure of a Gene

  • Element: A specific DNA sequence to which a protein binds for gene regulation.

  • Promoter: The upstream region where transcription initiation begins.

    • Core Promoter: Contains elements for binding general transcription factors necessary for all genes.

    • Proximal Elements: Specific sites for sequence-specific transcription factors (TFs). A single promoter may have dozens or hundreds of TF binding sites.

  • Transcription Start Site (TSS): The point where base pair (bp) numbering begins within a gene.

  • 5'-UTR (Untranslated Region): The region between the TSS and the start codon (AUG).

  • Reading Frames:

    • Prokaryotes: Often have a continuous Open Reading Frame (ORF). mRNAs can be polycistronic, meaning one mRNA contains several ORFs coding for different proteins.

    • Eukaryotes: Consist of alternating exons (coding regions) and introns (non-coding regions). Introns are removed post-transcriptionally to form a continuous ORF.

  • 3'-UTR: The region located after the stop codon.

  • Terminator: The signal sequence that triggers the end of transcription.
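
The layout above (5'-UTR, ORF, 3'-UTR) can be illustrated with a minimal Python sketch. It assumes a mature, already spliced mRNA written 5' → 3', and it assumes translation starts at the first AUG and ends at the first in-frame stop codon. The function name partition_mrna and the example sequence are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def partition_mrna(mrna: str) -> Optional[Dict[str, str]]:
    """Split a mature mRNA (5'->3') into 5'-UTR, ORF, and 3'-UTR.

    Simplifying assumption: the ORF runs from the first AUG to the
    first in-frame stop codon. Returns None if either is missing.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the sequence one codon (3 nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return {
                "five_prime_utr": mrna[:start],
                "orf": mrna[start:i + 3],  # start codon through stop codon
                "three_prime_utr": mrna[i + 3:],
            }
    return None


# Hypothetical example sequence, not taken from a real gene.
parts = partition_mrna("GGCACCAUGGCUUGGUAAUCCAAUAAAGC")
print(parts)
```

Real genes are more complicated (for example, upstream AUGs and regulatory sequences inside the UTRs), so this is only a picture of how the named regions relate to one another.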

Transcription Mechanisms

  • Strand Directionality: RNA is always transcribed in the 5' → 3' direction, similar to DNA synthesis.

  • Template Strand: The DNA strand that RNA polymerase reads, moving along it in the 3' → 5' direction. The RNA transcript is complementary to this strand.

  • Coding Strand: The DNA strand that the polymerase does NOT move on; its sequence matches the RNA transcript, except that the RNA contains U wherever the DNA contains T.

  • Initiation:

    • RNA polymerase does not recognize promoters on its own; it must be recruited by other factors.

    • Prokaryotes: The Sigma (σ) factor binds to the Pribnow box (the −10 element) and the −35 element. The Sigma factor detaches once transcription begins. Some genes also use an "operator" site for repressor binding.

    • Eukaryotes: The TATA-binding protein (TBP), part of the TFIID complex, binds to the TATA box. TFIID recruits the basal transcription machinery. Enhancers located hundreds of bases upstream are often required for TF binding and initiation. Local chromatin status serves as an overriding regulatory factor.

  • Elongation: Generally proceeds automatically once initiated.

  • Termination:

    • Prokaryotes: Signaled by a "hair-pin" structure formed by self-base-pairing of the RNA transcript.

    • Eukaryotes: Signaled by a "poly-A" signal sequence in most genes.
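
The relationship between the template strand, the coding strand, and the transcript can be sketched in a few lines of Python. This is a minimal illustration, assuming every strand is written 5' → 3' and contains only A, C, G, and T; the function names and the short example sequence are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def coding_strand(template_5to3: str) -> str:
    """The coding strand is the reverse complement of the template."""
    return reverse_complement(template_5to3)


def transcribe(template_5to3: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_strand(template_5to3).replace("T", "U")


template = "CATTGG"  # hypothetical template strand, written 5'->3'
print(coding_strand(template))  # CCAATG
print(transcribe(template))     # CCAAUG
```

Because the polymerase reads the template 3' → 5' while building RNA 5' → 3', the transcript comes out as the reverse complement of the template when both are written 5' → 3'.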

Post-Transcriptional Processing

  • Prokaryotes: There is no nucleus to separate transcription from translation. Translation begins immediately, often before transcription is even finished. Multiple ribosomes can translate a single mRNA simultaneously; the resulting complex is called a polysome.

  • Eukaryotes: The "primary transcript" (pre-mRNA) requires extensive processing in the nucleus before export to the cytoplasm.

    • 5' Cap (7-methylguanosine): A guanosine is attached backwards to the 5' end via a triphosphate bridge and then methylated. This protects the transcript from exonucleases in the cytosol and is recognized by ribosomal recruitment machinery.

    • Splicing: Conducted by the spliceosome (composed of protein and five small nuclear RNAs or snRNAs). It removes introns and joins exons.

      • Donor Site: Usually GU at the 5' end of the intron.

      • Acceptor Site: Usually AG at the 3' end of the intron.

      • Alternative Splicing: Allows specific exons to be included or excluded, enabling a single gene to produce multiple distinct proteins.

    • Poly-A Tail: Added after transcription termination. The 3' end is trimmed, and a tail is added (triggered by the poly-A signal AAUAAA in the 3'-UTR). It protects the 3' end and signals that the mRNA is ready for nuclear export.
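
Splicing and alternative splicing can be pictured with a small Python sketch. It assumes intron positions are already known and given as half-open (start, end) index pairs, and it only checks the usual GU...AG boundaries; a real spliceosome recognizes additional signals. The exon and intron sequences below are made up for illustration.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns, given as half-open (start, end) index pairs."""
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Most introns begin with GU (donor) and end with AG (acceptor).
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[position:start])  # keep the preceding exon
        position = end
    mature.append(pre_mrna[position:])  # keep the final exon
    return "".join(mature)


# Hypothetical pre-mRNA: three exons separated by two introns.
exon1, intron1, exon2, intron2, exon3 = (
    "AUGGCU", "GUAAGUCAG", "UGGAAA", "GUCCCUAG", "CCCUAA"
)
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Constitutive splicing removes both introns and keeps all three exons.
full = splice(pre_mrna, [(6, 15), (21, 29)])

# Exon skipping: exon 2 is removed together with its flanking introns.
skipped = splice(pre_mrna, [(6, 29)])

print(full)     # AUGGCUUGGAAACCCUAA
print(skipped)  # AUGGCUCCCUAA
```

The two outputs come from the same pre-mRNA, which is the sense in which one gene can give rise to more than one protein.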

The Machinery of Translation

  • Ribosomes: Composed of two subunits made of rRNA and proteins, assembled in the nucleolus.

    • Small Subunit: Handles mRNA and loads triplets into place.

    • Large Subunit: Loads tRNA and catalyzes peptide bond formation.

    • rRNA Synthesis: Transcribed by RNA Polymerase I in dense nucleolar regions (rDNA arrays). The large RNAs are chopped into mature rRNAs. Ribosomal proteins are made in the cytoplasm and imported to the nucleus for assembly.

  • Transfer RNA (tRNA): Acts as the physical link between the mRNA code and amino acids.

    • Contains an anticodon that is complementary to the mRNA codon via hydrogen bonding.

    • Carries a specific amino acid attached to the attachment site.

The Process of Translation

  • Initiation:

    • The small subunit recognizes the 5' end and slides along to find the start codon.

    • Prokaryotic Start: The first AUG after the Shine-Dalgarno box (~AGGAGG).

    • Eukaryotic Start: Usually the first AUG after the 5' cap within the Kozak sequence (ACCAUGG).

    • The first tRNA (Met-tRNA) is pre-loaded on the small subunit. Recognition of AUG recruits the large subunit.

  • Elongation:

    • P (Peptidyl) Site: Holds the tRNA with the growing polypeptide chain. Initially holds the Met-tRNA.

    • A (Aminoacyl) Site: The entry site for the next amino acid-carrying tRNA.

    • Peptide Bond Formation: The large subunit catalyzes the bond. The growing polypeptide is transferred from the tRNA in the P site onto the amino acid carried by the tRNA in the A site.

    • Translocation: The ribosome slides 3 nucleotides toward the 3' end. The empty tRNA moves to the E (Exit) Site and is released.

  • Termination: Occurs when a stop codon reaches the A site. No tRNA recognizes stop codons; instead, a release factor enters the A site, triggering disassembly and the release of the polypeptide.
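
The initiation, elongation, and termination steps above can be mimicked with a minimal Python sketch. It assumes the standard genetic code, starts at the first AUG (ignoring Shine-Dalgarno and Kozak context), and stops at the first in-frame stop codon. The one-letter amino acid codes, the function name translate, and the example mRNA are illustrative choices.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order U, C, A, G
# (first base changes slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') into one-letter amino acid codes."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    # Elongation: advance one codon (3 nucleotides) per step.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: no tRNA matches a stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGCACCAUGGCUUGGUAAUCC"))  # MAW
```

The loop index plays the role of translocation: each pass moves the reading position 3 nucleotides toward the 3' end.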

Post-Translational Modifications and Genetic Code

  • Modifications (Eukaryotes):

    • Proteolysis (Cleavage): Chopping inactive polyproteins into separate, active proteins.

    • Phosphorylation: Addition of phosphates by kinases (removed by phosphatases) on Serine, Threonine, or Tyrosine to act as on/off switches.

    • Glycosylation: Addition of sugars.

    • Lipidation: Addition of lipids, often for membrane anchoring.

    • Ubiquitination: Targeting a protein for destruction.

  • Properties of the Genetic Code:

    • Triplet Nature: Every 3 nucleotides form a codon. Read 5' → 3'.

    • Reading Frame: Established by the initial AUG. Maintained through splicing without punctuation. Insertions or deletions (indels) whose length is not a multiple of 3 are generally more damaging than point mutations because they shift this frame.

    • Redundancy and Wobble: There are 61 sense codons for 20 amino acids (the remaining 3 of the 64 possible triplets are stop codons), so most amino acids are specified by more than one codon. "Wobble" occurs when the third nucleotide in a codon varies without changing the amino acid.

    • Universality: The genetic code is nearly universal across all life forms.

    • Methionine (Met): A rare amino acid. The first Met is frequently removed from the finished protein.
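
The codon counts and the effect of a frameshift can be checked with a short Python sketch. It assumes the standard genetic code and reads codons from the first nucleotide of each hypothetical sequence; the sequences are invented to show a silent third-position change next to a single-nucleotide deletion.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Redundancy: count codons per amino acid, ignoring the three stop codons.
codons_per_amino_acid = Counter(
    amino_acid for amino_acid in CODON_TABLE.values() if amino_acid != "*"
)
sense_codon_count = sum(codons_per_amino_acid.values())


def read_codons(mrna: str) -> str:
    """Read complete codons from position 0; leftover bases are ignored."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


original = "AUGGCUUGGAAA"     # AUG GCU UGG AAA
substituted = "AUGGCCUGGAAA"  # third-position change: GCU -> GCC
deleted = "AUGGUUGGAAA"       # one C deleted from the second codon

print(sense_codon_count, len(codons_per_amino_acid))  # 61 20
print(read_codons(original))     # MAWK
print(read_codons(substituted))  # MAWK (silent change)
print(read_codons(deleted))      # MVG (frame shifted after the deletion)
```

In this example the third-position substitution leaves the protein unchanged, while the single deletion alters every codon after it.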

Review Questions and Exercises

  • Practice Exercise: Given the DNA template strand 5' TTA CTT AGC TGG CTA 3':

    • Determine the coding strand (in 5' → 3' orientation).

    • Determine the mRNA transcript (in 5' → 3' orientation).