Comprehensive Notes on Gene Expression: Transcription, Translation, and Regulation
Fundamentals of Gene Expression and Protein Function
The Central Dogma of Molecular Biology: The flow of genetic information follows the pathway: DNA → RNA → Protein.
Definition of Gene Expression: This term encompasses the processes of transcription, translation, and the complex regulation thereof that determine how genes manifest as physical and functional phenotypes.
The Role of Proteins: Proteins are the primary functional units of the cell, responsible for nearly all cellular work, biochemistry, physiology, and behavior. Their roles include:
Enzymes: Building and regulating other biomolecules such as carbohydrates, lipids, hormones, and vitamins.
Transport: Facilitating the movement of substances across membranes.
Structure: Providing shape and mechanical support.
Mechanical Work: Driving contraction and locomotion.
Communication: Managing cellular signaling and the immune system.
The Genome as an Instruction Book: The genome functions as a manual for:
Building proteins.
Regulating the specific functions of those proteins.
Regulating the timing (temporal expression) of protein production.
Genomic Composition:
Protein-Coding Genes: In humans, there are approximately 20,000 protein-coding genes, whose coding sequences represent only about 1-2% of the total genome.
RNA Genes: There are between and RNA genes (representing even less of the genome percentage due to their smaller size).
Regulatory DNA: In eukaryotes, elaborate control of gene expression requires between to times more DNA than the actual genes themselves.
Layers of Gene Expression Regulation
Transcriptional Regulation: Controls the initiation and rate of RNA synthesis.
Regulatory Elements: DNA sequences such as promoters, enhancers, silencers, and insulators.
Regulatory Proteins: Transcription factors and co-activators.
Epigenetics: Chemical modifications of DNA and histones (chromatin remodeling) that affect DNA accessibility.
Post-transcriptional Regulation: Controls the processing and stability of the RNA transcript.
Includes RNA processing, splicing, RNA export from the nucleus, RNA stability, and the speed of translation.
Post-translational Modification: Controls the activity and lifespan of the final protein.
Includes modifications after synthesis, subcellular localization, and protein stability.
General Structure of a Gene
Element: A specific DNA sequence to which a protein binds for gene regulation.
Promoter: The upstream region where transcription initiation begins.
Core Promoter: Contains elements for binding general transcription factors necessary for all genes.
Proximal Elements: Specific sites for sequence-specific transcription factors (TFs). A single promoter may have dozens or hundreds of TF binding sites.
Transcription Start Site (TSS): The point where base pair (bp) numbering begins within a gene.
5'-UTR (Untranslated Region): The region between the TSS and the start codon (AUG).
Reading Frames:
Prokaryotes: Often have a continuous Open Reading Frame (ORF). mRNAs can be polycistronic, meaning one mRNA contains several ORFs coding for different proteins.
Eukaryotes: Consist of alternating exons (coding regions) and introns (non-coding regions). Introns are removed post-transcriptionally to form a continuous ORF.
3'-UTR: The untranslated region located after the stop codon.
Terminator: The signal sequence that triggers the end of transcription.
Transcription Mechanisms
Strand Directionality: RNA is always synthesized in the 5' → 3' direction, as in DNA synthesis.
Template Strand: The DNA strand that RNA polymerase physically moves upon.
Coding Strand: The DNA strand that the polymerase does NOT move on; its sequence matches the RNA transcript (substituting U for T).
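The relationship between the template strand, the coding strand, and the transcript can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names and the example sequence are hypothetical, and the template is assumed to be written in 5' → 3' orientation.

```python
# Watson-Crick pairing for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def coding_strand(template_5to3: str) -> str:
    """Coding strand (5'->3') for a template strand written 5'->3'."""
    return reverse_complement(template_5to3)


def transcribe(template_5to3: str) -> str:
    """mRNA (5'->3'): same sequence as the coding strand, with U in place of T."""
    return coding_strand(template_5to3).replace("T", "U")


template = "TTACGGCAT"  # hypothetical template strand, written 5'->3'
print(coding_strand(template))  # ATGCCGTAA
print(transcribe(template))     # AUGCCGUAA
```

If a template is given in 3' → 5' orientation instead, reverse the string before calling these functions.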
Initiation:
RNA polymerase does not bind DNA directly; it must be recruited.
Prokaryotes: The Sigma (σ) factor binds to the Pribnow box (the -10 element) and the -35 element. The Sigma factor detaches once transcription begins. Some genes also use an "operator" site for repressor binding.
Eukaryotes: The TATA-binding protein (TBP), part of the TFIID complex, binds to the TATA box. TFIID recruits the basal transcription machinery. Enhancers located hundreds of bases upstream are often required for TF binding and initiation. Local chromatin status serves as an overriding regulatory factor.
Elongation: Generally proceeds automatically once initiated.
Termination:
Prokaryotes: Signaled by a "hair-pin" structure formed by self-base-pairing of the RNA transcript.
Eukaryotes: Signaled by a "poly-A" signal sequence in most genes.
Post-Transcriptional Processing
Prokaryotes: There is no nucleus to separate transcription from translation. Translation begins immediately, often before transcription is even finished. Multiple ribosomes can translate a single mRNA simultaneously, forming a polysome.
Eukaryotes: The "primary transcript" (pre-mRNA) requires extensive processing in the nucleus before export to the cytoplasm.
5' Cap (7-methyl Guanosine): A guanosine is attached backwards to the 5' end via a triphosphate bridge and then methylated. This protects the transcript from exonucleases in the cytosol and is recognized by ribosomal recruitment machinery.
Splicing: Conducted by the spliceosome (composed of protein and five small nuclear RNAs or snRNAs). It removes introns and joins exons.
Donor Site: Usually GU at the 5' end of the intron.
Acceptor Site: Usually AG at the 3' end of the intron.
Alternative Splicing: Allows specific exons to be included or excluded, enabling a single gene to produce multiple distinct proteins.
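As a rough sketch of splicing and exon skipping, the Python below joins exon segments from a toy pre-mRNA. The sequences, the `splice` helper, and the upper/lowercase convention (exons uppercase, introns lowercase) are all hypothetical choices for illustration. Real splice-site recognition depends on much more than the GU/AG boundaries.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments given as half-open (start, end) index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exons in uppercase, introns in lowercase.
# Each intron begins with "gu" (donor) and ends with "ag" (acceptor).
segments = [
    ("exon", "AUGGCU"),
    ("intron", "guaaguccuag"),
    ("exon", "GAAUUC"),
    ("intron", "gucauuag"),
    ("exon", "UGGUAA"),
]

pre_mrna = ""
exon_coords = []
for kind, seq in segments:
    if kind == "exon":
        exon_coords.append((len(pre_mrna), len(pre_mrna) + len(seq)))
    pre_mrna += seq

# Constitutive splicing: keep every exon.
full_length = splice(pre_mrna, exon_coords)
# Alternative splicing: skip the middle exon.
exon_skipped = splice(pre_mrna, [exon_coords[0], exon_coords[2]])

print(full_length)   # AUGGCUGAAUUCUGGUAA
print(exon_skipped)  # AUGGCUUGGUAA
```

Both products come from the same gene sequence; only the choice of exons differs.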
Poly-A Tail: Added after transcription termination. The 3' end is trimmed, and a poly-A tail is added (triggered by the poly-A signal in the 3'-UTR). It protects the 3' end and signals that the mRNA is ready for nuclear export.
The Machinery of Translation
Ribosomes: Composed of two subunits made of rRNA and proteins, assembled in the nucleolus.
Small Subunit: Handles mRNA and loads triplets into place.
Large Subunit: Loads tRNA and catalyzes peptide bond formation.
rRNA Synthesis: Transcribed by RNA Polymerase I in dense nucleolar regions (rDNA arrays). The large RNAs are chopped into mature rRNAs. Ribosomal proteins are made in the cytoplasm and imported to the nucleus for assembly.
Transfer RNA (tRNA): Acts as the physical link between the mRNA code and amino acids.
Contains an anticodon that is complementary to the mRNA codon via hydrogen bonding.
Carries a specific amino acid attached to the 3' attachment site.
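Codon-anticodon pairing is antiparallel, so the anticodon written 5' → 3' is the reverse complement of the codon. A minimal Python sketch, assuming strict Watson-Crick pairing only (wobble and modified bases are ignored; the `anticodon` function name is hypothetical):

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (5'->3') that pairs perfectly with an mRNA codon (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


print(anticodon("AUG"))  # CAU, i.e. 3'-UAC-5' when aligned against 5'-AUG-3'
```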
The Process of Translation
Initiation:
The small subunit recognizes the 5' end and slides along to find the start codon.
Prokaryotic Start: The first AUG after the Shine-Dalgarno box (consensus AGGAGG).
Eukaryotic Start: Usually the first AUG after the 5' cap within the Kozak sequence (consensus approximately GCCRCCAUGG, where R is a purine).
The first tRNA (the initiator Met-tRNA) is pre-loaded on the small subunit. Recognition of AUG recruits the large subunit.
Elongation:
P (Peptidyl) Site: Holds the tRNA with the growing polypeptide chain. Initially holds the initiator Met-tRNA.
A (Acceptor) Site: The entry site for the next amino acid-carrying tRNA.
Peptide Bond Formation: The large subunit catalyzes the bond. The growing polypeptide is transferred from the tRNA in the P site onto the amino acid carried by the tRNA in the A site.
Translocation: The ribosome slides 3 nucleotides toward the 3' end of the mRNA. The empty tRNA moves to the E (Exit) Site and is released.
Termination: Occurs when a stop codon reaches the A site. No tRNA recognizes stop codons; instead, a release factor enters the A site, triggering disassembly and the release of the polypeptide.
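The initiation, elongation, and termination steps above can be mimicked with a short Python sketch: scan for the first AUG, read triplets, and stop at the first stop codon. It assumes the standard genetic code and ignores Shine-Dalgarno/Kozak context, so it is closer to the simple "first AUG" rule than to real start-site selection. The `translate` function and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon ('*')."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: stop codon, no amino acid added
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: 2-nt 5'-UTR, coding region, 2-nt 3'-UTR.
print(translate("GGAUGGCUGAAUUCUGGUAACC"))  # MAEFW
```

The output uses one-letter amino acid codes: Met, Ala, Glu, Phe, Trp.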
Post-Translational Modifications and Genetic Code
Modifications (Eukaryotes):
Proteolysis (Cleavage): Chopping inactive polyproteins into separate, active proteins.
Phosphorylation: Addition of phosphates by kinases (removed by phosphatases) on Serine, Threonine, or Tyrosine to act as on/off switches.
Glycosylation: Addition of sugars.
Lipidation: Addition of lipids, often for membrane anchoring.
Ubiquitination: Targeting a protein for destruction.
Properties of the Genetic Code:
Triplet Nature: Every 3 nucleotides form a codon. Read 5' → 3'.
Reading Frame: Established by the initial AUG and read continuously, with no punctuation between codons; correct splicing preserves it. Insertions or deletions (indels) whose length is not a multiple of 3 are generally more damaging than point mutations because they shift this frame.
Redundancy and Wobble: There are 64 codons (61 coding for amino acids plus 3 stop codons) for 20 amino acids. "Wobble" occurs when the third nucleotide in a codon varies without changing the amino acid.
Universality: The genetic code is nearly universal across all life forms.
Methionine (AUG): A rare amino acid. The initiator methionine is frequently removed from the finished protein.
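The redundancy of the code and the effect of a frameshift can both be checked with a small Python sketch. It assumes the standard genetic code; the example sequence, the `read_codons` helper, and the specific mutations are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Redundancy: count how many codons specify each amino acid.
codons_per_aa = Counter(CODON_TABLE.values())
stop_codons = codons_per_aa.pop("*")
print(len(CODON_TABLE), stop_codons, len(codons_per_aa))  # 64 3 20
print(codons_per_aa["M"], codons_per_aa["W"], codons_per_aa["L"])  # 1 1 6


def read_codons(mrna: str) -> str:
    """Read triplets from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "AUGGCUGAAUUCUGGUAA"
# Point mutation in a third codon position (UUC -> UUU): same amino acid.
silent = original[:11] + "U" + original[12:]
# Single-nucleotide deletion: every downstream codon is read out of frame.
frameshift = original[:5] + original[6:]

print(read_codons(original))    # MAEFW
print(read_codons(silent))      # MAEFW
print(read_codons(frameshift))  # MANSG
```

Methionine and tryptophan each have a single codon, while leucine has six. The silent third-position change leaves the protein unchanged, whereas the one-base deletion alters every residue after the deletion and also loses the in-frame stop codon.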
Review Questions and Exercises
Practice Exercise: Given the DNA template strand .
Determine the coding strand (in 5' → 3' orientation).
Determine the mRNA transcript (in 5' → 3' orientation).