Molecular Pathology – Molecular & Cell Biology (Comprehensive Study Notes)
Lecture Outcomes
- By the end of this session you should be able to:
- Describe the structures of DNA and RNA.
- Explain the mechanism and key enzymes of DNA replication.
- Identify the basic structural elements of a gene.
- Articulate the “central dogma” (DNA → RNA → Protein) including transcription and translation mechanics.
- Outline higher-order DNA packaging and overall chromosome anatomy.
Nucleic Acids: DNA vs RNA
- Found in all living cells and many viruses; serve as the cell’s information storage & retrieval system.
- Both are polymers of nucleotides, but differ in:
- Sugar: DNA uses deoxyribose; RNA uses ribose (which carries a 2′-OH group).
- Strand state: DNA is typically double-stranded; RNA is usually single-stranded but can fold into complex secondary/tertiary structures.
- Bases: Thymine (T) is exclusive to DNA; Uracil (U) replaces T in RNA.
- Functional overview
- DNA: long-term, heritable information.
- RNA: diverse roles—coding (mRNA), adaptor (tRNA), catalytic/structural (rRNA), regulatory (miRNA, lncRNA, etc.).
Nucleotides, Nucleosides & Nitrogenous Bases
- Definitions
- BASE + SUGAR ⇒ Nucleoside.
- BASE + SUGAR + PHOSPHATE ⇒ Nucleotide.
- Nitrogenous bases
- Purines (double-ring): Adenine (A), Guanine (G).
- Pyrimidines (single-ring): Cytosine (C), Thymine (T—DNA only), Uracil (U—RNA only).
- Coding capacity derives from the specific order (primary structure) of these bases.
Watson–Crick Base-Pairing Rules
- Purine always pairs with a pyrimidine.
- A ↔ T (or U in RNA) via 2 H-bonds.
- C ↔ G via 3 H-bonds (stronger interaction).
- Complementarity underpins replication, transcription, PCR, and many diagnostic assays.
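The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory tool: the function names and the example sequence are hypothetical, and the hydrogen-bond count assumes a perfectly matched duplex.

```python
# Watson-Crick partners for DNA; RNA would use U in place of T.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen bonds contributed by each base when paired (A-T: 2, C-G: 3).
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Count H-bonds holding this strand to its perfect complement."""
    return sum(H_BONDS[base] for base in seq.upper())


primer = "ATGCGC"  # hypothetical example sequence, 5'->3'
print(reverse_complement(primer))  # GCGCAT
print(hydrogen_bonds(primer))      # 2+2+3+3+3+3 = 16
```

GC-rich sequences score higher on the bond count, which is one reason they tend to need higher temperatures to denature in PCR.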
DNA Structure
- Primary structure: linear nucleotide sequence written 5′ → 3′.
- Secondary structure: antiparallel double helix (the two strands run 5′ → 3′ in opposite directions).
- One complete helical turn of B-form DNA contains about 10 bp and spans 3.4 nm (0.34 nm rise per bp); helix diameter ≈ 2 nm.
- Higher-order packing (see Chromosomal DNA Packaging section) allows >2 m of DNA to fit inside a nucleus.
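The ">2 m" figure follows from the helix geometry. The sketch below assumes the values given in these notes (10 bp and 3.4 nm per turn, about 3 × 10^9 bp per haploid genome) and a diploid cell; the function name is hypothetical.

```python
NM_PER_TURN = 3.4
BP_PER_TURN = 10
RISE_NM_PER_BP = NM_PER_TURN / BP_PER_TURN  # 0.34 nm per base pair


def contour_length_m(base_pairs: float) -> float:
    """End-to-end length of fully extended B-form DNA, in metres."""
    return base_pairs * RISE_NM_PER_BP * 1e-9


haploid_bp = 3e9              # approximate haploid genome size from these notes
diploid_bp = 2 * haploid_bp   # assumption: a diploid somatic cell
print(f"{contour_length_m(diploid_bp):.2f} m")  # 2.04 m
```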
RNA Structure
- Primary: single strand listed 5′ → 3′.
- Secondary: intramolecular base-pairing (Watson–Crick) creates hairpins, loops, and more.
- Example: tRNA adopts a cloverleaf secondary structure (detailed later in course).
- Key chemical difference (2′OH) enables catalytic activity (ribozymes) and affects stability (RNA less stable than DNA).
Functional Classes of RNA
- mRNA (messenger): carries coding information; template for protein synthesis.
- tRNA (transfer): at least one per amino acid; delivers amino acids to ribosome.
- rRNA (ribosomal): structural & catalytic core of ribosomes.
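- ncRNAs (non-coding): huge family including snRNA, snoRNA, miRNA, lncRNA, piRNA, etc.; critical for regulation, RNA processing, chromatin state, and genome defense. (hnRNA, often listed alongside these, is the historical name for unprocessed nuclear transcripts, chiefly pre-mRNA.)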
DNA Replication
- Timing: S phase of the cell cycle.
- Overall: semiconservative—each daughter helix contains one parental and one newly synthesized strand.
- Core protein machinery
- Helicase: unwinds double helix.
- Single-stranded DNA-binding proteins (SSB): prevent re-annealing.
- Primase: synthesizes short RNA primers to provide a free 3′-OH.
- DNA polymerase III (main prokaryotic enzyme; analogous to Pol δ/ε in eukaryotes): elongates DNA 5′ → 3′ and proofreads (3′→5′ exonuclease activity).
- DNA polymerase I (prokaryotes) / RNase H & Pol δ (eukaryotes): primer removal & gap filling.
- Ligase: seals nicks, joins Okazaki fragments on lagging strand.
- Replication bubble/forks
- Each origin fires bi-directionally, generating two forks.
- Leading strand: synthesized continuously toward fork.
- Lagging strand: synthesized discontinuously, away from the fork, as Okazaki fragments (~100–200 nt in eukaryotes).
- Chemistry of elongation
- The 3′-OH of the growing strand attacks the α-phosphate of the incoming dNTP, forming a phosphodiester bond.
- Release and hydrolysis of pyrophosphate (PPi) drives the reaction energetically.
- Multiple bubbles replicate eukaryotic chromosomes simultaneously to accelerate S-phase.
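Semiconservative replication has a simple bookkeeping consequence: however many rounds follow, only two duplexes ever carry an original parental strand. The toy model below illustrates that; the strand labels and function name are hypothetical, and it ignores all enzymology.

```python
def replicate(duplexes):
    """One round of semiconservative replication.

    Each duplex is a pair of strand labels; every existing strand is
    kept and paired with a newly synthesized ("new") partner.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters


population = [("parental", "parental")]
for generation in range(1, 4):
    population = replicate(population)
    with_parental = sum("parental" in duplex for duplex in population)
    # Prints: generation, total duplexes, duplexes holding a parental strand
    print(generation, len(population), with_parental)
```

The total doubles each round (2, 4, 8) while the count of duplexes containing a parental strand stays at 2.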
Laboratory Relevance
- Fundamental to PCR amplification, DNA sequencing (Sanger, pyrosequencing, semiconductor), cloning, site-directed mutagenesis.
- Standard notation of strands and primers uses 5′ → 3′ orientation.
Transcription (DNA → pre-mRNA)
- Enzymes & factors
- RNA polymerase II (for mRNA), general transcription factors (TFIIA, TFIIB, etc.), template DNA, NTPs.
- Steps
- Initiation: polymerase + TFs assemble at promoter (TATA or TATA-less); double helix locally unwound forming a transcription bubble.
- Elongation: RNA pol reads the template (antisense) strand 3′ → 5′ and synthesizes RNA 5′ → 3′ by base complementarity (template A pairs with U, T with A, C ↔ G).
- Termination: polymerase disengages at termination signals; RNA released.
- Co-/post-transcriptional processing:
- 5′ capping, 3′ poly-A tailing (a tail of ~200 adenosines), splicing, RNA editing or chemical modifications.
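The elongation step can be sketched as a string operation. This is an illustration only: the function name and sequences are hypothetical, and capping, tailing and splicing are not modelled.

```python
# RNA base incorporated opposite each DNA template base.
RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_5to3: str) -> str:
    """Build the RNA (5'->3') complementary to a DNA template strand.

    The template is supplied 5'->3'; the polymerase reads it 3'->5',
    so the strand is walked in reverse.
    """
    return "".join(RNA_PARTNER[b] for b in reversed(template_5to3.upper()))


coding = "ATGGCCTAA"    # hypothetical coding (sense) strand, 5'->3'
template = "TTAGGCCAT"  # its reverse complement: the template strand
rna = transcribe(template)
print(rna)  # AUGGCCUAA
```

The transcript matches the coding strand with U in place of T, which is why gene sequences are conventionally reported as the sense strand.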
Splicing Mechanics
- pre-mRNA is processed by the spliceosome in the nucleus, before export: introns are removed and exons ligated.
- Critical cis-elements
- Splice donor (GU) & acceptor (AG) dinucleotides at intron boundaries.
- Branch point adenine, polypyrimidine tract.
- Exonic/intronic splicing enhancers or silencers bound by SR proteins or hnRNPs; tissue-specific combinations generate diversity.
- SR-protein expression profiles vary across tissues → regulates exon inclusion/skipping.
Alternative Splicing
- Modes: exon skipping, intron retention, alternative 5' donor or 3' acceptor sites, mutually exclusive exons, alternative promoters/poly-A sites.
- Outcome: multiple protein isoforms from one gene; expands proteome (>10 isoforms for some genes).
- Clinical/lab relevance: RT-PCR to study splice defects, expression profiling, biomarker discovery.
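To see how quickly exon skipping alone multiplies isoforms, the sketch below enumerates every combination for a hypothetical four-exon gene with two cassette exons. The gene layout and function name are assumptions for illustration; real splicing is constrained by regulatory elements and reading frame, so not every combination is necessarily produced.

```python
from itertools import product


def isoforms(exons):
    """Yield every exon combination allowed by exon skipping.

    `exons` is a list of (name, is_cassette) pairs in genomic order;
    constitutive exons are always kept, cassette exons may be skipped.
    """
    choices = [(True, False) if cassette else (True,) for _, cassette in exons]
    for keep in product(*choices):
        yield [name for (name, _), kept in zip(exons, keep) if kept]


# Hypothetical gene: E1 and E4 constitutive, E2 and E3 cassette exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for iso in isoforms(gene):
    print("-".join(iso))
```

With n independent cassette exons this yields 2^n combinations, here 4.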
Translation (mRNA → Polypeptide)
- Process occurs on ribosomes in cytoplasm.
- Steps
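- Initiation: in eukaryotes the small ribosomal subunit binds the mRNA 5′ cap and scans to the first AUG; in prokaryotes it binds the Shine-Dalgarno sequence just upstream of the AUG. Initiator Met-tRNA and initiation factors then recruit the large subunit.
- Elongation: aminoacyl-tRNAs enter the A-site; the peptidyl transferase center transfers the growing chain from the P-site tRNA onto the A-site aminoacyl-tRNA; the ribosome then translocates. Governed by EF-Tu/EF-G (prok.) or eEF1/eEF2 (euk.).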
- Termination: stop codon (UAA, UAG, UGA) recruits release factors; ribosome disassembles, polypeptide released.
- Polysomes: multiple ribosomes concurrently translate a single mRNA, boosting output.
- Aminoacyl-tRNA synthetases “charge” tRNAs; ATP-dependent, high fidelity.
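The initiation, elongation and termination steps map onto a short decoding loop. The sketch below uses the standard genetic code and simply takes the first AUG as the start; the function name and example mRNA are hypothetical, and it ignores Kozak context, initiation factors and everything else the ribosome actually checks.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # UAA, UAG or UGA: termination
            break
        peptide.append(aa)
    return "".join(peptide)


print(translate("GGAUGGCCUGGUAAAC"))  # MAW
```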
Central Dogma Integration & Regulation
- Information flow: DNA → (transcription) → RNA → (translation) → Protein.
- Mutations can lead to
- No effect (silent/redundant systems).
- Loss-of-function → disease or apoptosis.
- Gain-of-function or dominant-negative effects.
- Phenotypic outcome depends on protein role, redundancy, expression context.
Gene as a Collection of Binding Sites
- Gene expression controlled by binding of RNAs/proteins at promoters, enhancers, silencers, splice sites, UTR motifs, etc.
General Structure of a Protein-Coding Gene
- Promoter (RNA pol/TF binding) – TATA or TATA-less.
- 5' Untranslated Region (UTR) – influences translation initiation.
- Start codon ATG.
- Exons & introns (coding/non-coding sequences).
- Splice donor/acceptor sites (usually GT/AG; rarely AT/AC).
- Splice enhancers/silencers (exonic/intronic).
- Stop codon TAA, TAG, or TGA.
- 3' UTR – mRNA stability, localisation, miRNA binding.
- Polyadenylation signal (AAUAAA or variants).
Genome Organization
Nuclear vs Mitochondrial Genomes
- Nuclear: ~3 × 10^9 bp (haploid), >20000 genes, 23 chromosome pairs.
- Mitochondrial: 16569 bp (~0.0005 % of the nuclear genome length), 37 genes (13 proteins for ETC, 24 ncRNAs: 22 tRNAs + 2 rRNAs), closed circular, intron-less, hundreds to thousands of copies/cell.
Chromosomal DNA Packaging
- Hierarchy
- DNA double helix (~2 nm) wraps around histone octamers, forming nucleosomes: "beads on a string" (~11 nm fiber).
- Nucleosomes coil → 30 nm chromatin fiber.
- Further looping/condensation → 300–700 nm domains.
- Fully condensed metaphase chromosome ~1400 nm width.
- Histones are nuclear-encoded; chromatin must dynamically unfold during replication/transcription.
Human Karyotype
- 46 chromosomes: 22 autosome pairs + XX or XY sex chromosomes.
- Chromosome sizes & banding patterns used diagnostically.
Human Genome Project (1990–2003; draft sequence published 2001)
- Public (IHGSC) & private (Celera) efforts produced the draft sequence.
- Composite reference—mosaic of multiple individuals.
- Ongoing questions
- Completeness, functional annotation, inter-individual variation.
- Bioinformatics tools (e.g., BLAST) essential for sequence comparison & annotation.
Content Breakdown (≈3200 Mb)
- Genes (exons): ≈48 Mb (≈1.5 % of genome).
- Introns + UTRs: ≈1.2 Gb.
- Interspersed repeats: ≈1.4 Gb, made up of:
- LINEs: ≈640 Mb.
- SINEs (incl. Alu): ≈420 Mb.
- LTR elements: ≈250 Mb.
- DNA transposons: ≈90 Mb.
- Other intergenic: ≈600 Mb, including:
- Microsatellites: ≈90 Mb.
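A quick arithmetic check of the breakdown, using only the figures listed above. It assumes that LINEs, SINEs, LTR elements and DNA transposons are subdivisions of the interspersed-repeat total and that microsatellites fall within "other intergenic"; the listed values are rounded, so the top-level sum only approximates 3200 Mb.

```python
# Top-level categories in Mb, as listed in these notes (rounded values).
top_level_mb = {
    "exons": 48,
    "introns + UTRs": 1200,
    "interspersed repeats": 1400,
    "other intergenic": 600,
}
# Assumed subdivisions of the interspersed-repeat category.
repeat_classes_mb = {
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
}

total = sum(top_level_mb.values())
print(total)                                      # 3248, i.e. roughly 3200 Mb
print(sum(repeat_classes_mb.values()))            # 1400, matches the repeat total
print(round(100 * top_level_mb["exons"] / total, 1))  # 1.5 (% exonic)
```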
Transposable Elements
Retrotransposons
- LTR retrotransposons share ancestry with retroviruses (gag, pol, LTRs) but typically lack a functional env gene.
- Encode reverse transcriptase: copy-and-paste via an RNA intermediate, but without env they cannot exit the cell.
LINEs (Long INterspersed Elements)
- 6–8 kb, autonomous (encode own RT & endonuclease).
- Families LINE-1 (active), LINE-2, LINE-3.
SINEs (Short INterspersed Elements)
- <500 bp, non-autonomous; hijack LINE machinery.
- Alu: primate-specific SINE (~1 per 3 kb; >80 M years old); useful molecular clock.
Biological Impact
- ~50 % of human genome is transposon-derived.
- Can cause insertional mutagenesis, unequal crossover, but most are inactive today.
Microsatellites (Simple Sequence Repeats)
- 1–15 bp motifs repeated 2–50× in tandem (dinucleotides most common).
- Polymerase slippage → high mutation rate (expansion/contraction).
- Can reside in coding regions but usually avoided.
- Useful for genetic mapping, forensics (e.g., CODIS system).
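Tandem repeats of this kind can be located with a back-referencing regular expression. The sketch below is a simple illustration, not a validated repeat finder: the function name, the 6 bp motif ceiling and the 4-repeat minimum are arbitrary assumptions chosen to keep the output short, and imperfect repeats are not handled.

```python
import re


def find_microsatellites(seq, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each perfect tandem repeat.

    The lazy quantifier prefers the shortest motif, so "CACACACA" is
    reported as CA x 4 rather than CACA x 2.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


print(find_microsatellites("TTCACACACACAGG"))  # [(2, 'CA', 5)]
```

Comparing the copy number at the same locus between individuals is the basis of STR genotyping.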
Non-Coding RNA (ncRNA) Genes
- Functional RNAs transcribed but not translated.
- Major classes & approximate numbers reported in humans (Esteller 2011):
- miRNA (≈1424): 19–24 nt, post-transcriptional gene silencing.
- piRNA (≈23439): 26–31 nt, repress transposons, direct DNA methylation.
- tiRNA (>5000): 17–18 nt, near transcription start sites, regulatory?
- snoRNA (>300): 60–300 nt, rRNA base modification.
- snRNA: splicing machinery components.
- lncRNA (>4000 subclasses): >200 nt, chromatin remodeling, X-inactivation, imprinting, mRNA stability.
Pseudogenes
- Defunct relatives of genes; arise by duplication or retrotransposition.
- Types
- Gene fragments (single/multi-exon).
- Unprocessed (whole gene incl. introns; often mutated splice sites).
- Processed pseudogenes (cDNA copies re-inserted; lack introns, often flanked by direct repeats).
- Provide raw material for evolution; can regulate cognate genes via competing RNA mechanisms.
Protein-Coding Genes
- Although only ≈1.5 % of genome, heavily studied.
- Copy number
- Single-copy (e.g., β-globin).
- Multicopy clusters (e.g., HLA class I).
- Evolution
- Gene families via duplication/divergence.
- Superfamilies share conserved domains (e.g., Immunoglobulin-superfamily).
- Identification in silico by aligning mRNAs/cDNAs to genomic DNA (EST projects, GenBank).
- Standardized nomenclature crucial for data sharing.
Overlap & Orientation
- Genes frequently overlap, reside on opposite strands, or embed within introns of larger genes; pseudogenes intermingle with functional loci—complicates annotation.
“Average” Gene Statistics (IHGSC 2001)
- Size range: 2 kb → 2 Mb (huge variability).
- Protein length: broad distribution.
- UTR lengths: 3' UTRs generally longer than 5'.
- Alternate first exons common—promoter diversity.
Alternative Splicing & Proteome Size
- Only 20–25 k genes yet estimated 50–100 k proteins; explanatory factor: >60 % of multi-exon genes undergo alternative splicing.
Example: Cell-Surface Receptor Architecture
- Typical domain layout
- N-terminal leader (signal peptide) ~20 aa.
- Variable extracellular domains (e.g., Ig-like).
- Stalk/linker.
- Transmembrane helix ± membrane anchor.
- Intracellular tail for signaling (ITAM, SH2/SH3-binding sites, etc.).
- NKp44 receptor (6p21.1) follows such modular blueprint.
Laboratory & Diagnostic Connections
- PCR, RT-PCR, qPCR: leverage replication & transcription principles.
- Sequencing technologies employ chain-terminating nucleotides (Sanger) or nucleotide incorporation chemistry (pyrosequencing, semiconductor).
- Splicing assays detect aberrant exon usage in genetic disease.
- Genome browsers, BLAST, and annotation databases essential for variant interpretation.
- Chromatin immunoprecipitation, DNase-seq, ATAC-seq probe packaging and regulatory landscapes.
Ethical & Practical Considerations
- Genomic data raises privacy issues; variant interpretation impacts clinical decision-making.
- Understanding of genome architecture informs gene therapy vector design & off-target risk assessment.
- Transposon remnants present challenges for genome editing (repeated, near-identical sequences create many potential off-target sites for CRISPR guides) but also offer tools (e.g., Sleeping Beauty transposase systems).