Molecular Pathology – Molecular & Cell Biology (Comprehensive Study Notes)

Lecture Outcomes

  • By the end of this session you should be able to:
    • Describe the structures of DNA and RNA.
    • Explain the mechanism and key enzymes of DNA replication.
    • Identify the basic structural elements of a gene.
    • Articulate the “central dogma” (DNA → RNA → Protein) including transcription and translation mechanics.
    • Outline higher-order DNA packaging and overall chromosome anatomy.

Nucleic Acids: DNA vs RNA

  • Found in all living cells and many viruses; serve as the cell’s information storage & retrieval system.
  • Both are polymers of nucleotides, but differ in:
    • Sugar: DNA uses deoxyribose; RNA uses ribose, which carries a 2'-OH group.
    • Strand state: DNA is typically double-stranded; RNA is usually single-stranded but can fold into complex secondary/tertiary structures.
    • Bases: Thymine (T) is exclusive to DNA; Uracil (U) replaces T in RNA.
  • Functional overview
    • DNA: long-term, heritable information.
    • RNA: diverse roles—coding (mRNA), adaptor (tRNA), catalytic/structural (rRNA), regulatory (miRNA, lncRNA, etc.).

Nucleotides, Nucleosides & Nitrogenous Bases

  • Definitions
    • BASE + SUGAR ⇒ Nucleoside.
    • BASE + SUGAR + PHOSPHATE ⇒ Nucleotide.
  • Nitrogenous bases
    • Purines (double-ring): Adenine (A), Guanine (G).
    • Pyrimidines (single-ring): Cytosine (C), Thymine (T—DNA only), Uracil (U—RNA only).
  • Coding capacity derives from the specific order (primary structure) of these bases.

Watson–Crick Base-Pairing Rules

  • Purine always pairs with a pyrimidine.
    • A ↔ T (or U in RNA) via 2 H-bonds.
    • C ↔ G via 3 H-bonds (stronger interaction).
  • Complementarity underpins replication, transcription, PCR, and many diagnostic assays.
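
The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the names `DNA_PAIR`, `H_BONDS`, `reverse_complement`, and `duplex_h_bonds` are hypothetical, and the hydrogen-bond count assumes a perfectly paired duplex (2 bonds per A·T pair, 3 per G·C pair).

```python
# Watson-Crick partners for DNA (illustrative names, not from any library).
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen bonds contributed by each base when paired with its partner.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    # The partner strand is antiparallel, so walk the input in reverse.
    return "".join(DNA_PAIR[base] for base in reversed(seq.upper()))


def duplex_h_bonds(seq: str) -> int:
    """Total hydrogen bonds when seq is fully paired with its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


print(reverse_complement("ATGC"))  # GCAT
print(duplex_h_bonds("ATGC"))      # 10
```

Because G·C pairs contribute three bonds, GC-rich duplexes score higher here, which is the same reasoning behind their higher melting temperatures in PCR primer design.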

DNA Structure

  • Primary structure: linear nucleotide sequence written 5' → 3'.
  • Secondary structure: antiparallel double helix (the two strands run 5' → 3' in opposite directions).
    • One complete helical turn contains 10 bp and spans 3.4 nm; helix diameter ≈ 2.37 nm.
  • Higher-order packing (see Chromosomal DNA Packaging section) allows >2 m of DNA to fit inside a nucleus.
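
As a rough check on the ">2 m" figure, the helix dimensions above can be combined with the genome size given later in these notes (~3 × 10^9 bp). The sketch below assumes a diploid cell (two genome copies), which is an assumption added here rather than something stated in this section; the variable names are illustrative.

```python
NM_PER_TURN = 3.4   # helical pitch, from the notes
BP_PER_TURN = 10    # base pairs per turn, from the notes
HAPLOID_BP = 3e9    # approximate nuclear genome size, from the notes
PLOIDY = 2          # assumption: a diploid somatic cell

# Rise per base pair along the helix axis.
rise_nm_per_bp = NM_PER_TURN / BP_PER_TURN

# Total contour length of nuclear DNA, converted from nm to m.
total_length_m = HAPLOID_BP * PLOIDY * rise_nm_per_bp * 1e-9

print(f"{rise_nm_per_bp:.2f} nm per bp")               # 0.34 nm per bp
print(f"{total_length_m:.2f} m of DNA per nucleus")    # 2.04 m of DNA per nucleus
```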

RNA Structure

  • Primary: single strand written 5' → 3'.
  • Secondary: intramolecular base-pairing (Watson–Crick) creates hairpins, loops, and more.
    • Example: tRNA adopts a cloverleaf secondary structure (detailed later in course).
  • Key chemical difference (2'-OH) enables catalytic activity (ribozymes) and reduces stability (RNA is less stable than DNA).

Functional Classes of RNA

  1. mRNA (messenger): carries coding information; template for protein synthesis.
  2. tRNA (transfer): at least one per amino acid; delivers amino acids to ribosome.
  3. rRNA (ribosomal): structural & catalytic core of ribosomes.
  4. ncRNAs (non-coding): huge family including snRNA, snoRNA, hnRNA, miRNA, lncRNA, piRNA, etc.—critical for regulation, RNA processing, chromatin state, and genome defense.

DNA Replication

  • Timing: S phase of the cell cycle.
  • Overall: semiconservative—each daughter helix contains one parental and one newly synthesized strand.
  • Core protein machinery
    • Helicase: unwinds double helix.
    • Single-stranded DNA-binding proteins (SSB): prevent re-annealing.
    • Primase: synthesizes short RNA primers to provide a free 3'-OH.
    • DNA polymerase III (main prokaryotic enzyme; analogous to Pol δ/ε in eukaryotes): elongates DNA 5' → 3' and proofreads via its 3' → 5' exonuclease activity.
    • DNA polymerase I (prokaryotes) / RNase H & Pol δ (eukaryotes): primer removal & gap filling.
    • Ligase: seals nicks, joins Okazaki fragments on lagging strand.
  • Replication bubble/forks
    • Each origin fires bi-directionally, generating two forks.
    • Leading strand: synthesized continuously toward fork.
    • Lagging strand: synthesized discontinuously away from the fork as Okazaki fragments (~100–200 nt in eukaryotes).
  • Chemistry of elongation
    • The α-phosphate of the incoming dNTP forms a phosphodiester bond with the 3'-OH of the growing strand.
    • Release and hydrolysis of pyrophosphate (PPi) drives the reaction energetically.
  • Multiple bubbles replicate eukaryotic chromosomes simultaneously to accelerate S-phase.

Laboratory Relevance

  • Replication principles are fundamental to PCR amplification, DNA sequencing (Sanger, pyrosequencing, semiconductor), cloning, and site-directed mutagenesis.
  • Standard notation of strands and primers uses 5' → 3' orientation.

Transcription (DNA → pre-mRNA)

  • Enzymes & factors
    • RNA polymerase II (for mRNA), general transcription factors (TFIIA, TFIIB, etc.), template DNA, NTPs.
  • Steps
    1. Initiation: polymerase + TFs assemble at promoter (TATA or TATA-less); double helix locally unwound forming a transcription bubble.
    2. Elongation: RNA pol reads the template (antisense) strand 3' → 5' and synthesizes RNA 5' → 3' by base complementarity (A ↔ U, C ↔ G).
    3. Termination: polymerase disengages at termination signals; RNA released.
    4. Co-/post-transcriptional processing:
      • 5' capping, 3' poly-A tailing (~200 adenosines), splicing, RNA editing or chemical modifications.
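
The elongation step can be illustrated with a short Python sketch. It models base complementarity only (no promoter recognition, capping, tailing, or splicing), and the function names are hypothetical. Both strands are supplied 5'→3'; because the template is read 3'→5', it is walked in reverse.

```python
# RNA base added opposite each template DNA base.
RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_5to3: str) -> str:
    """RNA (5'->3') synthesized from a template strand given 5'->3'."""
    # Polymerase reads the template 3'->5', so iterate over it in reverse.
    return "".join(RNA_PARTNER[b] for b in reversed(template_5to3.upper()))


def transcribe_coding(coding_5to3: str) -> str:
    """Shortcut: the RNA matches the coding (sense) strand with U for T."""
    return coding_5to3.upper().replace("T", "U")


coding = "ATGGCCTAA"
template = "TTAGGCCAT"  # reverse complement of the coding strand

print(transcribe_template(template))  # AUGGCCUAA
print(transcribe_coding(coding))      # AUGGCCUAA
```

The two functions agree because the transcript is complementary to the template and therefore identical in sequence to the coding strand, apart from U replacing T.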

Splicing Mechanics

  • pre-mRNA is spliced in the nucleus by the spliceosome, before export to the cytoplasm; introns are removed and exons ligated.
  • Critical cis-elements
    • Splice donor (GU) & acceptor (AG) dinucleotides at intron boundaries.
    • Branch point adenine, polypyrimidine tract.
    • Exonic/intronic splicing enhancers or silencers bound by SR proteins or hnRNPs; tissue-specific combinations generate diversity.
  • SR-protein expression profiles vary across tissues → regulates exon inclusion/skipping.

Alternative Splicing

  • Modes: exon skipping, intron retention, alternative 5' donor or 3' acceptor sites, mutually exclusive exons, alternative promoters/poly-A sites.
  • Outcome: multiple protein isoforms from one gene; expands proteome (>10 isoforms for some genes).
  • Clinical/lab relevance: RT-PCR to study splice defects, expression profiling, biomarker discovery.
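
To make the exon-skipping mode concrete, the sketch below enumerates the transcripts produced when each "cassette" exon is either included or skipped. It is a toy model under stated assumptions: the exon sequences are invented, `exon_skipping_isoforms` is a hypothetical helper, and real splicing decisions depend on the enhancer/silencer context described above.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, cassette_indices):
    """All mRNAs obtained by including or skipping each cassette exon."""
    isoforms = []
    # Try skipping 0, 1, 2, ... of the cassette exons.
    for k in range(len(cassette_indices) + 1):
        for skipped in combinations(cassette_indices, k):
            kept = [e for i, e in enumerate(exons) if i not in skipped]
            isoforms.append("".join(kept))
    return isoforms


# Toy exon sequences (invented for illustration).
exons = ["AUGGCU", "GAAGAA", "CCCUAA"]

print(exon_skipping_isoforms(exons, cassette_indices=[1]))
# ['AUGGCUGAAGAACCCUAA', 'AUGGCUCCCUAA']
```

With n independent cassette exons this yields 2^n isoforms, which shows how a modest gene count can still encode a much larger set of proteins.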

Translation (mRNA → Polypeptide)

  • Process occurs on ribosomes in cytoplasm.
  • Steps
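    1. Initiation: the small ribosomal subunit binds the mRNA 5' cap and scans to the AUG (eukaryotes) or is positioned at the AUG by the Shine-Dalgarno sequence (prokaryotes); initiator Met-tRNA and initiation factors then recruit the large subunit.
    2. Elongation: aminoacyl-tRNAs enter the A-site; the peptidyl transferase centre of the large subunit transfers the growing chain from the P-site tRNA onto the A-site aminoacyl-tRNA; the ribosome then translocates; governed by EF-Tu/EF-G (prok.) or eEF1/eEF2 (euk.).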
    3. Termination: stop codon (UAA, UAG, UGA) recruits release factors; ribosome disassembles, polypeptide released.
  • Polysomes: multiple ribosomes concurrently translate a single mRNA, boosting output.
  • Aminoacyl-tRNA synthetases “charge” tRNAs; ATP-dependent, high fidelity.

Central Dogma Integration & Regulation

  • Information flow: DNA → (transcription) → RNA → (translation) → Protein.
  • Mutations can lead to
    • No effect (silent change or functional redundancy).
    • Loss-of-function → disease or apoptosis.
    • Gain-of-function or dominant-negative effects.
  • Phenotypic outcome depends on protein role, redundancy, expression context.

Gene as a Collection of Binding Sites

  • Gene expression controlled by binding of RNAs/proteins at promoters, enhancers, silencers, splice sites, UTR motifs, etc.

General Structure of a Protein-Coding Gene

  1. Promoter (RNA pol/TF binding) – TATA or TATA-less.
  2. 5' Untranslated Region (UTR) – influences translation initiation.
  3. Start codon ATG.
  4. Exons & introns (coding/non-coding sequences).
  5. Splice donor/acceptor sites (usually GT/AG; rarely AT/AC).
  6. Splice enhancers/silencers (exonic/intronic).
  7. Stop codon TAA, TAG, or TGA.
  8. 3' UTR – mRNA stability, localisation, miRNA binding.
  9. Polyadenylation signal (AATAAA in the DNA, AAUAAA in the transcript, or variants).

Genome Organization

Nuclear vs Mitochondrial Genomes

  • Nuclear: ~3 × 10^9 bp, >20000 genes, 23 chromosome pairs.
  • Mitochondrial: 16569 bp (~0.0005 % of the nuclear genome length), 37 genes (13 proteins for ETC, 24 ncRNAs: 22 tRNAs and 2 rRNAs), closed circular, intron-less, hundreds to thousands of copies/cell.

Chromosomal DNA Packaging

  • Hierarchy
    • DNA double helix (~2 nm) wraps around histone octamers forming nucleosomes, "beads on a string" (~11 nm fiber).
    • Nucleosomes coil → 30 nm chromatin fiber.
    • Further looping/condensation → 300–700 nm domains.
    • Fully condensed metaphase chromosome ~1400 nm width.
  • Histones are nuclear-encoded; chromatin must dynamically unfold during replication/transcription.

Human Karyotype

  • 46 chromosomes: 22 autosome pairs + XX or XY sex chromosomes.
  • Chromosome sizes & banding patterns used diagnostically.

Human Genome Project (1990–2003; draft published 2001)

  • Public (IHGSC) & private (Celera) efforts produced the draft sequence.
  • Composite reference—mosaic of multiple individuals.
  • Ongoing questions
    • Completeness, functional annotation, inter-individual variation.
  • Bioinformatics tools (e.g., BLAST) essential for sequence comparison & annotation.

Content Breakdown (≈3200 Mb)

  • Genes (exons): ≈48 Mb (≈1.5 % of genome).
  • Introns + UTRs: ≈1.2 Gb.
  • Interspersed repeats: ≈1.4 Gb.
    • LINEs: ≈640 Mb.
    • SINEs (incl. Alu): ≈420 Mb.
    • LTR elements: ≈250 Mb.
    • DNA transposons: ≈90 Mb.
  • Other intergenic: ≈600 Mb.
    • Microsatellites: ≈90 Mb.
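
A quick arithmetic check of the figures above, using only the numbers quoted in these notes: the four top-level categories sum to roughly the ≈3200 Mb total, and the four transposon-derived classes (LINEs, SINEs, LTR elements, DNA transposons) sum to the 1.4 Gb quoted for interspersed repeats. The dictionary and variable names are illustrative.

```python
GENOME_MB = 3200  # approximate genome size used in the heading

# Top-level categories, in Mb.
top_level_mb = {
    "exons": 48,
    "introns + UTRs": 1200,
    "interspersed repeats": 1400,
    "other intergenic": 600,
}

# Transposon-derived repeat classes, in Mb.
repeat_classes_mb = {
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
}

total_mb = sum(top_level_mb.values())              # 3248
repeat_total_mb = sum(repeat_classes_mb.values())  # 1400

# Each category as a share of the ~3200 Mb genome.
for name, mb in top_level_mb.items():
    print(f"{name:22s}{mb:6d} Mb {100 * mb / GENOME_MB:5.1f} %")

print("top-level total:", total_mb, "Mb")
print("repeat classes total:", repeat_total_mb, "Mb")
```

The small excess over 3200 Mb reflects rounding in the quoted figures.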

Transposable Elements

Retrotransposons

  • LTR retrotransposons share ancestry with retroviruses (gag, pol, LTRs) but typically lack a functional env gene.
  • Encode reverse transcriptase—copy-and-paste via RNA intermediate but cannot exit cell.

LINEs (Long INterspersed Elements)

  • 6–8 kb, autonomous (encode own RT & endonuclease).
  • Families LINE-1 (active), LINE-2, LINE-3.

SINEs (Short INterspersed Elements)

  • <500 bp, non-autonomous; hijack LINE machinery.
  • Alu: primate-specific SINE (~1 per 3 kb; >80 M years old); useful molecular clock.

Biological Impact

  • ~50 % of human genome is transposon-derived.
  • Can cause insertional mutagenesis, unequal crossover, but most are inactive today.

Microsatellites (Simple Sequence Repeats)

  • 1–15 bp motifs repeated 2–50× in tandem (dinucleotides most common).
  • Polymerase slippage → high mutation rate (expansion/contraction).
  • Occur mostly in non-coding DNA; rare in coding regions, where length changes that are not a multiple of 3 bp shift the reading frame.
  • Useful for genetic mapping, forensics (e.g., CODIS system).
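
Tandem repeats of this kind can be located with a regular-expression backreference. The sketch below is one simple approach, not a standard tool: `find_microsatellites` is a hypothetical helper, and its defaults (motifs of 2–6 bp, at least 4 copies) are assumptions chosen to avoid trivial hits rather than thresholds taken from these notes.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each perfect tandem repeat found."""
    # ([ACGT]{a,b}?) captures the shortest motif; \1{n,} requires n more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_microsatellites("GGATCACACACACACATTTGAGAGAGAGC"))
# [(4, 'CA', 6), (19, 'GA', 4)]
```

Comparing the copy number at a given locus between samples is the basis of the genotyping uses listed above; this sketch only handles perfect repeats and ignores interrupted ones.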

Non-Coding RNA (ncRNA) Genes

  • Functional RNAs transcribed but not translated.
  • Major classes & approximate copy numbers (Esteller 2011):
    • miRNA (≈1424): 19–24 nt, post-transcriptional gene silencing.
    • piRNA (≈23439): 26–31 nt, repress transposons, direct DNA methylation.
    • tiRNA (>5000): 17–18 nt, near transcription start sites, regulatory?
    • snoRNA (>300): 60–300 nt, rRNA base modification.
    • snRNA: splicing machinery components.
    • lncRNA (>4000 subclasses): >200 nt, chromatin remodeling, X-inactivation, imprinting, mRNA stability.

Pseudogenes

  • Defunct relatives of genes; arise by duplication or retrotransposition.
  • Types
    • Gene fragments (single/multi-exon).
    • Unprocessed (whole gene incl. introns; often mutated splice sites).
    • Processed pseudogenes (cDNA copies re-inserted; lack introns, often flanked by direct repeats).
  • Provide raw material for evolution; can regulate cognate genes via competing RNA mechanisms.

Protein-Coding Genes

  • Although only ≈1.5 % of genome, heavily studied.
  • Copy number
    • Single-copy (e.g., β-globin).
    • Multicopy clusters (e.g., HLA class I).
  • Evolution
    • Gene families via duplication/divergence.
    • Superfamilies share conserved domains (e.g., Immunoglobulin-superfamily).
  • Identification in silico by aligning mRNAs/cDNAs to genomic DNA (EST projects, GenBank).
  • Standardized nomenclature crucial for data sharing.

Overlap & Orientation

  • Genes frequently overlap, reside on opposite strands, or embed within introns of larger genes; pseudogenes intermingle with functional loci—complicates annotation.

“Average” Gene Statistics (IHGSC 2001)

  • Size range: 2 kb → 2 Mb (huge variability).
  • Protein length: broad distribution.
  • UTR lengths: 3' UTRs generally longer than 5'.
  • Alternate first exons common—promoter diversity.

Alternative Splicing & Proteome Size

  • Only 20–25 k genes yet estimated 50–100 k proteins; explanatory factor: >60 % of multi-exon genes undergo alternative splicing.

Example: Cell-Surface Receptor Architecture

  • Typical domain layout
    1. N-terminal leader (signal peptide) ~20 aa.
    2. Variable extracellular domains (e.g., Ig-like).
    3. Stalk/linker.
    4. Transmembrane helix ± membrane anchor.
    5. Intracellular tail for signaling (ITAM, SH2/SH3-binding sites, etc.).
  • NKp44 receptor (6p21.1) follows such modular blueprint.

Laboratory & Diagnostic Connections

  • PCR, RT-PCR, qPCR: leverage replication & transcription principles.
  • Sequencing technologies employ chain-terminating nucleotides (Sanger) or nucleotide incorporation chemistry (pyrosequencing, semiconductor).
  • Splicing assays detect aberrant exon usage in genetic disease.
  • Genome browsers, BLAST, and annotation databases essential for variant interpretation.
  • Chromatin immunoprecipitation, DNase-seq, ATAC-seq probe packaging and regulatory landscapes.

Ethical & Practical Considerations

  • Genomic data raises privacy issues; variant interpretation impacts clinical decision-making.
  • Understanding of genome architecture informs gene therapy vector design & off-target risk assessment.
  • Transposon remnants present challenges for genome editing (repeated sequences increase the risk of CRISPR off-target binding) but also offer tools (e.g., Sleeping Beauty transposase systems).