Molecular Pathology – Molecular & Cell Biology (Comprehensive Study Notes)

Lecture Outcomes

By the end of this session you should be able to:
- Describe the structures of DNA and RNA.
- Explain the mechanism and key enzymes of DNA replication.
- Identify the basic structural elements of a gene.
- Articulate the “central dogma” (DNA → RNA → Protein) including transcription and translation mechanics.
- Outline higher-order DNA packaging and overall chromosome anatomy.

Nucleic Acids: DNA vs RNA

Found in all living cells and many viruses; serve as the cell’s information storage & retrieval system.
Both are polymers of nucleotides, but differ in:
- Sugar: DNA uses deoxyribose; RNA uses ribose (has a 2'-OH group).
- Strand state: DNA is typically double-stranded; RNA is usually single-stranded but can fold into complex secondary/tertiary structures.
- Bases: Thymine (T) is exclusive to DNA; Uracil (U) replaces T in RNA.
Functional overview
- DNA: long-term, heritable information.
- RNA: diverse roles—coding (mRNA), adaptor (tRNA), catalytic/structural (rRNA), regulatory (miRNA, lncRNA, etc.).

Nucleotides, Nucleosides & Nitrogenous Bases

Definitions
- BASE + SUGAR ⇒ Nucleoside.
- BASE + SUGAR + PHOSPHATE ⇒ Nucleotide.
Nitrogenous bases
- Purines (double-ring): Adenine (A), Guanine (G).
- Pyrimidines (single-ring): Cytosine (C), Thymine (T—DNA only), Uracil (U—RNA only).
Coding capacity derives from the specific order (primary structure) of these bases.

Watson–Crick Base-Pairing Rules

A purine always pairs with a pyrimidine.
- A ↔ T (or U in RNA) via 2 H-bonds.
- C ↔ G via 3 H-bonds (stronger interaction).
Complementarity underpins replication, transcription, PCR, and many diagnostic assays.
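
The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `DNA_PAIR`, `H_BONDS`, `reverse_complement` and `hydrogen_bonds` are made up for these notes and do not come from any library.

```python
# Watson-Crick partners for DNA; A-T pairs use 2 H-bonds, G-C pairs use 3.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Count the H-bonds holding `seq` to its perfect complement."""
    return sum(H_BONDS[base] for base in seq.upper())


primer = "ATGCGT"
print(reverse_complement(primer))  # ACGCAT
print(hydrogen_bonds(primer))      # 15
```

The reversal step reflects the antiparallel arrangement: the partner strand is read in the opposite direction, so it must be reversed to be written 5'→3'.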

DNA Structure

Primary structure: linear nucleotide sequence written 5' → 3'.
Secondary structure: antiparallel double helix (the two strands run 5' → 3' in opposite directions).
- One complete helical turn contains 10 bp and spans 3.4 nm (0.34 nm rise per bp); helix diameter ≈ 2 nm.
Higher-order packing (see Chromosomal DNA Packaging section) allows >2 m of DNA to fit inside a nucleus.
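
The ">2 m" figure can be checked from the helix geometry. The sketch below assumes a haploid genome of about 3 × 10^9 bp and a diploid cell; the function name `contour_length_m` is made up for illustration.

```python
RISE_PER_TURN_NM = 3.4    # length of one helical turn (from the notes)
BP_PER_TURN = 10          # base pairs per turn (from the notes)
HAPLOID_GENOME_BP = 3e9   # assumed approximate haploid human genome size


def contour_length_m(n_bp: float) -> float:
    """End-to-end length in metres of n_bp base pairs of B-form DNA."""
    rise_per_bp_nm = RISE_PER_TURN_NM / BP_PER_TURN  # 0.34 nm per bp
    return n_bp * rise_per_bp_nm * 1e-9


diploid_length = contour_length_m(2 * HAPLOID_GENOME_BP)
print(f"{diploid_length:.2f} m")  # 2.04 m
```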

RNA Structure

Primary: single strand listed 5' → 3'.
Secondary: intramolecular base-pairing (Watson–Crick) creates hairpins, loops, and more.
- Example: tRNA adopts a cloverleaf secondary structure (detailed later in course).
Key chemical difference (2'-OH) enables catalytic activity (ribozymes) and affects stability (RNA less stable than DNA).

Functional Classes of RNA

mRNA (messenger): carries coding information; template for protein synthesis.
tRNA (transfer): at least one per amino acid; delivers amino acids to ribosome.
rRNA (ribosomal): structural & catalytic core of ribosomes.
ncRNAs (non-coding): huge family including snRNA, snoRNA, miRNA, lncRNA, piRNA, etc.; critical for regulation, RNA processing, chromatin state, and genome defense. (hnRNA, often listed alongside these, is heterogeneous nuclear RNA, i.e. mostly unspliced pre-mRNA.)

DNA Replication

Timing: S phase of the cell cycle.
Overall: semiconservative—each daughter helix contains one parental and one newly synthesized strand.
Core protein machinery
- Helicase: unwinds double helix.
- Single-stranded DNA-binding proteins (SSB): prevent re-annealing.
- Primase: synthesizes short RNA primers to provide a free 3'-OH.
- DNA polymerase III (main prokaryotic enzyme; analogous Pol δ/ε in eukaryotes): elongates DNA 5' → 3' and proofreads (3' → 5' exonuclease activity).
- DNA polymerase I (prokaryotes) / RNase H & Pol δ (eukaryotes): primer removal & gap filling.
- Ligase: seals nicks, joins Okazaki fragments on lagging strand.
Replication bubble/forks
- Each origin fires bi-directionally, generating two forks.
- Leading strand: synthesized continuously, in the direction of fork movement.
- Lagging strand: synthesized discontinuously, away from the fork, as Okazaki fragments of ~100–200 nt in eukaryotes (~1000–2000 nt in prokaryotes).
Chemistry of elongation
- The 3'-OH of the growing strand attacks the α-phosphate of the incoming dNTP, forming a phosphodiester bond.
- Release and hydrolysis of pyrophosphate (PPi) drive the reaction energetically.
Multiple bubbles replicate eukaryotic chromosomes simultaneously to accelerate S-phase.
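
A toy model of the semiconservative outcome, as a sketch only: strands are (sequence, label) tuples, and the helper names `complement_strand` and `replicate` are made up for these notes. It ignores primers, forks and Okazaki fragments and tracks only which strand is parental.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(template: str) -> str:
    """New strand paired base-by-base with `template` (same index order)."""
    return "".join(COMPLEMENT[base] for base in template)


def replicate(duplex):
    """Semiconservative replication of a (top, bottom) duplex.

    Each strand is a (sequence, label) tuple. The top strand is written
    5'->3' and the bottom 3'->5', so paired bases share the same index.
    """
    top, bottom = duplex
    new_bottom = (complement_strand(top[0]), "new")
    new_top = (complement_strand(bottom[0]), "new")
    # Each daughter keeps exactly one parental strand.
    return (top, new_bottom), (new_top, bottom)


parent = (("ATGC", "parental"), ("TACG", "parental"))
for daughter in replicate(parent):
    print(daughter)
```

Both daughters carry the same sequence as the parent, but each contains one strand labelled "parental" and one labelled "new".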

Laboratory Relevance

Fundamental to PCR amplification, DNA sequencing (Sanger, pyrosequencing, semiconductor), cloning, site-directed mutagenesis.
Standard notation of strands and primers uses 5' → 3' orientation.

Transcription (DNA → pre-mRNA)

Enzymes & factors
- RNA polymerase II (for mRNA), general transcription factors (TFIIA, TFIIB, etc.), template DNA, NTPs.
Steps
1. Initiation: polymerase + TFs assemble at promoter (TATA or TATA-less); double helix locally unwound forming a transcription bubble.
2. Elongation: RNA pol moves along the template (antisense) strand, reading it 3' → 5' and synthesizing RNA 5' → 3' by base complementarity (template A → U, T → A, C → G, G → C).
3. Termination: polymerase disengages at termination signals; RNA released.
4. Co-/post-transcriptional processing:
- 5' capping, 3' poly-A tailing (≈200 adenosines), splicing, RNA editing or chemical modifications.
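
The elongation and tailing steps can be sketched as string operations. This is an illustration under simplifying assumptions (no cap, no splicing); `TEMPLATE_TO_RNA`, `transcribe` and `add_poly_a_tail` are made-up names.

```python
# Template-base -> RNA-base pairing used during elongation.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3to5: str) -> str:
    """Build the RNA 5'->3' from a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def add_poly_a_tail(rna: str, length: int = 200) -> str:
    """Append a poly-A tail (about 200 nt, as in the notes)."""
    return rna + "A" * length


template = "TACGGCATT"   # 3'-TACGGCATT-5' (antisense strand)
coding = "ATGCCGTAA"     # 5'-ATGCCGTAA-3' (sense strand)
rna = transcribe(template)
print(rna)                               # AUGCCGUAA
print(rna == coding.replace("T", "U"))   # True
```

The second check shows why the sense strand is called "coding": the transcript matches it base for base, with U in place of T.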

Splicing Mechanics

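Pre-mRNA is processed by the spliceosome in the nucleus, largely co-transcriptionally: introns are removed and exons ligated before the mature mRNA is exported to the cytoplasm.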
Critical cis-elements
- Splice donor (GU) & acceptor (AG) dinucleotides at intron boundaries.
- Branch point adenine, polypyrimidine tract.
- Exonic/intronic splicing enhancers or silencers bound by SR proteins or hnRNPs; tissue-specific combinations generate diversity.
SR-protein expression profiles vary across tissues → regulates exon inclusion/skipping.
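
Intron removal with a GU...AG boundary check can be sketched as follows. The intron coordinates are supplied by hand (a real spliceosome also needs the branch point and polypyrimidine tract, which this toy ignores), and `splice` is a made-up helper name.

```python
def splice(pre_mrna, introns):
    """Remove introns given as half-open (start, end) index pairs.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    exons, cursor = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # exon upstream of this intron
        cursor = end
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


pre_mrna = "AUGGCC" + "GUAAGUCUAACAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 19)]))  # AUGGCCUUUUAA
```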

Alternative Splicing

Modes: exon skipping, intron retention, alternative 5' donor or 3' acceptor sites, mutually exclusive exons, alternative promoters/poly-A sites.
Outcome: multiple protein isoforms from one gene; expands proteome (>10 isoforms for some genes).
Clinical/lab relevance: RT-PCR to study splice defects, expression profiling, biomarker discovery.
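
To see how exon skipping alone multiplies isoforms, the sketch below enumerates every include/skip combination for a set of cassette exons. The exon names and the function `exon_skipping_isoforms` are hypothetical; real genes use only a subset of the possible combinations.

```python
from itertools import product


def exon_skipping_isoforms(exons, cassette):
    """Yield each transcript formed by including or skipping cassette exons.

    exons: ordered list of exon names.
    cassette: set of exon names that may be skipped.
    """
    optional = [e for e in exons if e in cassette]
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Constitutive exons are always kept; cassette exons follow `choices`.
        yield [e for e in exons if included.get(e, True)]


isoforms = list(exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}))
for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms))  # 4
```

With n independent cassette exons the upper bound is 2^n isoforms, which is why a modest gene count can still yield a much larger proteome.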

Translation (mRNA → Polypeptide)

Process occurs on ribosomes in cytoplasm.
Steps
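1. Initiation: in eukaryotes the small ribosomal subunit binds the mRNA 5' cap and scans to the first AUG; in prokaryotes it binds at the Shine-Dalgarno sequence just upstream of the AUG. Initiator Met-tRNA and initiation factors then recruit the large subunit.
2. Elongation: aminoacyl-tRNAs enter the A-site; the peptidyl transferase center of the large subunit transfers the growing chain from the P-site tRNA onto the A-site aminoacyl-tRNA; the ribosome then translocates by one codon. Governed by EF-Tu/EF-G (prok.) or eEF1/eEF2 (euk.).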
3. Termination: stop codon (UAA, UAG, UGA) recruits release factors; ribosome disassembles, polypeptide released.
Polysomes: multiple ribosomes concurrently translate a single mRNA, boosting output.
Aminoacyl-tRNA synthetases “charge” tRNAs; ATP-dependent, high fidelity.
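
The three steps can be mimicked with a codon lookup. This sketch assumes the standard genetic code, starts at the first AUG it finds (a simplification of scanning) and stops at the first in-frame stop codon; `CODON_TABLE` and `translate` are names chosen for these notes.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no initiation
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # UAA, UAG or UGA: termination
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```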

Central Dogma Integration & Regulation

Information flow: DNA → (transcription) → RNA → (translation) → Protein.
Mutations can lead to
- No effect (silent/redundant systems).
- Loss-of-function → disease or apoptosis.
- Gain-of-function or dominant-negative effects.
Phenotypic outcome depends on protein role, redundancy, expression context.

Gene as a Collection of Binding Sites

Gene expression controlled by binding of RNAs/proteins at promoters, enhancers, silencers, splice sites, UTR motifs, etc.

General Structure of a Protein-Coding Gene

Promoter (RNA pol/TF binding) – TATA or TATA-less.
5' Untranslated Region (UTR) – influences translation initiation.
Start codon: ATG.
Exons & introns (coding/non-coding sequences).
Splice donor/acceptor sites (usually GT/AG; rarely AT/AC).
Splice enhancers/silencers (exonic/intronic).
Stop codon: TAA, TAG, or TGA.
3' UTR – mRNA stability, localisation, miRNA binding.
Polyadenylation signal (AATAAA or variants; AAUAAA in the transcript).

Genome Organization

Nuclear vs Mitochondrial Genomes

Nuclear: ~3 × 10^9 bp, >20000 genes, 23 chromosome pairs.
Mitochondrial: 16569 bp (~0.0005 % of the nuclear genome length), 37 genes (13 proteins for ETC; 24 ncRNAs = 22 tRNAs + 2 rRNAs), closed circular, intron-less, hundreds to thousands of copies/cell.

Chromosomal DNA Packaging

Hierarchy
- DNA double helix (~2 nm) wraps around histone octamers, forming nucleosomes: "beads on a string" (~11 nm fiber).
- Nucleosomes coil → 30 nm chromatin fiber.
- Further looping/condensation → 300–700 nm domains.
- Fully condensed metaphase chromosome: ~1400 nm wide.
Histones are nuclear-encoded; chromatin must dynamically unfold during replication/transcription.

Human Karyotype

46 chromosomes: 22 autosome pairs + XX or XY sex chromosomes.
Chromosome sizes & banding patterns used diagnostically.

Human Genome Project (1990–2003)

Public (IHGSC) & private (Celera) efforts published draft sequences in 2001; the project was declared complete in 2003.
Composite reference—mosaic of multiple individuals.
Ongoing questions
- Completeness, functional annotation, inter-individual variation.
Bioinformatics tools (e.g., BLAST) essential for sequence comparison & annotation.

Content Breakdown (≈3200 Mb)

Genes (exons): ≈48 Mb (≈1.5 % of genome).
Introns + UTRs: ≈1.2 Gb.
Interspersed repeats: ≈1.4 Gb.
- LINEs: ≈640 Mb.
- SINEs (incl. Alu elements): ≈420 Mb.
- LTR elements: ≈250 Mb.
- DNA transposons: ≈90 Mb.
Other intergenic: ≈600 Mb.
- Microsatellites: ≈90 Mb.
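
The figures in this breakdown can be cross-checked with simple arithmetic. The sketch assumes the four transposon classes (LINEs, SINEs, LTR elements, DNA transposons) are the components of the interspersed-repeat total, which their sum supports; the dictionary names are made up.

```python
# Major genome fractions in Mb, as listed in the notes.
major_mb = {
    "exons": 48,
    "introns_utrs": 1200,
    "interspersed_repeats": 1400,
    "other_intergenic": 600,
}
# Assumed components of the interspersed-repeat fraction, in Mb.
repeat_classes_mb = {
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
}

total_mb = sum(major_mb.values())
print(total_mb)                                # 3248
print(sum(repeat_classes_mb.values()))         # 1400
print(f"{major_mb['exons'] / total_mb:.1%}")   # 1.5%
```

The total of about 3250 Mb agrees with the ≈3200 Mb heading, and the exon share reproduces the ≈1.5 % quoted for protein-coding sequence.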

Transposable Elements

Retrotransposons

LTR retrotransposons share ancestry with retroviruses (gag, pol, LTRs) but lack a functional env gene.
Encode reverse transcriptase: copy-and-paste via an RNA intermediate, but without env they cannot exit the cell.

LINEs (Long INterspersed Elements)

6–8 kb, autonomous (encode own RT & endonuclease).
Families LINE-1 (active), LINE-2, LINE-3.

SINEs (Short INterspersed Elements)

<500 bp, non-autonomous; hijack LINE machinery.
Alu: primate-specific SINE (~1 per 3 kb; >80 million years old); useful as a molecular clock.

Biological Impact

~50 % of human genome is transposon-derived.
Can cause insertional mutagenesis, unequal crossover, but most are inactive today.

Microsatellites (Simple Sequence Repeats)

1–15 bp motifs repeated 2–50× in tandem (dinucleotides most common).
Polymerase slippage → high mutation rate (expansion/contraction).
Can occur in coding regions but are generally selected against there.
Useful for genetic mapping, forensics (e.g., CODIS system).
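
Tandem repeats of this kind can be located with a regular-expression backreference. This is a minimal sketch: the motif range follows the 1–15 bp figure in the notes, the `min_repeats=4` threshold is an arbitrary assumption to suppress trivial hits, and `find_microsatellites` is a made-up name.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=15, min_repeats=4):
    """Return (start, motif, copies) for each tandem repeat in `seq`.

    The lazy quantifier makes the regex try the shortest motif first,
    so ACACAC is reported as motif AC rather than ACAC.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


sequence = "GGT" + "CA" * 6 + "GGATC"
print(find_microsatellites(sequence))  # [(3, 'CA', 6)]
```

Comparing the copy count at one locus between individuals is the basis of the mapping and forensic uses mentioned above.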

Non-Coding RNA (ncRNA) Genes

Functional RNAs transcribed but not translated.
Major classes & approximate copy numbers (Esteller 2011):
- miRNA (≈1424): 19–24 nt, post-transcriptional gene silencing.
- piRNA (≈23439): 26–31 nt, repress transposons, direct DNA methylation.
- tiRNA (>5000): 17–18 nt, near transcription start sites, regulatory?
- snoRNA (>300): 60–300 nt, rRNA base modification.
- snRNA: splicing machinery components.
- lncRNA (>4000 subclasses): >200 nt, chromatin remodeling, X-inactivation, imprinting, mRNA stability.

Pseudogenes

Defunct relatives of genes; arise by duplication or retrotransposition.
Types
- Gene fragments (single/multi-exon).
- Unprocessed (whole gene incl. introns; often mutated splice sites).
- Processed pseudogenes (cDNA copies re-inserted; lack introns, often flanked by direct repeats).
Provide raw material for evolution; can regulate cognate genes via competing RNA mechanisms.

Protein-Coding Genes

Although only ≈1.5 % of genome, heavily studied.
Copy number
- Single-copy (e.g., β-globin).
- Multicopy clusters (e.g., HLA class I).
Evolution
- Gene families via duplication/divergence.
- Superfamilies share conserved domains (e.g., Immunoglobulin-superfamily).
Identification in silico by aligning mRNAs/cDNAs to genomic DNA (EST projects, GenBank).
Standardized nomenclature crucial for data sharing.

Overlap & Orientation

Genes frequently overlap, reside on opposite strands, or embed within introns of larger genes; pseudogenes intermingle with functional loci—complicates annotation.

“Average” Gene Statistics (IHGSC 2001)

Size range: 2 kb → 2 Mb (huge variability).
Protein length: broad distribution.
UTR lengths: 3' UTRs generally longer than 5'.
Alternate first exons common—promoter diversity.

Alternative Splicing & Proteome Size

Only 20–25 k genes, yet an estimated 50–100 k proteins; a major explanatory factor is that >60 % of multi-exon genes undergo alternative splicing.

Example: Cell-Surface Receptor Architecture

Typical domain layout
1. N-terminal leader (signal peptide) ~20 aa.
2. Variable extracellular domains (e.g., Ig-like).
3. Stalk/linker.
4. Transmembrane helix ± membrane anchor.
5. Intracellular tail for signaling (ITAM, SH2/SH3-binding sites, etc.).
The NKp44 receptor (6p21.1) follows this modular blueprint.

Laboratory & Diagnostic Connections

PCR, RT-PCR, qPCR: leverage replication & transcription principles.
Sequencing technologies employ chain-terminating nucleotides (Sanger) or nucleotide incorporation chemistry (pyrosequencing, semiconductor).
Splicing assays detect aberrant exon usage in genetic disease.
Genome browsers, BLAST, and annotation databases essential for variant interpretation.
Chromatin immunoprecipitation, DNase-seq, ATAC-seq probe packaging and regulatory landscapes.

Ethical & Practical Considerations

Genomic data raises privacy issues; variant interpretation impacts clinical decision-making.
Understanding of genome architecture informs gene therapy vector design & off-target risk assessment.
Transposon remnants present challenges for genome editing (repetitive sequences increase the risk of off-target CRISPR guide binding) but also offer tools (e.g., Sleeping Beauty transposase systems).