Sanger DNA Sequencing

Historical Development and Naming

• Chain-termination DNA sequencing first appeared in $1977$, devised by Frederick Sanger and colleagues – hence the common term “Sanger sequencing.”
• It was among the first methods to reveal the precise, ordered arrangement of nucleotides along a DNA strand.
• The technique directly extends principles already covered in earlier lectures on DNA replication and PCR: like both, it relies on DNA polymerase–mediated in-vitro synthesis.

Core Principles Underlying Sanger Sequencing

Begin with a large, homogeneous pool of identical single-stranded DNA molecules (the template).
Carry out in-vitro replication using DNA polymerase, a primer, and the four standard deoxynucleoside triphosphates (dNTPs).
Intentionally sprinkle a small fraction of chain-terminating dideoxynucleoside triphosphates (ddNTPs) into the mix.
Whenever a ddNTP is inserted, synthesis stops – producing a fragment whose length pinpoints the nucleotide position where termination occurred.
Because insertion events are random, the reaction yields a ladder of fragments that collectively span every possible termination site.
Size-based separation (electrophoresis) orders the fragments so their terminal nucleotide can be read sequentially from smallest to largest.
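
The steps above can be mimicked in a few lines of Python. This is a toy simulation, not a model of real reaction kinetics: the function names, the template string, and the termination chance of 0.05 per position are illustrative assumptions (the chance is set far higher than a real reaction would use so that a very short template still produces a full ladder).

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def simulate_fragments(template, n_molecules=5000, dd_fraction=0.05, seed=0):
    """Return terminated fragments as (length, terminal_base) pairs.

    template is written 3'->5', so index 0 is the first base the
    polymerase copies after the primer. dd_fraction is an assumed
    per-position chance that a ddNTP is inserted instead of a dNTP.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            new_base = COMPLEMENT[template_base]
            if rng.random() < dd_fraction:
                # A ddNTP was inserted: synthesis stops at this position.
                fragments.append((position, new_base))
                break
        # Chains that never take up a ddNTP carry no label and are ignored.
    return fragments

def read_ladder(fragments):
    """Order fragments by size and read one terminal base per length."""
    terminal_by_length = {}
    for length, base in fragments:
        # All fragments of a given length end in the same base.
        terminal_by_length[length] = base
    return "".join(terminal_by_length[n] for n in sorted(terminal_by_length))

template = "AGTGTCA"  # hypothetical template, read 3'->5' by the polymerase
print(read_ladder(simulate_fragments(template)))  # TCACAGT
```

Sorting by length plays the role of electrophoresis here: the shortest fragment reveals the first base of the new strand, the next shortest the second, and so on.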

Chemical Distinction: Deoxynucleotides vs. Dideoxynucleotides

• Standard dNTPs carry a hydroxyl group ( $\text{–OH}$ ) on the $3'$ carbon of the deoxyribose.
• ddNTPs replace that $3'$ hydroxyl with a single hydrogen ( $\text{–H}$ ).
• Once a ddNTP has been incorporated, the growing chain ends without a $3'\text{–OH}$ , so the polymerase cannot form the phosphodiester bond to the next incoming nucleotide – a permanent “stop sign.”
• Analogy: imagine building a LEGO tower. Normal bricks have studs (the $3'$ hydroxyl) so the next brick can attach; ddNTP bricks lack studs and end the tower.

Reaction Setup: Components and Optimal Ratios

Essential ingredients (all mixed in one tube during modern, automated runs):

Template strand – typically far longer than the small schematic shown in slides.
Oligonucleotide primer – anneals to the template, supplying a free $3'\text{–OH}$ for polymerase.
DNA polymerase – catalyzes addition of nucleotides.
Four standard dNTPs.
Four fluorescently labelled ddNTPs (ddATP, ddTTP, ddCTP, ddGTP) at low concentration.

Concentration rule of thumb:
$\text{[dNTP]} : \text{[ddNTP]} \approx 100:1 \text{ to } 300:1$
• Too many ddNTPs ⇒ fragments terminate prematurely – sequence too short.
• Too few ddNTPs ⇒ insufficient termination events – poor signal.
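
The trade-off can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the polymerase picks a ddNTP with probability 1/(ratio + 1) at every position, i.e. that dNTPs and ddNTPs are incorporated equally well. Real polymerases do not behave this simply, so the numbers show the trend rather than predict a real run. Function names are hypothetical.

```python
def termination_probability(ratio):
    """Chance of inserting a ddNTP at one position for a dNTP:ddNTP ratio of ratio:1.

    Assumes equal incorporation efficiency for dNTPs and ddNTPs.
    """
    return 1.0 / (ratio + 1.0)

def fraction_terminated_at(n, ratio):
    """Fraction of chains that stop exactly at position n (geometric distribution)."""
    p = termination_probability(ratio)
    return (1.0 - p) ** (n - 1) * p

def fraction_still_extending(n, ratio):
    """Fraction of chains that have not yet terminated after n positions."""
    p = termination_probability(ratio)
    return (1.0 - p) ** n

for ratio in (10, 100, 300, 3000):
    print(
        f"{ratio:>5}:1  "
        f"stop at nt 50: {fraction_terminated_at(50, ratio):.5f}  "
        f"stop at nt 500: {fraction_terminated_at(500, ratio):.5f}"
    )
```

Under this assumption the mean fragment length is ratio + 1 nucleotides. A low ratio piles signal onto short fragments and leaves almost nothing at position 500, while a very high ratio spreads the same number of chains so thinly that every position is weakly represented.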

Chain Termination and Fragment Generation (Manual Perspective)

• Each incorporation event is probabilistic; therefore, every possible template position eventually becomes the terminal site in at least some fragments.
• Historically, four separate tubes (one for each ddNTP) were run, with radioactive labelling for detection. Modern fluorescence collapses this into one combined reaction.
• Result: a heterogeneous mixture of newly synthesized strands differing only in length and in the terminal fluorescent tag that reveals the identity of the terminating nucleotide.

Fluorescent Automated Sanger Sequencing Workflow

Combine all reagents in one tube.
Carry out thermal cycling (similar to PCR, but with a single primer, so labelled product accumulates linearly rather than exponentially) so extension and random termination occur.
Heat-denature duplexes to separate the original template from the labelled fragments.
Inject the fragments into a capillary electrophoresis system filled with a sieving polymer.
Voltage drives fragments; shorter pieces migrate faster.
As each fragment exits the capillary, a laser excites its fluorophore.
An emission detector records colour and time of arrival.

Colour code (instrument-dependent, illustrative):
• ddATP ⇒ green
• ddTTP ⇒ red
• ddCTP ⇒ blue
• ddGTP ⇒ black or yellow

Electrophoretic Separation and Laser Detection

• The detector converts sequential colour flashes into a chromatogram: peaks of distinct hues arranged left-to-right from smallest to largest fragment.
• Each successive peak corresponds to a fragment exactly one nucleotide longer than the previous one, so reading the colours directly yields the sequence $5' \to 3'$ of the newly synthesized strand.
• Example readout (small excerpt):
– peak order: T (red) → C (blue) → A (green) → C (blue) → A (green) → G (black) → T (red)
– inferred sequence: TCACAGT
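
The readout above amounts to sorting peaks by arrival time and looking up each colour. A minimal sketch, using the illustrative colour code from these notes; the arrival times are invented for the example and the function name is hypothetical:

```python
# Illustrative colour code from the notes; real instruments differ.
COLOUR_TO_BASE = {
    "green": "A",
    "red": "T",
    "blue": "C",
    "black": "G",
    "yellow": "G",
}

def call_bases(peaks):
    """Turn (arrival_time, colour) peaks into a 5'->3' base string.

    Shorter fragments arrive first, so sorting by time orders the
    peaks from the smallest to the largest fragment.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    return "".join(COLOUR_TO_BASE[colour] for _, colour in ordered)

# Made-up arrival times (minutes), deliberately out of order.
peaks = [
    (12.4, "blue"), (11.9, "red"), (13.0, "green"), (13.6, "blue"),
    (14.1, "green"), (14.7, "black"), (15.2, "red"),
]
print(call_bases(peaks))  # TCACAGT
```

Real base-calling software also has to handle uneven peak spacing, overlapping dyes, and noise, none of which is modelled here.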

Data Output: Chromatograms & Sequence Reading

• Software prints the coloured trace along with per-base quality scores.
• The operator can export a FASTA file or manually verify ambiguous peaks.
• For known genes (e.g., $\text{COL1A1}$ ) the new read is aligned against a reference to spot mutations or polymorphisms.
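
As a rough illustration of the last two points, the sketch below writes a read in FASTA format and compares it base by base with a reference. It assumes the read has already been placed on the reference without gaps (the `offset` argument), which real alignment tools work out for themselves. The sequences and function names are invented for the example and are not taken from any real gene.

```python
def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [f">{name}"]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"

def find_mismatches(read, reference, offset=0):
    """List differences between a gap-free aligned read and a reference.

    Returns (reference_position, reference_base, read_base) tuples,
    with positions counted from 1. Insertions and deletions are not handled.
    """
    mismatches = []
    for i, read_base in enumerate(read):
        ref_base = reference[offset + i]
        if read_base != ref_base:
            mismatches.append((offset + i + 1, ref_base, read_base))
    return mismatches

reference = "GGATCACAGTCC"  # made-up reference fragment
read = "TCACGGT"            # made-up read with one substitution

print(to_fasta("example_read", read), end="")
print(find_mismatches(read, reference, offset=3))  # [(8, 'A', 'G')]
```

A reported mismatch is only a candidate variant; the corresponding chromatogram peak still needs to be checked before it is called a mutation or polymorphism.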

Interpreting New vs. Known Sequences

• Unknown inserts demand de-novo interpretation of coding potential.
• Because DNA is double-stranded and the translation machinery reads triplets, any fragment can, in principle, be interpreted in $6$ separate reading frames:

Forward strand, frame $+1$ (starts at the first nucleotide).
Forward strand, frame $+2$ (starts at nucleotide $2$ ).
Forward strand, frame $+3$ (starts at nucleotide $3$ ).
Reverse-complement strand, frame $-1$ .
Reverse-complement strand, frame $-2$ .
Reverse-complement strand, frame $-3$ .
• Analysts scan each frame for an “open reading frame” (ORF) – a long run of sense codons not interrupted by a stop codon (TAA, TAG, or TGA).
• Example: if frame $+1$ shows $\text{ATG}\,\text{ACG}\,\dots$ continuing for >100 codons before encountering $\text{TAA}$ , it is a strong ORF candidate.
• ORF determination helps predict protein-coding regions and splice sites, and helps locate mutations that could create premature stops.
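
The six-frame scan can be sketched as follows. This is a simplified illustration: it assumes an ORF must begin with ATG and end at the first in-frame stop codon, and it ignores ORFs that run off the end of the read. The function names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse-complement strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def codons(seq, start):
    """Split seq into complete triplets, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

def six_frames(seq):
    """Map each frame label (+1..+3, -1..-3) to its list of codons."""
    frames = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames

def longest_orf(codon_list):
    """Longest ATG-to-stop run in one frame (stop included); [] if none."""
    best = []
    start = None
    for i, codon in enumerate(codon_list):
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            candidate = codon_list[start:i + 1]
            if len(candidate) > len(best):
                best = candidate
            start = None
    return best

seq = "ATGACGGCTTAAGC"  # made-up sequence
for label, frame in six_frames(seq).items():
    print(label, len(longest_orf(frame)))
```

For a real read, a frame whose longest ORF spans far more codons than the others is the natural candidate for the coding frame.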

Practical, Ethical, and Philosophical Implications

• Sanger sequencing remains the gold standard for validating NGS discoveries because of its low error rate ( $<10^{-3}$ per base ).
• Clinical genetics employs Sanger confirmation to ensure diagnostic accuracy – a single wrong base call can have major consequences for patient management.
• Ethically, the ability to pinpoint pathogenic variants heightens questions of privacy, informed consent, and potential discrimination.
• Philosophically, the method exemplifies how a simple chemical tweak (loss of one hydroxyl group) can unlock profound biological insights.

Connections to Earlier Coursework

• Reinforces fundamental DNA replication themes ( $5' \to 3'$ synthesis, primer requirement, polymerase fidelity ).
• Parallels PCR in relying on thermal cycling and primer design, yet diverges in purpose (amplifying a target vs. determining its sequence).
• Builds on electrophoresis concepts introduced in prior labs: charge-to-mass ratio, migration in a gel matrix, and size-dependent separation.

Key Numerical References & Equations

• Year of invention: $1977$ .
• Preferred dNTP : ddNTP ratio: $100:1$ to $300:1$ .
• Potential reading frames: $6$ (three per strand).
• Error rate of automated Sanger: $<10^{-3}$ per nucleotide.
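
The error-rate figure connects to the per-base quality scores mentioned earlier if those scores are on the Phred scale, which is the usual convention but is an assumption here because the notes do not name the scale. On that scale $Q = -10 \log_{10} P$, where $P$ is the probability that a base call is wrong. A short sketch with hypothetical function names and made-up scores:

```python
import math

def phred_to_error_probability(q):
    """Error probability P for a Phred quality score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Phred quality score Q for an error probability P."""
    return -10 * math.log10(p)

def expected_errors(qualities):
    """Expected number of wrong base calls in a read, given per-base Phred scores."""
    return sum(phred_to_error_probability(q) for q in qualities)

print(error_probability_to_phred(1e-3))   # an error rate of 10^-3 sits at about Q30
print(expected_errors([40, 40, 30, 20]))  # made-up per-base scores
```

Read this way, an error rate below $10^{-3}$ per nucleotide corresponds to quality scores above roughly 30.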

Summary of Workflow in Pseudocode-Style Steps

Denature template → obtain single strands.
Anneal primer.
Add polymerase + dNTPs + fluorescent ddNTPs (low concentration).
Thermocycle to extend and randomly terminate.
Denature products.
Capillary electrophoresis.
Laser excitation + fluorescence detection.
Convert colour peaks → base calls → FASTA sequence.
Align or translate sequence as required.

Typical Troubleshooting & Optimization Tips (Minor Points)

• Weak signal? – raise the concentration of fluorescent ddNTPs slightly, but stay within the $100:1$ – $300:1$ window.
• Compressed (overlapping) peaks? – lower template concentration or run at higher capillary voltage.
• Ambiguous bases in long single-nucleotide repeats (homopolymer runs)? – confirm with a reverse primer or redesign the primer further upstream.