Peptide Sequencing & Proteomics – Core Vocabulary
Historical Evolution of Protein/Peptide Sequencing
❖ 1970-80s: Protein sequencing primarily relied on Edman degradation.
Required large amounts of purified protein and a free N-terminus.
Failed for N-terminally blocked/acetylated proteins.
❖ 1990s: Mass spectrometry (MS) displaced Edman degradation.
Measures the mass-to-charge ratio (m/z) of ionized biomolecules in vacuum; sequences peptides by fragmentation within seconds, vs. hours to days for Edman.
Handles tiny sample amounts, heterogeneous mixtures, blocked termini, post-translational modifications (PTMs).
❖ 2000s: MS becomes centerpiece of proteomics.
Routine identification of single gel spots/bands; advanced global screens (MudPIT, GeLC, shotgun).
Biologists need to understand MS principles to avoid over-interpretation (e.g., mistaking a minor contaminant for the main band protein).
Core Rationale: Why Sequence Peptides, Not Intact Proteins
MS sensitivity and fragmentation efficiency are far higher for peptides (≤ 20 aa) than for intact proteins.
Proteins vary in solubility, stability, modifications; peptides have more uniform physico-chemical behavior.
Digestion removes issues of detergent incompatibility and membrane-protein insolubility.
Simpler peptide spectra make database matching tractable; however, incomplete sequence coverage means full PTM and processing information is often missed.
Specialized "top-down" FTICR methods can sequence intact proteins, but remain niche.
Proteomic Workflow Overview (Fig 1 Analogue)
Cell/tissue source → Sample prep (SDS–PAGE, 2-D gels, fractionation) → Protein digestion (trypsin, Lys-C, Asp-N, Glu-C) → Peptide separation (1-D/2-D LC, ion-exchange) → Ionization (Electrospray, MALDI) → Mass analysis (Quadrupole, TOF, Ion-trap, FTICR, hybrids) → Data analysis (PeptideSearch, Sequest, Mascot) → Biological interpretation.
Protein Digestion Specifics
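Trypsin: Cleaves C-terminal to Arg/Lys (not before Pro); generates peptides in the ideal mass range with a basic C-terminus → rich y-ion spectra.
Lys-C: Cleaves C-terminal to Lys; more robust than trypsin and remains active in 8\,\text{M} urea, so it is useful for a first digestion step before dilution and trypsin.
Asp-N / Glu-C: Complementary specificity (Asp-N N-terminal to Asp; Glu-C C-terminal to Glu), lower activity.
Non-specific enzymes avoided: they create many overlapping peptides, inflating both the number of spectra and the database search space.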
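Trypsin's specificity can be illustrated with an in-silico digest. This is a minimal sketch, assuming the simplified rule "cleave after K or R unless the next residue is P"; real digests deviate from it. The function `tryptic_digest` and its `missed_cleavages` parameter are hypothetical.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list:
    """In-silico trypsin digest (simplified rule: cut after K/R, not before P).

    `missed_cleavages` joins up to that many adjacent fragments, mimicking
    incomplete digestion. Hypothetical helper for illustration only.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Concatenate fragments i..j to model up to `missed_cleavages` missed sites.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_digest("MAKPLRGKEER"))  # ['MAKPLR', 'GK', 'EER']
```

The K-P bond in "MAKPLR" is left intact, as the rule requires.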
Peptide Separation by Microscale Capillary HPLC
Inner diameter 50{-}150\,\mu\text{m}, reversed-phase C18.
Elution via increasing organic gradient; order by hydrophobicity.
Very hydrophilic peptides may elute in the void volume; very hydrophobic ones may be retained irreversibly.
Nano-flow rates \sim100\,\text{nL·min}^{-1}; peak width 10{-}60\,\text{s}.
Options:
GeLC–MS: Slice SDS-PAGE lane; digest each slice → higher dynamic range & known M_w context.
MudPIT: 2-D LC (strong cation exchange → RP) on peptide level.
Ionization Techniques (Box 1)
Electrospray Ionization (ESI)
Spray needle at \sim2\,\text{kV} potential creates charged droplets → solvent evaporation → ion desorption.
Produces multiply-protonated ions, commonly doubly charged for tryptic peptides.
Matrix-Assisted Laser Desorption/Ionization (MALDI)
Peptides co-crystallized in aromatic acid matrix; laser pulse yields mainly singly protonated ions.
Off-line coupling: LC fractions spotted on target for automated MALDI–TOF/TOF or MALDI–ion-trap sequencing.
Mass Analyzers & Resolution
Quadrupole (Q): Combined RF and DC voltages give stable trajectories only to ions in a narrow m/z window; the window is scanned sequentially.
Time-of-Flight (TOF): Ions accelerated to equal kinetic energy; flight time ∝ \sqrt{m/z} → low-m/z ions arrive first.
Quadrupole Ion Trap (3-D or Linear): Traps ions in oscillating field; can isolate, fragment (MSⁿ), then eject.
FTICR (Penning trap): Ions orbit in high B-field; frequency→mass via Fourier transform. Resolution >100{,}000, mass error few ppm.
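The TOF relation follows from equal kinetic energy, zeU = ½mv², so t = L·sqrt(m/(2zeU)). Below is a minimal sketch for an idealized linear TOF, assuming an illustrative 20 kV acceleration voltage and 1 m drift tube (not values from the text) and ignoring initial energy spread and reflectron optics. `tof_flight_time` is a hypothetical helper.

```python
import math

DALTON_KG = 1.66053906660e-27  # 1 Da in kg
E_CHARGE = 1.602176634e-19     # elementary charge in C


def tof_flight_time(ion_mass_da: float, z: int,
                    accel_voltage: float = 20_000.0,
                    drift_length: float = 1.0) -> float:
    """Flight time (s) through a field-free drift region of `drift_length` m.

    Every ion gains kinetic energy z*e*U, so v = sqrt(2 z e U / m) and
    t = L / v = L * sqrt(m / (2 z e U)); hence t is proportional to sqrt(m/z).
    Default voltage and length are illustrative assumptions.
    """
    m = ion_mass_da * DALTON_KG
    return drift_length * math.sqrt(m / (2 * z * E_CHARGE * accel_voltage))


for mass in (1000.0, 4000.0):
    print(f"{mass:.0f} Da, 1+: {tof_flight_time(mass, 1) * 1e6:.1f} us")
```

Quadrupling m/z doubles the flight time (about 16 µs vs. 32 µs with these assumed settings). An ion of twice the mass at twice the charge arrives at the same time as its half-mass singly charged counterpart.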
m/z Calculation Example
Neutral (monoisotopic) peptide mass M = 1232.55\,\text{Da}, observed as the doubly protonated ion [M+2H]^{2+}.
\frac{1232.55 + 2\times1.0073}{2} = 617.28 (observed m/z).
Isotope spacing = 1/z → 0.5\,\text{Th} spacing confirms charge 2+.
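The arithmetic above generalizes to any charge state. Below is a minimal sketch, assuming a proton mass of 1.007276 Da and an isotope spacing of 1.00336/z Th (the mass difference between 13C and 12C, which the notes round to 1/z). The function names are hypothetical.

```python
PROTON = 1.007276      # proton mass (Da)
C13_SPACING = 1.00336  # 13C - 12C mass difference (Da)


def mz_from_mass(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + z * PROTON) / z


def mass_from_mz(mz: float, z: int) -> float:
    """Invert mz_from_mass: recover the neutral mass M."""
    return mz * z - z * PROTON


def charge_from_spacing(spacing: float, max_z: int = 6) -> int:
    """Charge whose expected isotope spacing (~1.00336/z Th) best matches the observed one."""
    return min(range(1, max_z + 1), key=lambda z: abs(C13_SPACING / z - spacing))


print(round(mz_from_mass(1232.55, 2), 2))  # 617.28
print(charge_from_spacing(0.50))           # 2
```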
Tandem MS & Peptide Fragmentation (Box 2)
Workflow: Survey MS scan → isolate top N precursors → collision-induced dissociation (CID) → MS² spectra.
Ion types:
b_m: charge retained on the N-terminal fragment (first m residues).
y_{n-m}: charge retained on the C-terminal fragment (remaining n-m residues).
a_m = b_m - \text{CO} (−27.9949\,\text{Da}).
Bonds N-terminal to proline and C-terminal to aspartate are labile → intense fragment ions.
Multi-stage MSⁿ (MS³…) now feasible in linear traps for deeper sequencing.
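The ion nomenclature can be made concrete by computing singly charged a, b and y ions. Below is a minimal sketch, assuming standard monoisotopic residue masses and unmodified residues. It also allows checking the complementarity b_m + y_{n-m} = M + 2 proton masses (M being the neutral peptide mass), which follows from the definitions above. `fragment_ions` is a hypothetical helper.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da); Leu and Ile are isobaric.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, CO = 1.007276, 18.010565, 27.994915


def fragment_ions(peptide: str) -> dict:
    """Singly charged a/b/y ion m/z values; list index k holds ion number k+1."""
    n = len(peptide)
    prefix = list(accumulate(RESIDUE[aa] for aa in peptide))            # N-terminal sums
    suffix = list(accumulate(RESIDUE[aa] for aa in reversed(peptide)))  # C-terminal sums
    b = [prefix[m - 1] + PROTON for m in range(1, n)]          # b_1 .. b_{n-1}
    y = [suffix[m - 1] + WATER + PROTON for m in range(1, n)]  # y_1 .. y_{n-1}
    a = [bm - CO for bm in b]                                  # a_m = b_m - CO
    return {"a": a, "b": b, "y": y}


ions = fragment_ions("SAMPLER")
print([round(v, 3) for v in ions["y"]])
```

For a tryptic peptide ending in Arg, y_1 sits near m/z 175.119, a common anchor when reading the y-series.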
De Novo vs. Database-Driven Sequencing
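De novo: Read residue masses from mass differences between consecutive fragment ions; ambiguous when the spectrum is incomplete (and Leu/Ile are isobaric).
Database searching converts the problem into pattern matching.
Vast reduction of the solution space, as only sequences present in the database are considered.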
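De novo interpretation can be sketched as reading the mass gaps between consecutive ions of one series and matching them to residue masses. Below is a minimal sketch, assuming a clean ion series and a fixed tolerance in Da (both assumptions). `read_gaps` is hypothetical. It also exposes two inherent ambiguities: Leu and Ile are isobaric, and Gln and Lys (0.036 Da apart) merge at loose tolerance.

```python
# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_gaps(ion_series: list, tol: float = 0.02) -> list:
    """Translate gaps between consecutive ions of one series into residue calls.

    Returns one call per gap: a residue letter, 'X/Y' when several residues
    fit within `tol`, or '?' when none does (e.g. missing ions in between).
    """
    calls = []
    for lighter, heavier in zip(ion_series, ion_series[1:]):
        gap = heavier - lighter
        hits = sorted(aa for aa, mass in RESIDUE.items() if abs(mass - gap) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


print(read_gaps([100.0, 157.02146, 270.10552, 1270.10552]))  # ['G', 'I/L', '?']
```

The final '?' marks a gap spanning several missing ions, which is exactly where an incomplete spectrum stalls de novo reading and database matching takes over.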
Major Algorithms (Box 3)
Peptide Sequence Tags (PeptideSearch)
Short stretch of sequence read from the spectrum + masses of the flanking regions to both termini → unique DB hit.
Sequest
Correlates experimental vs. theoretical spectra by cross-correlation.
Mascot
Probability-based; matches highest-intensity ions first → score = -10\log_{10}(P), where P is the probability that the match is a random event.
Others: Sonar, ProteinProspector, graph-theory approaches.
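The -10\log_{10}(P) transform used by Mascot can be illustrated with a simple probability model. This sketch swaps in a generic binomial model, not Mascot's actual (proprietary) scoring. It assumes each theoretical fragment has an independent chance `p_random` of matching a peak by coincidence, so P is the chance of seeing at least the observed number of matches at random. The function names and parameters are hypothetical.

```python
import math


def random_match_probability(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """P(X >= n_matched) for X ~ Binomial(n_theoretical, p_random)."""
    return sum(
        math.comb(n_theoretical, i) * p_random**i * (1 - p_random) ** (n_theoretical - i)
        for i in range(n_matched, n_theoretical + 1)
    )


def score_from_probability(p: float) -> float:
    """Probability-based score: -10 * log10(P); P = 0.05 gives about 13."""
    return -10 * math.log10(max(p, 1e-300))  # guard against log10(0)


for k in (2, 6, 10):
    p = random_match_probability(n_theoretical=20, n_matched=k, p_random=0.05)
    print(k, round(score_from_probability(p), 1))
```

More matched fragments make a random explanation less likely and raise the score. A score cut-off therefore corresponds to a chosen significance level.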
Statistical Validation of Peptide/Protein IDs
Reported as expectation/probability scores.
Use fully tryptic peptides unless strong evidence for semi-tryptic.
False-positive estimation via decoy databases (reversed/randomized sequences).
Two-component score distribution: low-score (random) vs. high-score (true) matches → choose cut-off for a desired FDR \le 1\%.
Protein probability combines peptide probabilities; caution with very large proteins (many theoretical peptides raise the chance of random matches).
Single-peptide IDs only accepted with very high mass accuracy & manual spectrum validation (Box 4).
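The decoy approach can be sketched as a search for a score cut-off. Below is a minimal sketch, assuming target and decoy databases of equal size, so that decoy hits above a cut-off estimate the false positives among target hits above it. `fdr_threshold` is hypothetical, and real pipelines usually compute q-values rather than a single scan.

```python
def fdr_threshold(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest score cut-off whose estimated FDR (#decoys / #targets at or above it) <= max_fdr.

    Scanning from low to high returns the lowest qualifying cut-off, i.e. the
    one that keeps the most identifications. Returns None if none qualifies.
    """
    for cutoff in sorted(set(target_scores) | set(decoy_scores)):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_target and n_decoy / n_target <= max_fdr:
            return cutoff
    return None


targets = [float(s) for s in range(10, 110)]  # toy target scores
decoys = [5.0, 12.0, 15.0]                    # toy decoy scores
print(fdr_threshold(targets, decoys))                # 16.0
print(fdr_threshold(targets, decoys, max_fdr=0.05))  # 5.0
```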
Manual Spectrum Validation Heuristics (Box 4)
Majority of intense peaks above precursor should form continuous y-series.
Check characteristic labile cleavages (Pro, Asp) & satellite neutral losses (e.g., -98\,\text{Da} for \text{H}_3\text{PO}_4, -64\,\text{Da} for \text{CH}_3\text{SOH}).
Consider multiply charged fragment ions below the precursor m/z (e.g., doubly charged y-ions).
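The satellite-loss check can be done programmatically. For a precursor at charge z, a neutral loss of mass Δ appears at precursor m/z − Δ/z. Below is a minimal sketch, assuming a list of centroided fragment m/z values and a 0.02 Th tolerance (both assumptions). The loss masses are monoisotopic values for H3PO4 and CH3SOH, and `neutral_losses` is hypothetical.

```python
H3PO4 = 97.976895   # phosphoric acid, lost from pSer/pThr peptides (Da)
CH3SOH = 63.998286  # methanesulfenic acid, lost from oxidized Met (Da)


def neutral_losses(precursor_mz: float, z: int, peaks: list, tol: float = 0.02) -> dict:
    """Report which neutral losses have a fragment peak at precursor_mz - loss/z."""
    found = {}
    for name, loss in (("H3PO4", H3PO4), ("CH3SOH", CH3SOH)):
        expected = precursor_mz - loss / z  # neutral loss keeps all z charges
        found[name] = any(abs(mz - expected) <= tol for mz in peaks)
    return found


print(neutral_losses(617.28, 2, [568.29, 402.21, 300.15]))
# {'H3PO4': True, 'CH3SOH': False}
```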
Quantification Strategies
Absolute (AQUA-like)
Spike synthetic isotopically labelled peptides of known amount; compare extracted ion currents.
Without standards, averaging the signals of the 3 most intense peptides per protein gives an abundance estimate within ±4-fold.
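Both estimates above reduce to simple arithmetic on extracted ion currents. Below is a minimal sketch, assuming integrated XIC areas are already available and that light and heavy forms ionize identically. The function names are hypothetical. The top-3 value is only a relative signal and would still need a calibration (assumed here, not given in the text) to become an absolute amount.

```python
def aqua_amount(light_area: float, heavy_area: float, spiked_amount: float) -> float:
    """Endogenous amount = (light / heavy XIC area) x known spike of heavy standard.

    Result is in the same unit as `spiked_amount` (e.g. fmol).
    """
    return light_area / heavy_area * spiked_amount


def top3_signal(peptide_intensities: list) -> float:
    """Mean intensity of the (up to) three most intense peptides of one protein."""
    top = sorted(peptide_intensities, reverse=True)[:3]
    if not top:
        raise ValueError("protein has no quantified peptides")
    return sum(top) / len(top)


print(aqua_amount(light_area=2.0e6, heavy_area=1.0e6, spiked_amount=50.0))  # 100.0
print(top3_signal([1.0e5, 5.0e5, 3.0e5, 9.0e5]))
```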
Relative Quantification via Stable Isotopes
Principle: heavy/light forms co-elute → peak ratio reflects abundance ratio.
Metabolic Labelling
SILAC: culture cells with ^{13}\text{C}/^{15}\text{N} Arg/Lys; mix cell lysates before processing.
A mass shift of \ge 3\,\text{Da} is required to separate the light and heavy isotope clusters.
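The shift requirement is stated in Da, but the spectrum is read in m/z, where a mass difference Δ appears as Δ/z. Below is a minimal sketch, assuming the commonly used 13C6,15N2-Lys (+8.0142 Da) and 13C6,15N4-Arg (+10.0083 Da) labels, which the text does not specify. `silac_mz_offset` and `isotope_peaks_between` are hypothetical. Because the isotope spacing also shrinks to about 1/z, the light and heavy clusters stay roughly Δ isotope peaks apart at any charge.

```python
# Assumed heavy labels (Da added per residue); not specified in the text.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008268}  # 13C6,15N2-Lys and 13C6,15N4-Arg
C13_SPACING = 1.00336                          # isotope spacing x z (Da)


def silac_mz_offset(peptide: str, z: int) -> float:
    """m/z distance between the light and heavy forms of a peptide at charge z."""
    delta_mass = sum(HEAVY_SHIFT.get(aa, 0.0) for aa in peptide)
    return delta_mass / z


def isotope_peaks_between(peptide: str, z: int) -> float:
    """Separation of the two clusters in isotope spacings (independent of z)."""
    return silac_mz_offset(peptide, z) / (C13_SPACING / z)


print(round(silac_mz_offset("SAMPLER", 2), 3))        # 5.004
print(round(isotope_peaks_between("SAMPLER", 2), 1))  # 10.0
```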
Post-Harvest Chemical Labelling
ICAT: thiol-specific tag with biotin + light/heavy linker (FIG 4b); tagged cysteine-containing peptides are enriched by avidin affinity.
Other amine-reactive tags, deuterated reagents, ^{18}\text{O} exchange.
Accuracy limited by resolution & S/N; replicate/label-swap experiments recommended.
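A label-swap design can be summarized in log2 space. Below is a minimal sketch, assuming a forward experiment (condition A heavy, B light) and a reverse experiment (A light, B heavy), each reported as heavy/light ratios. The function names and the 1.0 log2-unit agreement tolerance are hypothetical. The reverse ratio is inverted so that both runs express A/B. A light-only contaminant such as keratin gives a low heavy/light ratio in both runs, so its two A/B estimates disagree and it gets flagged.

```python
import math


def ab_log2_ratios(forward_hl: float, reverse_hl: float) -> tuple:
    """log2(A/B) from each run: forward H/L is A/B; reverse H/L is B/A, so invert it."""
    return math.log2(forward_hl), math.log2(1.0 / reverse_hl)


def label_swap_summary(forward_hl: float, reverse_hl: float,
                       max_log2_diff: float = 1.0) -> dict:
    """Average the two log2(A/B) estimates and flag runs that disagree.

    `max_log2_diff` is an assumed tolerance (1.0 = two-fold disagreement).
    """
    fwd, rev = ab_log2_ratios(forward_hl, reverse_hl)
    return {"log2_A_over_B": (fwd + rev) / 2,
            "consistent": abs(fwd - rev) <= max_log2_diff}


print(label_swap_summary(2.0, 0.5))  # genuine two-fold change seen in both runs
print(label_swap_summary(0.1, 0.1))  # light-only contaminant: flagged inconsistent
```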
Applications & Case Studies
Complex/Interactome Mapping: Immunoprecipitation + MS identifies protein networks (Refs 51-52). Stable-isotope pull-downs capture transient, phosphorylation-dependent binding (Refs 56-58).
Organelle Proteomics: Protein-correlation profiling distinguishes genuine organellar proteins via fractionation profiles (Ref 59).
Expression Proteomics / Biomarker Discovery: Whole-lysate quantification (SILAC, ICAT) seeks differential expression; challenges include dynamic range & data noise.
Top-Down FTICR: Partial sequences & PTM mapping on intact proteins.
Challenges & Common Pitfalls
Under-appreciating contaminants: keratins, minor co-migrating proteins.
Over-interpreting long protein lists without statistical confidence.
Missing low-abundance peptides due to ion suppression/co-elution.
Database redundancy vs. minimalism: peptides shared between isoforms cause conflicting assignments.
Incomplete PTM coverage owing to limited sequence coverage.
Key Terms & Numerical Reminders
Dalton (Da): 1\,\text{Da} = 1.6605\times10^{-27}\,\text{kg}.
Thomson (Th): Proposed unit for m/z scale.
Total Ion Current (TIC): sum of all signal intensities per scan.
Extracted Ion Chromatogram (XIC): intensity trace of single m/z across LC run.
Mass Resolution: R = \frac{m/z}{\text{FWHM}}; TOF ≈10{,}000; FTICR >100{,}000.
Isotope Spacing: \Delta (m/z) \approx 1/z (more precisely 1.00336/z, the mass difference between ^{13}\text{C} and ^{12}\text{C}).
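The definitions above translate into two everyday checks: the resolving power of a peak and the mass error of an assignment. Below is a minimal sketch. `min_resolution_for_isotopes` is a hypothetical helper using a simple criterion (FWHM equal to the isotope spacing of about 1.00336/z), which is an assumption rather than a formal definition from the text.

```python
C13_SPACING = 1.00336  # 13C - 12C mass difference (Da)


def resolution(mz: float, fwhm: float) -> float:
    """R = (m/z) / FWHM, both in Th."""
    return mz / fwhm


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def min_resolution_for_isotopes(mz: float, z: int) -> float:
    """R at which the FWHM equals the isotope spacing (~1.00336/z Th)."""
    return mz / (C13_SPACING / z)


print(round(resolution(617.28, 0.0617)))              # about 10,000 (TOF-like)
print(round(ppm_error(617.2835, 617.2823), 1))        # 1.9
print(round(min_resolution_for_isotopes(617.28, 2)))  # 1230
```

Higher charge states need proportionally higher resolution to resolve their isotope peaks, which is one reason high-resolution analyzers such as FTICR help with multiply charged ions.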
Ethical / Practical Implications
Proper statistical validation prevents publication of unreliable proteomes.
SILAC avoids radioactivity and enables in-vivo dynamic studies without additional chemical manipulations.
ICAT selects for cysteine-containing proteins; researchers must report potential bias against cysteine-free proteins.
Sample handling: use gloves/lab coats to minimize keratin contamination.
Connections to Foundational Principles & Other ‘Omics’
Complementarity with transcriptomics: protein abundance correlates poorly with mRNA (Ref 60) → proteomics indispensable.
Systems biology integrates MS-based proteomics, mRNA arrays, imaging for holistic cellular maps (Refs 62-67).
Future Directions
Higher MS speed & resolution promise near-complete proteome coverage (accurate-mass-tag approaches).
Improved statistical tools (machine learning) & community standards for FDR reporting.
Clinical translation: SELDI patterns under scrutiny; need robust bioinformatics for biomarker validation.