Peptide Sequencing & Proteomics – Core Vocabulary

Historical Evolution of Protein/Peptide Sequencing

  • ❖ 1970-80s: Protein sequencing primarily relied on Edman degradation.

    • Required large amounts of purified protein and a free N-terminus.

    • Failed for N-terminally blocked/acetylated proteins.

  • ❖ 1990s: Mass spectrometry (MS) displaced Edman degradation.

    • Detects ionized biomolecules in vacuum; fragments peptides within seconds vs. hours–days for Edman.

    • Handles tiny sample amounts, heterogeneous mixtures, blocked termini, post-translational modifications (PTMs).

  • ❖ 2000s: MS became the centerpiece of proteomics.

    • Routine identification of single gel spots/bands; advanced global screens (MudPIT, GeLC, shotgun).

    • Need for biologists to understand MS principles to avoid over-interpretation (e.g., mistaking minor contaminants for main band).

Core Rationale: Why Sequence Peptides, Not Intact Proteins

  • MS sensitivity and fragmentation efficiency are far higher for peptides (≤ 20 aa) than full proteins.

  • Proteins vary in solubility, stability, modifications; peptides have more uniform physico-chemical behavior.

  • Digestion removes issues of detergent incompatibility and membrane-protein insolubility.

  • Reduced complexity allows database matching; however, partial coverage misses complete PTM & processing info.

  • Specialized "top-down" FTICR methods can sequence intact proteins, but remain niche.

Proteomic Workflow Overview (Fig 1 Analogue)

  • Cell/tissue source → Sample prep (SDS–PAGE, 2-D gels, fractionation) → Protein digestion (trypsin, Lys-C, Asp-N, Glu-C) → Peptide separation (1-D/2-D LC, ion-exchange) → Ionization (Electrospray, MALDI) → Mass analysis (Quadrupole, TOF, Ion-trap, FTICR, hybrids) → Data analysis (PeptideSearch, Sequest, Mascot) → Biological interpretation.

Protein Digestion Specifics

  • Trypsin: Cleaves C-terminal to Arg/Lys; generates peptides in ideal mass range with basic C-terminus → rich y-ion spectra.

  • Lys-C: More stable than trypsin; remains active in $8\,\text{M}$ urea, so it is useful as a pre-digestion step before trypsin.

  • Asp-N / Glu-C: Complementary specificity, lower activity.

  • Non-specific enzymes avoided; create excessive overlapping spectra.
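
The trypsin rule above can be sketched as a small in-silico digest. This is a minimal illustration, not a validated tool: the function name `digest`, its parameters, and the example sequence are hypothetical, and the "no cleavage before proline" rule is a common convention added here as an assumption (it is not stated in these notes).

```python
def digest(sequence, cleave_after="KR", block_before="P", missed_cleavages=0):
    """Split a protein sequence C-terminal to the residues in `cleave_after`.

    block_before="P" applies the commonly used 'no cleavage before proline'
    rule (an assumption added for this sketch); pass "" to disable it.
    """
    # Collect cleavage positions: the index just after each K/R
    # that is not followed by a blocking residue.
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in cleave_after and sequence[i + 1] not in block_before:
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Each peptide may span up to `missed_cleavages` skipped sites.
        last = min(start + 2 + missed_cleavages, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Hypothetical example sequence, chosen only to exercise the rules.
print(digest("MKWVTFISLLRPSAYSRGVFK"))
```

Every peptide produced this way ends in Arg or Lys (except the protein C-terminus), which is the basic C-terminus that favors y-ion series.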

Peptide Separation by Microscale Capillary HPLC

  • Inner diameter $50$–$150\,\mu\text{m}$, reversed-phase C18.

  • Elution via increasing organic gradient; order by hydrophobicity.

    • Very hydrophilic peptides may elute in the void volume; very hydrophobic ones may stick to the column.

  • Nano-flow rates $\sim 100\,\text{nL}\,\text{min}^{-1}$; peak width $10$–$60\,\text{s}$.

  • Options:

    • GeLC–MS: Slice SDS-PAGE lane; digest each slice → higher dynamic range & known $M_w$ context.

    • MudPIT: 2-D LC (strong cation exchange → RP) on peptide level.

Ionization Techniques (Box 1)

Electrospray Ionization (ESI)
  • Spray needle at $\sim 2\,\text{kV}$ potential creates charged droplets → solvent evaporation → ion desorption.

  • Produces multiply-protonated ions, commonly doubly charged for tryptic peptides.

Matrix-Assisted Laser Desorption/Ionization (MALDI)
  • Peptides co-crystallized in aromatic acid matrix; laser pulse yields mainly singly protonated ions.

  • Off-line coupling: LC fractions spotted on target for automated MALDI–TOF/TOF or MALDI–ion-trap sequencing.

Mass Analyzers & Resolution

  • Quadrupole (Q): Filters ions by stabilizing trajectories using sinusoidal RF/DC; sequential scan.

  • Time-of-Flight (TOF): Ions accelerated to equal kinetic energy; flight time $\propto \sqrt{m/z}$ → lighter ions arrive earlier.

  • Quadrupole Ion Trap (3-D or Linear): Traps ions in oscillating field; can isolate, fragment (MSⁿ), then eject.

  • FTICR (Penning trap): Ions orbit in high B-field; frequency → mass via Fourier transform. Resolution $>100{,}000$, mass error a few ppm.
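
The TOF relation (flight time proportional to the square root of m/z) can be checked numerically with an idealized linear flight tube. This is a rough sketch only: the 1 m tube length and 20 kV acceleration voltage are hypothetical defaults, and real instruments (reflectrons, delayed extraction) deviate from this simple model.

```python
import math

DALTON_KG = 1.6605e-27       # kg per Da
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs


def flight_time_us(mz, tube_length_m=1.0, accel_voltage=20000.0):
    """Idealized linear-TOF flight time in microseconds for an ion of given m/z (Th)."""
    # Energy balance: z*e*U = 0.5*m*v^2  ->  v = sqrt(2*e*U / (m/z))
    mass_per_charge_kg = mz * DALTON_KG
    velocity = math.sqrt(2 * E_CHARGE * accel_voltage / mass_per_charge_kg)
    return tube_length_m / velocity * 1e6


# Quadrupling m/z should double the flight time.
ratio = flight_time_us(4 * 617.28) / flight_time_us(617.28)
print(round(ratio, 6))
```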

m/z Calculation Example

  • Peptide of neutral mass $M = 1232.55\,\text{Da}$, doubly protonated ($z = 2$):
    $m/z = \frac{1232.55 + 2\times 1.0073}{2} = 617.28$ (observed m/z).

  • Isotope spacing $= 1/z$ → $0.5\,\text{Th}$ spacing confirms charge 2+.
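
The same arithmetic as a small sketch, using the proton mass from the example (1.0073 Da). The function names are illustrative only.

```python
PROTON = 1.0073  # Da, the value used in the worked example


def mz_from_mass(neutral_mass, z):
    """m/z of a peptide carrying z protons."""
    return (neutral_mass + z * PROTON) / z


def mass_from_mz(mz, z):
    """Neutral mass recovered from an observed m/z and charge state."""
    return mz * z - z * PROTON


def charge_from_isotope_spacing(spacing_th):
    """Charge state inferred from the spacing between isotope peaks (1/z)."""
    return round(1.0 / spacing_th)


print(round(mz_from_mass(1232.55, 2), 2))   # 617.28
print(charge_from_isotope_spacing(0.5))     # 2
```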

Tandem MS & Peptide Fragmentation (Box 2)

  • Workflow: Survey MS scan → isolate top N precursors → collision-induced dissociation (CID) → MS² spectra.

  • Ion types:

    • $b_m$: charge on N-term fragment.

    • $y_{n-m}$: charge on C-term fragment.

    • $a_m = b_m - \text{CO}$ ($-27.9949\,\text{Da}$).

  • Bonds N-terminal to proline and C-terminal to aspartate are labile → intense fragment ions.

  • Multi-stage MSⁿ (MS³…) now feasible in linear traps for deeper sequencing.
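
The b/y/a ion definitions above can be turned into a fragment-mass table. A minimal sketch for singly charged ions of unmodified peptides, using standard monoisotopic residue masses; the function name and the test peptide "PEPTIDE" are illustrative choices, not taken from the notes.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CO = 27.9949  # mass difference between a b ion and its a ion


def fragment_ions(peptide):
    """Singly charged b, a and y ion m/z values for each backbone cleavage."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    ions = []
    for m in range(1, n):
        b = sum(masses[:m]) + PROTON            # N-terminal fragment b_m
        y = sum(masses[m:]) + WATER + PROTON    # C-terminal fragment y_(n-m)
        ions.append({"m": m, "b": b, "a": b - CO, "y_index": n - m, "y": y})
    return ions


for ion in fragment_ions("PEPTIDE"):
    print(f"b{ion['m']}={ion['b']:.4f}  a{ion['m']}={ion['a']:.4f}  "
          f"y{ion['y_index']}={ion['y']:.4f}")
```

A useful sanity check: for every cleavage, b_m + y_(n-m) equals the neutral peptide mass plus two protons.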

De Novo vs. Database-Driven Sequencing

  • De novo: Interpret mass gaps; ambiguous when spectrum incomplete.

  • Database Searching converts problem to pattern matching.

    • Vast reduction of solution space as only biological sequences considered.
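
To make "interpret mass gaps" concrete, the sketch below maps the differences between consecutive peaks of an ion ladder onto residue masses. It is a toy illustration under stated assumptions: the 0.02 Da tolerance, the function names, and the example peaks are hypothetical, and real de novo tools also handle missing peaks, modifications, and mixed ion series.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(gap, tol=0.02):
    """All residues whose mass matches the gap within the tolerance (Da)."""
    return [aa for aa, mass in MONO.items() if abs(mass - gap) <= tol]


def read_ladder(peaks, tol=0.02):
    """Candidate residues for each gap between consecutive ladder peaks.

    For a y-ion ladder read from low to high m/z, the residues come out
    in C-terminal to N-terminal order.
    """
    peaks = sorted(peaks)
    return [residues_for_gap(hi - lo, tol) for lo, hi in zip(peaks, peaks[1:])]


# y1, y2, y3 of the illustrative peptide "PEPTIDE" (singly charged).
print(read_ladder([148.06043, 263.08737, 376.17143]))
```

The second gap returns both L and I, which are isobaric; with a looser tolerance K and Q also become indistinguishable. This is the kind of ambiguity that database searching sidesteps.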

Major Algorithms (Box 3)
  1. Peptide Sequence Tags (PeptideSearch)

    • Short internal sequence + masses to termini → unique DB hit.

  2. Sequest

    • Correlates experimental vs. theoretical spectra by cross-correlation.

  3. Mascot

    • Probability-based; matches highest-intensity ions first → score $= -10\log_{10}(P)$, where $P$ is the probability that the match is a random event.

  • Others: Sonar, ProteinProspector, graph-theory approaches.
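
The score transform quoted for Mascot is easy to invert. The sketch below covers only the −10·log10(P) conversion; it says nothing about how the search engine itself arrives at P, and the function names are illustrative.

```python
import math


def score_from_probability(p):
    """Probability-based score: -10 * log10(P)."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse transform: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


# A random-match probability of 1 in 1000 corresponds to a score of 30.
print(round(score_from_probability(1e-3), 6))
```

Each additional 10 score units corresponds to a tenfold lower probability that the match arose by chance.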

Statistical Validation of Peptide/Protein IDs

  • Reported as expectation/probability scores.

  • Use fully tryptic peptides unless strong evidence for semi-tryptic.

  • False-positive estimation via decoy databases (reversed/randomized sequences).

  • Two-component score distribution: low-score (random) vs. high-score (true) → choose cut-off for desired $\le 1\%$ FDR.

  • Protein probability combines peptide probabilities; caution with very large proteins (many theoretical peptides).

  • Single-peptide IDs only accepted with very high mass accuracy & manual spectrum validation (Box 4).
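
The decoy-based cut-off can be sketched as follows. Assumptions, all hypothetical: each peptide-spectrum match is a `(score, is_decoy)` tuple, higher scores are better, and FDR is estimated as decoys/targets above the threshold (one common convention among several).

```python
def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) tuples.
    Returns None if no threshold satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at or above `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy data: three confident target hits, then a decoy, then another target.
toy = [(90, False), (80, False), (70, False), (60, True), (50, False)]
print(score_cutoff_for_fdr(toy, max_fdr=0.0))
print(score_cutoff_for_fdr(toy, max_fdr=0.25))
```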

Manual Spectrum Validation Heuristics (Box 4)

  • Majority of intense peaks above precursor should form continuous y-series.

  • Check characteristic labile cleavages (Pro, Asp) & neutral-loss satellite peaks (e.g., $-98\,\text{Da}$ for $\text{H}_3\text{PO}_4$ from phosphopeptides, $-64\,\text{Da}$ for $\text{CH}_3\text{SOH}$ from oxidized methionine).

  • Consider same-charge fragment ions below precursor (e.g., doubly charged y-ions).

Quantification Strategies

Absolute (AQUA-like)
  • Spike synthetic isotopically labelled peptides of known amount; compare extracted ion currents.

  • Without standards: averaging the three most intense peptides per protein gives an estimate within ±4-fold.
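
Comparing extracted ion currents against a spiked standard reduces to a ratio of peak areas. A minimal sketch, assuming trapezoidal integration of each XIC and a linear detector response; the function names, units (fmol), and toy traces are hypothetical.

```python
def xic_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area


def absolute_amount(endogenous_area, standard_area, standard_fmol):
    """Endogenous peptide amount from the ratio to a spiked labelled standard."""
    return standard_fmol * endogenous_area / standard_area


# Toy co-eluting traces: endogenous (light) and spiked standard (heavy).
times = [0.0, 1.0, 2.0]
light = xic_area(times, [0.0, 20.0, 0.0])
heavy = xic_area(times, [0.0, 10.0, 0.0])
print(absolute_amount(light, heavy, standard_fmol=50.0))
```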

Relative Quantification via Stable Isotopes
  • Principle: heavy/light forms co-elute → peak ratio reflects abundance ratio.

  • Metabolic Labelling

    • SILAC: culture cells with $^{13}\text{C}$/$^{15}\text{N}$ Arg/Lys; mix cell lysates before processing.

    • $\ge 3\,\text{Da}$ shift required to separate isotope clusters.

  • Post-Harvest Chemical Labelling

    • ICAT: thiol-specific tag with biotin + light/heavy linker (FIG 4b); enrich cysteine peptides.

    • Other amine-reactive tags, deuterated reagents, $^{18}\text{O}$ exchange.

  • Accuracy limited by resolution & S/N; replicate/label-swap experiments recommended.
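
A sketch of where the heavy partner of a labelled peptide should appear and how the abundance ratio is reported. The choice of a 13C6 label (six labelled carbons per Arg/Lys) is an assumption made for illustration; the notes do not specify an isotopologue, and the function names are hypothetical.

```python
import math

C13_MINUS_C12 = 1.003355  # Da, mass difference between 13C and 12C


def heavy_mz(light_mz, z, labelled_carbons=6, labelled_residues=1):
    """Expected m/z of the heavy form, given the light m/z and charge z."""
    shift_da = labelled_carbons * labelled_residues * C13_MINUS_C12
    return light_mz + shift_da / z


def log2_ratio(heavy_intensity, light_intensity):
    """Heavy/light abundance ratio on a log2 scale."""
    return math.log2(heavy_intensity / light_intensity)


# A doubly charged peptide with one labelled residue shifts by about 3 Th.
print(round(heavy_mz(617.28, 2) - 617.28, 4))
print(log2_ratio(4.0, 1.0))
```

In a label-swap replicate the log2 ratio of a genuinely regulated protein should flip sign, which helps separate real changes from labelling artifacts.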

Applications & Case Studies

  • Complex/Interactome Mapping: Immunoprecipitation + MS identifies protein networks (Refs 51-52). Stable-isotope pull-downs capture transient, phosphorylation-dependent binding (Refs 56-58).

  • Organelle Proteomics: Protein-correlation profiling distinguishes genuine organellar proteins via fractionation profiles (Ref 59).

  • Expression Proteomics / Biomarker Discovery: Whole-lysate quantification (SILAC, ICAT) seeks differential expression; challenges include dynamic range & data noise.

  • Top-Down FTICR: Partial sequences & PTM mapping on intact proteins.

Challenges & Common Pitfalls

  • Under-appreciating contaminants: keratins, minor co-migrating proteins.

  • Over-interpreting long protein lists without statistical confidence.

  • Missing low-abundance peptides due to ion suppression/co-elution.

  • Database redundancy vs. minimalism: conflicting isoform assignments.

  • Incomplete PTM coverage owing to limited sequence coverage.

Key Terms & Numerical Reminders

  • Dalton (Da): $1\,\text{Da} = 1.6605\times10^{-27}\,\text{kg}$.

  • Thomson (Th): Proposed unit for the m/z scale ($1\,\text{Th} = 1\,\text{Da}$ per elementary charge).

  • Total Ion Current (TIC): sum of all signal intensities per scan.

  • Extracted Ion Chromatogram (XIC): intensity trace of single m/z across LC run.

  • Mass Resolution: $R = \frac{m/z}{\text{FWHM}}$; TOF $\approx 10{,}000$; FTICR $>100{,}000$.

  • Isotope Spacing: $\Delta(m/z) = 1/z$.
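
The resolution formula and the ppm mass error mentioned for FTICR can be computed directly. A small sketch; the function names and example numbers are illustrative only.

```python
def resolution(mz, fwhm):
    """Mass resolution R = (m/z) / FWHM."""
    return mz / fwhm


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def fwhm_at_resolution(mz, r):
    """Peak width (FWHM, in Th) expected at resolution r."""
    return mz / r


# At R = 10,000 a peak at m/z 617.28 is about 0.06 Th wide,
# well below the 0.5 Th isotope spacing of a 2+ ion.
print(round(fwhm_at_resolution(617.28, 10_000), 4))
print(round(ppm_error(617.2829, 617.2823), 2))
```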

Ethical / Practical Implications

  • Proper statistical validation prevents publication of unreliable proteomes.

  • SILAC avoids radioactivity and enables in-vivo dynamic studies without additional chemical manipulations.

  • ICAT selects for cysteine-containing proteins; researchers must report potential bias against cysteine-free proteins.

  • Sample handling: use gloves/lab coats to minimize keratin contamination.

Connections to Foundational Principles & Other ‘Omics’

  • Complementarity with transcriptomics: protein abundance correlates poorly with mRNA (Ref 60) → proteomics indispensable.

  • Systems biology integrates MS-based proteomics, mRNA arrays, imaging for holistic cellular maps (Refs 62-67).

Future Directions

  • Higher MS speed & resolution promise near-complete proteome coverage (accurate-mass-tag approaches).

  • Improved statistical tools (machine learning) & community standards for FDR reporting.

  • Clinical translation: SELDI patterns under scrutiny; need robust bioinformatics for biomarker validation.