Proteomics, MS and Biomedical Applications – Vocabulary

Lecture Context

Course/Unit: TAiBMS 6H5Z1036_2425
Lecture 2 title: Proteomics, MS and Biomedical Applications: Quantification and Experimental Strategies
Lecturers/Contributors: Dr Jon Humphries (presenter), Prof. Zoltan Takats (host institution link provided)
Format: Slide-based lecture, page numbers 1–37

Stated Learning Outcomes

Describe a typical MS-proteomics workflow
Understand quantitative approaches employed in proteomics
Recognise research & biomedical applications of quantitative MS proteomics

Proteomics – Recap & Core Concepts

Proteomics definition: Large-scale, global study of proteins (analogous to genomics for DNA)
Unique difficulty: Proteins cannot be amplified like DNA → sensitivity, dynamic-range issues
Term “proteome” coined in 1994 (Marc Wilkins); “proteomics” in use from 1997
Proteome = complete set of proteins produced or post-translationally modified by a cell, tissue, organism or system
Field is acronym-heavy: SILAC, iTRAQ, TMT, ESI, MALDI, TOF, SRM, MS1, MS2 etc.
Proteomics aims to identify AND quantify system components

Generic MS-Proteomics Workflow

Inter-dependent steps (all critical):
• Sample collection / lysis / enrichment
• Enzymatic digestion (classically trypsin)
• Separation (LC or other chromatographies)
• Ionisation (ESI for LC, MALDI for spot-based)
• Mass spectrometry acquisition (MS1 survey → MS2 fragmentation)
• Bioinformatics & statistics (database search, FDR, volcano plots, PCA, clustering, network analysis)
Experiment does not stop at the peptide list → downstream biological interpretation essential
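As a rough illustration of the FDR step in the bioinformatics bullet, the sketch below estimates a false discovery rate with the target-decoy approach. This is a common choice for database searches, but the notes do not say which method the lecture used, so treat it as an assumption. The scores and the `psms` list are made-up values.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among PSMs scoring at or above `threshold`.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    FDR is approximated as (number of decoy hits) / (number of target hits).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
psms = [(95, False), (90, False), (88, False), (80, True),
        (75, False), (70, False), (60, True), (55, False)]

fdr_at_70 = target_decoy_fdr(psms, 70)
print(f"Estimated FDR at score >= 70: {fdr_at_70:.2f}")
```

In practice the score threshold is lowered until the estimated FDR reaches a chosen limit (1% is a frequent convention).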

Instrument Illustration (LC-MS/MS)

Bench-top dual-module: LC left, MS right (Cravatt et al., 2007)
MS measures m/z (mass-to-charge ratio) of ionised peptides
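A minimal sketch of what m/z means for a protonated peptide: the observed value is the neutral mass plus one proton mass per charge, divided by the charge. The peptide mass of 1000 Da is an arbitrary illustrative value.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass, charge):
    """Return m/z for a peptide of monoisotopic `neutral_mass` carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


peptide_mass = 1000.0  # hypothetical neutral monoisotopic mass in Da
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mz(peptide_mass, z):.4f}")
```

The same peptide therefore appears at different m/z values depending on its charge state, which is typical of ESI spectra.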

Peptide Sequencing Basics (abc/xyz Ions)

Peptide chosen in MS1 is fragmented in MS2
Backbone fragments at or next to the peptide bond → a, b, c ions (retain the N-terminus) & x, y, z ions (retain the C-terminus); collision-induced dissociation yields mainly b and y ions
Resulting spectrum reflects all possible break points; identification is probability-based DB matching
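To make the b/y ion series concrete, the sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. The tripeptide "AGK" is an arbitrary example, not one taken from the lecture, and no modifications are considered.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def by_ions(peptide):
    """Return lists of singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b_i: first i residues plus one proton
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: last i residues plus water plus one proton
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = by_ions("AGK")  # arbitrary example peptide
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A search engine compares theoretical lists like these against the observed MS2 peaks and scores the match.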

In-class Exercise – Trypsin Rule

Trypsin cleaves C-terminal to Lys (K) or Arg (R) unless followed by Pro
Sequence example (positions 1–37): QPQPAQNVLA APRGLGAAEF GGKAGNVEAP GETFAQ
Expected theoretical peptides: 3 (cleavage after R13 and K23; confirmed via Expasy PeptideCutter)
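The trypsin rule can be checked with a short in silico digest. This is a simplified sketch that applies only the K/R-not-before-P rule and ignores missed cleavages; the sequence is the exercise sequence with spaces removed (36 residues as transcribed in these notes).

```python
def trypsin_digest(sequence):
    """Cleave C-terminal to K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


seq = "QPQPAQNVLAAPRGLGAAEFGGKAGNVEAPGETFAQ"
peptides = trypsin_digest(seq)
for p in peptides:
    print(len(p), p)
```

The digest returns three peptides, matching the expected answer to the exercise.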

Quantitative Proteomics – Strategic Decisions

Relative vs Absolute quantification
• Most studies are relative (fold-change)
• Absolute (concentration, e.g. pmol/µL) demands external calibration / validation
Label vs Label-Free
Targeted vs Discovery

Label-Based vs Label-Free (LFQ)

Label Approaches (heavy isotopes or isobaric tags)
• Additional cost (15N/13C amino acids, TMT/iTRAQ reagents)
• Experimental constraints: cell culture easier than whole-animal
• Multiplexing lowers MS run count & reduces prep-derived variability
Label-Free
• Cheaper design, no chemical handling, suits any sample
• Requires more LC-MS runs; quant accuracy relies on ion intensity (XIC) or spectral counting (the latter less accurate)
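Intensity-based label-free quantification can be sketched as integrating a peptide's extracted ion chromatogram (XIC) in each run and comparing the areas. The trapezoid rule and the chromatogram values below are illustrative assumptions; real software also handles peak detection, retention-time alignment and normalisation.

```python
def xic_area(times, intensities):
    """Integrate an extracted ion chromatogram with the trapezoid rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area


# Hypothetical XIC of one peptide in two LC-MS runs (time in min, intensity in counts)
times = [10.0, 10.1, 10.2, 10.3, 10.4]
run_a = [0, 2e5, 8e5, 2e5, 0]
run_b = [0, 4e5, 16e5, 4e5, 0]

ratio = xic_area(times, run_b) / xic_area(times, run_a)
print(f"Run B / Run A peak-area ratio: {ratio:.2f}")
```

Spectral counting would instead count how many MS2 spectra matched each protein, which is coarser because counts are small integers.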

Two Major Labelling Families

SILAC – Stable Isotope Labelling by Amino acids in Cell Culture
• Metabolic → labelled proteins before extraction
• Quantification from MS1 peak intensities
TMT / iTRAQ – Tandem Mass Tags / isobaric Tags for Relative and Absolute Quantitation
• Chemical tagging of peptides post-digestion
• Quantification from MS2 reporter ions
• Up to 16-plex nowadays
Rule of thumb: equal protein levels give 1:1 heavy/light or tag reporter ratios
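The 1:1 rule of thumb is easiest to read on a log2 scale, where an unchanged protein gives 0. A minimal sketch with hypothetical protein names and intensities:

```python
import math


def log2_ratio(heavy, light):
    """Return log2(heavy/light); 0 means equal abundance in both conditions."""
    return math.log2(heavy / light)


# Hypothetical summed intensities: protein -> (heavy, light)
pairs = {
    "PROT_A": (1.0e6, 1.0e6),  # unchanged
    "PROT_B": (4.0e6, 1.0e6),  # 4-fold higher in the heavy condition
    "PROT_C": (0.5e6, 1.0e6),  # 2-fold lower in the heavy condition
}

ratios = {name: log2_ratio(h, l) for name, (h, l) in pairs.items()}
for name, value in ratios.items():
    print(f"{name}: log2(H/L) = {value:+.1f}")
```

The same calculation applies to pairs of TMT/iTRAQ reporter-ion intensities read from MS2 spectra.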

Targeted vs Discovery Workflows

Discovery (shotgun/DDA)
• Goal: max coverage
• Acquisition: precursors selected data-dependently by intensity
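Data-dependent selection can be sketched as a "top N" pick from each MS1 survey scan. The exclusion list below stands in for dynamic exclusion, which is an assumption here since the notes do not mention it; real instruments match m/z within a tolerance and expire exclusions after a set time. Peak values are made up.

```python
def select_precursors(survey_scan, top_n, excluded=()):
    """Pick the top_n most intense MS1 peaks that are not on the exclusion list."""
    candidates = [(mz, inten) for mz, inten in survey_scan if mz not in excluded]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical MS1 survey scan: (m/z, intensity)
survey = [(400.2, 5e4), (523.8, 9e5), (611.3, 3e5), (702.4, 1e4), (850.9, 6e5)]

first_cycle = select_precursors(survey, top_n=3)
second_cycle = select_precursors(survey, top_n=3, excluded={523.8})
print(first_cycle)
print(second_cycle)
```

Because selection follows intensity, low-abundance precursors such as the 702.4 peak may never be fragmented, which is why discovery runs under-sample low-abundance proteins.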
Targeted (SRM/MRM, PRM, DIA)
• Focus: predefined peptides → high sensitivity & quantitative precision
• Classical hardware: triple quadrupole (QQQ)
• Process:
– Q1 isolates precursor m/z
– Collision cell fragments
– Q3 monitors selected product ions
• Absolute amounts via heavy synthetic standards
• Practical multiplex: 50–100 proteins per run
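Absolute quantification with a heavy synthetic standard reduces to a ratio calculation: the endogenous (light) peptide amount is the light/heavy peak-area ratio multiplied by the known spiked amount. The sketch assumes equal response for light and heavy forms and uses made-up peak areas.

```python
def absolute_amount(light_area, heavy_area, spiked_fmol):
    """Endogenous amount = (light / heavy peak-area ratio) x known spiked amount."""
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return light_area / heavy_area * spiked_fmol


# Hypothetical summed transition areas and spike-in amount
amount = absolute_amount(light_area=3.0e6, heavy_area=1.5e6, spiked_fmol=50.0)
print(f"Endogenous peptide: {amount:.1f} fmol")
```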
DIA / SWATH-MS
• Hybrid approach: targeted-like quantification without SRM assay optimisation
• All precursors fragmented in sequential m/z windows
• Identification via spectral libraries
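The sequential m/z windows used in DIA can be sketched as a simple tiling of the precursor range. The 400–1200 m/z range and 25 m/z width are illustrative assumptions, not values from the lecture; real methods often use overlapping or variable-width windows.

```python
def dia_windows(start_mz, end_mz, width):
    """Return contiguous (low, high) isolation windows covering start_mz..end_mz."""
    windows = []
    low = start_mz
    while low < end_mz:
        high = min(low + width, end_mz)
        windows.append((low, high))
        low = high
    return windows


windows = dia_windows(400, 1200, 25)
print(len(windows), "windows; first:", windows[0], "last:", windows[-1])
```

Every precursor falling inside a window is fragmented together, so the resulting MS2 spectra are mixed and need a spectral library to deconvolve.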

Fundamental Compromise (Targeted vs Discovery)

Trade-off triangle:
• Proteome breadth
• Detection sensitivity
• Assay scalability
Decide on absolute or relative needs before committing to workflow

Biomedical & Research Applications

Cell-ECM Adhesion & Integrins

Integrins = heterodimeric receptors linking cytoskeleton ↔ ECM
Control: mechano-signalling, migration, survival, proliferation, differentiation

ECM Production In Vitro (Rashid et al., 2012; Byron et al., 2014)

MS used to catalogue & quantify secreted ECM under cell-culture cross-talk
Workflows: ECM enrichment → LC-MS/MS → statistics (volcano plots) & protein-protein interaction (PPI) networks

Protein–Protein Interaction Mapping

GFP-TRAP IPs (Jacquemet et al., 2013): isolate GFP-tagged small GTPase complexes; output analysed with clustering, heat-maps, network reconstructions
Integrin ligand pull-downs (Humphries et al., 2009; Jones et al., 2015): affinity purification vs fibronectin/VCAM ligands → modelling α5β1 and α4β1 adhesome networks

Post-Translational Modifications

Phospho-adhesome (Robertson et al., 2015)
• Enrichment of phosphopeptides from adhesion complexes
• Revealed far more phosphoproteins than prior estimates
• Data mined through ontologies & PPI networks

Spatial Proteomics (Proximity Labelling)

BioID (Roux et al., 2012; Lundberg & Börner, 2019)
• Mutant BirA* biotin-ligase fused to bait → labels proteins within ~10 nm
• Advantages: in situ, no need to keep interactions intact, reveals nano-topology
• Disadvantage: genetic fusion/expression needed
• Alternative enzymes: APEX peroxidase, TurboID etc.
BioID-generated adhesome (Chastney et al., 2020)
• 16 bait proteins; LFQ via MaxQuant + SAINT
• Identified 146 enriched proteins → 360 proximity edges, 81% previously unreported (BioGRID)
• Combined hierarchical clustering with network topology
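A proximity network of this kind is a bait-prey edge list, and one basic topology measure is how many baits label each prey. The sketch below uses hypothetical bait and prey names and is not the analysis pipeline of the cited study.

```python
from collections import defaultdict

# Hypothetical proximity edges: (bait, prey)
edges = [
    ("BAIT_1", "PREY_A"), ("BAIT_1", "PREY_B"),
    ("BAIT_2", "PREY_B"),
    ("BAIT_3", "PREY_B"), ("BAIT_3", "PREY_C"),
]

baits_per_prey = defaultdict(set)
for bait, prey in edges:
    baits_per_prey[prey].add(bait)

# Degree of each prey = number of distinct baits that labelled it
degree = {prey: len(baits) for prey, baits in baits_per_prey.items()}
for prey, d in sorted(degree.items()):
    print(prey, d)
```

Prey detected by many baits sit centrally in the network, whereas prey seen with a single bait suggest a more specific neighbourhood.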

Cancer Diagnostics & Therapeutics

Tumour micro-environment (Carr & Fernandez-Zapico, 2016): stroma, fibroblasts, immune cells yield multiple biomarker sources (plasma, biopsy, liquid biopsy, histology)
Need markers for entire patient journey: predisposition → early detection → personalised therapy

iKnife (REIMS Technology)

Surgical diathermy coupled to rapid-evaporative ionisation MS
Classifier trained on tumour vs normal tissue “fingerprints”
Advantage: real-time guidance during resection
Note: REIMS spectra are dominated by lipids and small metabolites, so the method does not measure proteins per se

MS Imaging

Discovery mode molecular imaging; comparatively low spatial resolution
Produces ion maps without explicit biomolecule ID (requires orthogonal validation)

Clinical Proteomics Workflow Snapshot (Zhu et al., 2021)

Workflow encompasses:
1. Sample selection (tissue, fluid, FFPE, cell culture)
2. Protein extraction & clean-up
3. Separation (SDS-PAGE, SEC, OFFGEL, LC)
4. Digestion (trypsin, Lys-C, etc.)
5. Optional labelling/enrichment steps (SILAC, TMT, PTM enrichment)
6. Peptide LC separation (nanoLC/UHPLC)
7. MS acquisition (Orbitrap, Q-Exactive, TOF, QQQ)
8. Identification & Quantification (search engines, FDR)
9. Bioinformatics (stat tests, pathway, network, machine learning)

Strengths & Weaknesses of Quantitative Workflows (General)

Label-based: high precision, multiplex, lower run count; but costly & sample-mixing complexity
Label-free: universal applicability, cost-effective; but run-to-run variability & more instrument time
Targeted: exquisite sensitivity & absolute-quant option; but limited breadth & assay development overhead (unless DIA)
Discovery: global view & hypothesis generation; but semi-quantitative & under-samples low-abundance proteins

Key Numeric / Technical References

Typical SRM panel size: 50–100 proteins
BioID labelling radius: ~10 nm
BioID adhesome: 146 enriched proteins; 360 edges; 81% novel
TMT plexing currently up to 16

Ethical, Philosophical & Practical Implications

Clinical translation requires balancing experimental rigour with cost, throughput, and regulatory demands
Quantification strategy influences data reproducibility and biological interpretability
Patient benefit (e.g. iKnife) hinges on robust training datasets & ongoing validation

Essential & Recommended Reading (as per slide 37)

Essential:
• Steen & Mann (2004) “The abc's (and xyz's) of peptide sequencing”
• Zhu et al. (2021) “SnapShot: Clinical proteomics”
Recommended:
• Cravatt et al. (2007) – Biological impact of MS proteomics
• Doerr (2013) – Targeted proteomics
• Lundberg & Börner (2019) – Spatial proteomics
• Samavarchi-Tehrani et al. (2020) – Proximity biotinylation
Additional cited open-access studies embedded throughout lecture

Recap of Learning Outcomes Achieved

Detailed breakdown of MS-proteomics workflow (sample → bioinformatics)
Exhaustive comparison of quantitative strategies (relative/absolute; label/label-free; targeted/discovery)
Multiple research & biomedical case studies: ECM/integrin biology, phospho-adhesome, BioID spatial mapping, cancer diagnostics, iKnife, imaging

Closing Prompts

Think about experimental requirements (precision, depth, cost, speed) when designing your own proteomics study.