Proteomics and Mass Spectrometry – Lecture 1 (TAIBMS module)
Human Genome & Human Proteome
Genome = complete genetic material of an organism
Humans: nuclear genome (23 pairs of chromosomes) + mitochondrial genome
Contains both protein–coding genes and non-coding DNA
Landmark projects
Human Genome Project (draft sequence published 2001; declared complete 2003)
Human Genome Project–Write (2016)
Proteome = entire set of proteins that is or can be expressed by a cell, tissue or organism
Dynamic, varies with cell type, developmental stage, external stimuli, disease status
Human Protein Atlas resources
Organ & tissue proteomes
Cell-cycle–dependent proteome
Organelle proteome
Ongoing debate on gene/protein numbers
Latest (Nature 2018) tally ≈ 21,306 protein-coding genes
Historical estimates ranged from ≈ 6,700 to ≈ 100,000
Why Study Proteins? — The ’Omics Cascade
Flow of biological information
DNA → RNA → Protein → Metabolite
Each level has its own discipline/technology
Genome → Next-Generation Sequencing (NGS)
Transcriptome → RNA-Seq
Proteome → Mass Spectrometry-based proteomics
Metabolome → Metabolomics
Proteome adds post-transcriptional & post-translational complexity beyond the genome
Isoforms, splice variants, PTMs, turnover, cellular localisation, interactions, temporal regulation
Proteome Complexity
One gene ≠ one protein; multiple mechanisms increase proteome size relative to genome size
Alternative splicing
RNA editing
Post-translational modifications (PTMs): phosphorylation, glycosylation, ubiquitination, etc.
Illustration (Virág et al., 2020): complexity massively increases from genome → transcriptome → proteome
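A toy calculation of how quickly the number of possible protein forms grows. It assumes every PTM site is independently either modified or unmodified and that all combinations can occur, which is a deliberate over-simplification; the function name and the input numbers are illustrative, not taken from the lecture.

```python
def max_proteoforms(splice_variants: int, ptm_sites: int, states_per_site: int = 2) -> int:
    """Upper bound on distinct protein forms from one gene (toy model).

    Each splice variant is assumed to carry `ptm_sites` independent sites,
    each of which can be in one of `states_per_site` states.
    """
    return splice_variants * states_per_site ** ptm_sites


# Hypothetical gene with 3 splice variants and a growing number of PTM sites.
for sites in (0, 1, 5, 10):
    print(sites, "PTM sites ->", max_proteoforms(splice_variants=3, ptm_sites=sites), "possible forms")
```

Real cells populate only a small fraction of this theoretical space, so the numbers are a ceiling, not an estimate.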
Information Gleaned from Proteomics
Protein isoform identification
Differential gene expression at the protein level
Spatiotemporal dynamics (where & when proteins appear)
PTM mapping & PTM dynamics
Protein–protein / protein–nucleic-acid / protein–metabolite interactions
Cellular localisation (sub-organellar resolution)
Protein stability & turnover rates
Insight into regulation beyond transcription (translational & post-translational control)
Health vs disease “snapshot” of the protein environment
Proteomics Techniques – Overview
Two-Dimensional Gel Electrophoresis (2D-GE)
Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)
Mass Spectrometry (MS)
Two-Dimensional Gel Electrophoresis (2D-GE)
1st dimension: Isoelectric Focusing (IEF)
Proteins migrate in a pH gradient until they reach the pH equal to their isoelectric point (pI), where net charge = 0 and migration stops
2nd dimension: SDS-PAGE (orthogonal separation)
Proteins separated by molecular weight (MW)
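The "net charge = 0" idea behind IEF can be sketched numerically. The code below estimates a pI from sequence with the Henderson-Hasselbalch equation and a bisection search. The pKa values are approximate and assumed (published sets differ by several tenths of a pH unit), and the model ignores PTMs and local structural effects.

```python
# Approximate pKa values (assumed; published sets differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5, "N_TERM": 8.6}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "C_TERM": 3.6}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 0.0
    # Basic groups contribute +1 when protonated.
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_TERM" else seq.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Acidic groups contribute -1 when deprotonated.
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_TERM" else seq.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH at which the net charge crosses zero, by bisection."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


# Hypothetical peptides: a lysine-rich one should focus at a higher pH
# than an aspartate-rich one.
print(round(isoelectric_point("GKKKKG"), 2), round(isoelectric_point("GDDDDG"), 2))
```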
Visualisation: stain (e.g. Coomassie, SYPRO Ruby)
Each spot ≈ individual polypeptide chain (e.g. entire E. coli proteome shown in classic figure)
Advantages
Resolves hundreds to thousands of proteins in one gel → suited to discovery/comparative profiling
Can estimate relative abundance by spot intensity
Disadvantages
Spot matching across gels is difficult; small positional shifts hinder quantification
Low-abundance, very acidic/basic, very hydrophobic or extremely large/small proteins may be under-represented
Extensive computer-based image analysis required
Biomedical example
Ott et al. (2001) used parallel 2-D-PAGE of paired colorectal tissue → patient-specific tumour profiling
Classroom exercise (illustrated)
Given MW & pI, locate spots A–G; reinforces concept of bi-dimensional separation
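The exercise can be mimicked in code: a minimal sketch that maps (pI, MW) to a position on an idealised gel. It assumes a linear pH 3–10 gradient, migration in SDS-PAGE that is linear in log10(MW), and hypothetical gel dimensions and MW limits; real gels deviate from all of these. The example spots are invented, not the A–G values from the lecture figure.

```python
import math


def spot_position(pi, mw_kda, ph_range=(3.0, 10.0), mw_range_kda=(10.0, 200.0),
                  gel_size_cm=(18.0, 20.0)):
    """Return (x, y) in cm from the top-left corner of an idealised 2D gel."""
    ph_low, ph_high = ph_range
    mw_low, mw_high = mw_range_kda
    width, height = gel_size_cm
    # x: linear position along the IEF pH gradient (acidic end on the left).
    x = (pi - ph_low) / (ph_high - ph_low) * width
    # y: migration roughly linear in log10(MW); large proteins stay near the top.
    log_span = math.log10(mw_high) - math.log10(mw_low)
    y = (math.log10(mw_high) - math.log10(mw_kda)) / log_span * height
    return x, y


# Hypothetical spots: (pI, MW in kDa).
for name, (pi, mw) in {"acidic, large": (4.5, 150.0), "basic, small": (9.0, 15.0)}.items():
    x, y = spot_position(pi, mw)
    print(f"{name}: x = {x:.1f} cm, y = {y:.1f} cm")
```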
Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)
Principle: direct fluorescent labelling of proteins before IEF
Cy2, Cy3, Cy5 dyes are mass/charge-matched but have distinct excitation/emission spectra
Each sample (≥2) labelled with different dye; pooled “internal standard” (Cy2) added to every gel
All labelled proteins co-migrate on one gel → eliminates gel-to-gel variability
Workflow
Label extracts with CyDye DIGE Fluor minimal dyes
Mix labelled samples + Cy2-labelled normalisation pool
Perform 2-D separation
Scan gel at dye-specific wavelengths
Use software for image analysis, spot matching, statistics
Example: Alasmari et al. (2021)
Compared serum proteomes of cannabis users vs controls
>121 differentially expressed proteins (fold-change > 1.5, p<0.05) → identified by MS
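A minimal sketch of how the Cy2 internal standard and the fold-change/p-value cut-offs fit together. Spot volumes are divided by the Cy2 volume of the same spot, the standardised values are compared, and the thresholds quoted above are applied. The spot volumes and p-values are invented for illustration, p-values are assumed to come from a separate statistical test on replicate gels, and the function names are hypothetical (not those of any DIGE software).

```python
def standardised_abundance(sample_volume: float, cy2_volume: float) -> float:
    """Spot volume relative to the pooled Cy2 internal standard."""
    return sample_volume / cy2_volume


def fold_change(control: float, treated: float) -> float:
    """Signed fold change: +2.0 means doubled, -2.0 means halved."""
    ratio = treated / control
    return ratio if ratio >= 1.0 else -1.0 / ratio


def is_differential(fc: float, p_value: float, fc_cutoff: float = 1.5,
                    alpha: float = 0.05) -> bool:
    """Apply the fold-change > 1.5 (either direction) and p < 0.05 criteria."""
    return abs(fc) > fc_cutoff and p_value < alpha


# Hypothetical spot volumes (arbitrary units) and precomputed p-values.
spots = {
    "spot_1": {"cy2": 1000.0, "cy3": 900.0, "cy5": 2100.0, "p": 0.01},
    "spot_2": {"cy2": 800.0, "cy3": 820.0, "cy5": 790.0, "p": 0.60},
    "spot_3": {"cy2": 500.0, "cy3": 1200.0, "cy5": 400.0, "p": 0.03},
}

for name, v in spots.items():
    control = standardised_abundance(v["cy3"], v["cy2"])
    treated = standardised_abundance(v["cy5"], v["cy2"])
    fc = fold_change(control, treated)
    print(name, round(fc, 2), is_differential(fc, v["p"]))
```

Because every gel carries the same Cy2 pool, standardised abundances from different gels can be compared on one scale.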
Advantages
Internal Cy2 standard → rigorous normalisation
Multiplexing (≤3 samples per gel) reduces total gels and matching workload
Improved quantification accuracy; detects expression changes & PTMs
Disadvantages
Dye-labelling may reduce detection of very low-abundance proteins
In-gel digestion + MS still limits ultra-low abundance IDs
Reagents & scanners are relatively expensive
Mass Spectrometry (MS) – Fundamentals
Analytical technique that measures mass-to-charge ratio (m/z) of ions in the gas phase
2002 Nobel Prize in Chemistry awarded for "soft ionisation" (MALDI & ESI) that allowed analysis of biomacromolecules
Ionisation & Charge
Organic molecules gain or lose protons (H$^+$) under controlled pH → become ions
Acidic conditions: $\text{M} + \text{H}^+ \rightarrow [\text{M}+\text{H}]^{+}$ (cation)
Basic conditions: $\text{M} - \text{H}^+ \rightarrow [\text{M}-\text{H}]^{-}$ (anion)
Charged species are manipulated by electromagnetic fields (acceleration, focusing, deflection)
In liquid = electrophoresis; in vacuum = mass spectrometry
Calculating m/z
$m/z = \frac{m + z}{z}$ (neutral mass $m$ in Da plus $z$ protons of ≈ 1 Da each, divided by charge $z$)
Example for peptide of $m = 1200\ \text{Da}$
z=1 → (1200+1)/1 = 1201
z=2 → (1200+2)/2 = 601
z=3 → (1200+3)/3 = 401
Multiple isotope peaks (e.g. $^{13}$C, 1.1 % natural abundance) create a pattern spaced ≈ 1 Da in mass, i.e. ≈ 1/z on the m/z axis
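The arithmetic above can be reproduced in a few lines. The sketch uses the more precise proton mass (≈ 1.007276 Da) by default and falls back to the lecture's 1 Da approximation when asked; the function names are illustrative.

```python
PROTON_MASS = 1.007276        # Da, mass of a proton
C13_C12_DIFFERENCE = 1.003355  # Da, mass difference between 13C and 12C


def mz(neutral_mass: float, charge: int, proton_mass: float = PROTON_MASS) -> float:
    """m/z of a positive ion formed by adding `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * proton_mass) / charge


def isotope_spacing(charge: int) -> float:
    """Distance between neighbouring isotope peaks on the m/z axis."""
    return C13_C12_DIFFERENCE / charge


for z in (1, 2, 3):
    approx = mz(1200.0, z, proton_mass=1.0)  # lecture approximation: 1201, 601, 401
    precise = mz(1200.0, z)
    print(f"z={z}: approx {approx:.0f}, precise {precise:.4f}, "
          f"isotope spacing {isotope_spacing(z):.3f}")
```

The isotope spacing gives a practical way to read the charge state from a spectrum: peaks ≈ 0.5 apart indicate z = 2, ≈ 0.33 apart indicate z = 3.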
Fragmentation (Tandem MS / MS–MS)
Selected parent ion accelerated into inert gas (e.g. N$_2$) → collision-induced dissociation (CID)
Breaks along peptide backbone → series of b-, y- (and a-, c-, z-) ions; pattern defines sequence
Example spectrum depicts y7, y8, etc.
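To make the b-/y-ion series concrete, here is a minimal sketch that lists singly charged b and y fragment m/z values for a peptide, using standard monoisotopic residue masses. The peptide "PEPTIDEK" is a made-up example, not the one in the lecture spectrum, and modifications, neutral losses and higher charge states are ignored.

```python
# Monoisotopic residue masses in Da (amino acid minus water).
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str):
    """Return lists of (label, m/z) for singly charged b and y ions."""
    masses = [MONO_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b_i: first i residues (N-terminal fragment) plus one proton.
        b_ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        # y_i: last i residues (C-terminal fragment) plus water plus one proton.
        y_ions.append((f"y{i}", sum(masses[-i:]) + WATER + PROTON))
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("PEPTIDEK")  # hypothetical peptide
for (b_label, b_mz), (y_label, y_mz) in zip(b_ions, y_ions):
    print(f"{b_label}: {b_mz:9.4f}    {y_label}: {y_mz:9.4f}")
```

The mass difference between consecutive ions of one series equals one residue mass, which is how the sequence is read off the spectrum.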
Peptide-Centric Strategy
Proteins are enzymatically digested (commonly trypsin cuts C-terminal to Lys (K) & Arg (R))
Yields 7–20 residue peptides: soluble, singly/multiply charged, informative fragmentation
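A minimal in silico tryptic digest, as a sketch of the cleavage rule above. It keeps peptides of 7–20 residues by default. The "no cleavage before proline" option is a commonly applied refinement that is not stated in the lecture, so treat it as an assumption; missed cleavages are not modelled, and the sequence is invented.

```python
import re


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 20,
                   proline_rule: bool = True) -> list:
    """Cleave C-terminal to K or R and keep peptides within the length window."""
    # Zero-width split point: just after K/R, optionally not followed by P.
    pattern = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    peptides = [p for p in re.split(pattern, protein.upper()) if p]
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical protein sequence.
sequence = "MASTEGLLNKDAVEPLSGTRPELVQAAYKGGHWDFTLMNCVSIEKAR"
print(tryptic_digest(sequence))
print(tryptic_digest(sequence, min_len=1, max_len=100))  # no length filter
```

Splitting on a zero-width pattern requires Python 3.7 or later.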
Peptide identification
Compare observed peptide m/z list against a database of in silico digested proteins ⇒ “peptide mass fingerprinting” / “peptide mapping”; MS-MS fragment patterns are matched in the same way against predicted fragment ions
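A toy peptide mass fingerprinting search, to show the matching logic only. Each database protein is digested in silico, singly protonated peptide masses are predicted, and the score is the number of observed peaks explained within a mass tolerance. The two-protein "database", the observed peak list and the 0.05 Da tolerance are all invented; real search engines such as Mascot use probabilistic scoring rather than a simple count.

```python
import re

# Monoisotopic residue masses in Da (amino acid minus water).
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def digest(protein: str) -> list:
    """Tryptic peptides (cleave after K/R, not before P; no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def mh_plus(peptide: str) -> float:
    """m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed_mz, protein: str, tol_da: float = 0.05) -> int:
    """Count observed peaks that match a predicted peptide within tol_da."""
    theoretical = [mh_plus(p) for p in digest(protein)]
    return sum(any(abs(obs - t) <= tol_da for t in theoretical) for obs in observed_mz)


def best_match(observed_mz, database: dict, tol_da: float = 0.05):
    """Return the best-scoring protein name and all scores."""
    scores = {name: pmf_score(observed_mz, seq, tol_da) for name, seq in database.items()}
    return max(scores, key=scores.get), scores


# Hypothetical database and a simulated peak list (two peptides of protA, slightly shifted).
database = {"protA": "MKTAYIAKQRQISFVK", "protB": "GSHMLEDPVRAGTK"}
observed = [mh_plus("TAYIAK") + 0.01, mh_plus("QISFVK") - 0.01]
print(best_match(observed, database))
```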
Signal Intensity & Quantification
Peak area or height ≈ ion count → correlates with peptide abundance (may be non-linear; matrix dependent)
Ionisation efficiency varies between molecules (matrix effects)
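A minimal sketch of label-free relative quantification: integrate the chromatographic peak of one peptide in two runs and take the ratio of areas. The intensity traces are invented. Because ionisation efficiency differs between molecules, the ratio is only meaningful for the same peptide measured under comparable conditions, not for comparing two different peptides.

```python
def peak_area(times, intensities) -> float:
    """Area under an intensity-vs-time trace by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area


# Hypothetical extracted-ion chromatograms of one peptide (time in s, ion counts).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
control = [0.0, 200.0, 1000.0, 200.0, 0.0]
treated = [0.0, 400.0, 2000.0, 400.0, 0.0]

ratio = peak_area(times, treated) / peak_area(times, control)
print(f"treated/control peak-area ratio: {ratio:.2f}")
```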
Core Components of an MS System
Ion source – generates ions
Electrospray Ionisation (ESI): ions from solution; compatible with LC; handles complex mixtures
Matrix-Assisted Laser Desorption/Ionisation (MALDI): laser pulses desorb and ionise analyte co-crystallised with a matrix; suited to simple mixtures and imaging
Mass analyser – separates ions by m/z
Quadrupole, Ion Trap, Time-of-Flight (TOF), Orbitrap, Fourier-Transform Ion Cyclotron Resonance (FT-ICR)
Detector – counts ions at each m/z
(Optional) Collision cell – fragmentation for MS-MS
Up-front separation – Liquid Chromatography (LC) or capillary electrophoresis for complex samples (LC-MS)
Bird’s-Eye Workflow Example
Source → Vacuum interface → Mass filter → Collision cell (on/off) → Mass analyser → Detector (output spectrum)
Integrating Gels & Mass Spectrometry
Spots from 2D-GE / 2D-DIGE excised → in-gel tryptic digestion → LC-MS-MS
Database search (Mascot, SEQUEST, etc.) matches spectra → identifies protein; relative spot intensity gives abundance
Proteomics Pipeline (Summary Workflow)
Sample collection / culture
Disruption & solubilisation (lysis buffers, detergents)
Complexity reduction (fractionation, 2D-GE/DIGE, LC)
Protease digestion (bottom-up) or top-down intact MS
Peptide clean-up (desalting, SPE)
Mass-spectrometric analysis (instrument run)
Data processing (peak picking, identification, quantification)
Biological interpretation (pathways, biomarkers, personalised medicine)
Current & Emerging Applications
Clinical biomarker discovery (blood, urine, tears)
Disease pathogenesis & personalised medicine
Host–pathogen interaction mapping
Drug-target identification & mode-of-action studies
Neuroproteomics; prediction of clinical outcomes
Competing/complementary technologies: deep RNA-Seq, single-molecule proteomics, “next-generation proteomics”
Learning Outcomes (Revisited)
Define proteome and contrast with genome / transcriptome
Explain biomedical importance of proteomics
Describe core experimental approaches: 2D-GE, 2D-DIGE, Mass Spectrometry
Outline fundamental MS principles: ionisation, m/z determination, fragmentation, instrumentation
Key Numerical / Statistical Facts & Equations
Current human protein-coding genes ≈ 21,306
$m/z = (m + z)/z$; 1 proton ≈ 1 Da of mass and +1 charge
Differential expression threshold example: fold-change > 1.5, p < 0.05 (Alasmari 2021)
Ethical / Practical Considerations
Data volume & privacy in personalised proteomics
Sample handling & standardisation critical; PTMs may change post-collection
Cost–benefit of high-end MS instruments vs clinical utility
References & Further Reading (selected from lecture)
Alberts B. “Molecular Biology of the Cell” (E-book)
Cooper A. “Biophysical Chemistry” Chapter 7 – Electrophoresis
Virág D. et al. 2020. Current Trends in PTM Analysis (Chromatographia 83)
HUPO & EBI on-line courses (What is Proteomics?)
Biomedical examples: Hariu 2017 (MALDI-TOF in blood cultures); Della Corte 2008 (2D-DIGE platelet secretome)