Proteomics and Mass Spectrometry – Lecture 1 (TAIBMS module)

Human Genome & Human Proteome

Genome = complete genetic material of an organism
- Humans: nuclear genome (23 pairs of chromosomes) + mitochondrial genome
- Contains both protein–coding genes and non-coding DNA
- Landmark projects
- Human Genome Project (draft sequence 2001, completed 2003)
- Human Genome Project–Write (2016)
Proteome = entire set of proteins that is or can be expressed by a cell, tissue or organism
- Dynamic, varies with cell type, developmental stage, external stimuli, disease status
- Human Protein Atlas resources
- Organ & tissue proteomes
- Cell-cycle–dependent proteome
- Organelle proteome
Ongoing debate on gene/protein numbers
- Latest (Nature 2018) tally ≈ 21,306 protein-coding genes
- Historical estimates ranged from ≈ 6,700 to ≈ 100,000

Why Study Proteins? — The ’Omics Cascade

Flow of biological information
- DNA → RNA → Protein → Metabolite
Each level has its own discipline/technology
- Genome → Next-Generation Sequencing (NGS)
- Transcriptome → RNA-Seq
- Proteome → Mass Spectrometry-based proteomics
- Metabolome → Metabolomics
Proteome adds post-transcriptional & post-translational complexity beyond the genome
- Isoforms, splice variants, PTMs, turnover, cellular localisation, interactions, temporal regulation

Proteome Complexity

One gene ≠ one protein; multiple mechanisms increase proteome size relative to genome size
- Alternative splicing
- RNA editing
- Post-translational modifications (PTMs): phosphorylation, glycosylation, ubiquitination, etc.
Illustration (Virág et al., 2020): complexity massively increases from genome → transcriptome → proteome
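
A rough way to see why the proteome outgrows the genome is to count combinations. The sketch below is only an upper-bound illustration: it assumes each PTM site is independently either modified or unmodified, and the gene used in the example (3 splice isoforms, 5 sites) is hypothetical, not taken from the lecture.

```python
def max_proteoforms(splice_isoforms: int, ptm_sites: int) -> int:
    """Upper bound on distinct protein forms from a single gene.

    Assumption: every PTM site is an independent on/off switch, so each
    splice isoform can exist in 2 ** ptm_sites modification states.
    """
    if splice_isoforms < 1 or ptm_sites < 0:
        raise ValueError("need at least one isoform and a non-negative site count")
    return splice_isoforms * 2 ** ptm_sites


# Hypothetical gene: 3 splice isoforms, 5 modifiable residues
print(max_proteoforms(3, 5))  # 96
```

Real numbers are lower because many combinations never occur, but the exponential term shows why PTMs dominate the complexity.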

Information Gleaned from Proteomics

Protein isoform identification
Differential gene expression at the protein level
Spatiotemporal dynamics (where & when proteins appear)
PTM mapping & PTM dynamics
Protein–protein / protein–nucleic-acid / protein–metabolite interactions
Cellular localisation (sub-organellar resolution)
Protein stability & turnover rates
Insight into regulation beyond transcription (translational & post-translational control)
Health vs disease “snapshot” of the protein environment

Proteomics Techniques – Overview

Two-Dimensional Gel Electrophoresis (2D-GE)
Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)
Mass Spectrometry (MS)

Two-Dimensional Gel Electrophoresis (2D-GE)

1st dimension: Isoelectric Focusing (IEF)
- Proteins migrate in a pH gradient until net charge = 0 (their isoelectric point, pI)
2nd dimension: SDS-PAGE (orthogonal separation)
- Proteins separated by molecular weight (MW)
Visualisation: stain (e.g. Coomassie, SYPRO Ruby)
Each spot ≈ individual polypeptide chain (e.g. entire E. coli proteome shown in classic figure)
Advantages
- Resolves hundreds → thousands of proteins in one gel ➔ discovery/comparative profiling
- Can estimate relative abundance by spot intensity
Disadvantages
- Spot matching across gels is difficult; small positional shifts hinder quantification
- Low-abundance, very acidic/basic, very hydrophobic or extremely large/small proteins may be under-represented
- Extensive computer-based image analysis required
Biomedical example
- Ott et al. (2001) used parallel 2-D-PAGE of paired colorectal tissue → patient-specific tumour profiling
Classroom exercise (illustrated)
- Given MW & pI, locate spots A–G; reinforces concept of bi-dimensional separation
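
The spot-locating exercise can be sketched in code. This is a minimal illustration, assuming a linear pH gradient in the first dimension and a migration distance that is linear in log(MW) in the second; the gel ranges and the example spots are hypothetical values, not those from the lecture figure.

```python
import math


def gel_position(iso_point, mw_kda, ph_range=(3.0, 10.0), mw_range_kda=(10.0, 200.0)):
    """Return (x, y) in [0, 1] for a protein on an idealised 2D gel.

    x = 0 is the acidic end of the IEF strip; y = 0 is the top of the
    SDS-PAGE gel, where the largest proteins stay.
    """
    ph_lo, ph_hi = ph_range
    mw_lo, mw_hi = mw_range_kda
    if not (ph_lo <= iso_point <= ph_hi and mw_lo <= mw_kda <= mw_hi):
        raise ValueError("protein falls outside the gel's separation window")
    x = (iso_point - ph_lo) / (ph_hi - ph_lo)
    # Assumed log-linear relation between MW and migration distance
    log_hi, log_lo = math.log10(mw_hi), math.log10(mw_lo)
    y = (log_hi - math.log10(mw_kda)) / (log_hi - log_lo)
    return x, y


# Hypothetical spots: name -> (pI, MW in kDa)
spots = {"A": (4.5, 150.0), "B": (6.5, 66.0), "C": (9.0, 14.0)}
for name, (iso_point, mw) in spots.items():
    x, y = gel_position(iso_point, mw)
    print(f"{name}: x={x:.2f}, y={y:.2f}")
```

Acidic, large proteins land top-left; basic, small proteins land bottom-right, which is the pattern the exercise asks you to recognise.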

Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)

Principle: direct fluorescent labelling of proteins before IEF
- Cy2, Cy3, Cy5 dyes are mass/charge-matched but have distinct excitation/emission spectra
- Each sample (≥2) labelled with different dye; pooled “internal standard” (Cy2) added to every gel
All labelled proteins co-migrate on one gel → eliminates gel-to-gel variability
Workflow
1. Label extracts with CyDye DIGE Fluor minimal dyes
2. Mix labelled samples + Cy2-labelled normalisation pool
3. Perform 2-D separation
4. Scan gel at dye-specific wavelengths
5. Use software for image analysis, spot matching, statistics
Example: Alasmari et al. (2021)
- Compared serum proteomes of cannabis users vs controls
- >121 differentially expressed proteins (fold-change > 1.5, p<0.05) → identified by MS
Advantages
- Internal Cy2 standard → rigorous normalisation
- Multiplexing (≤3 samples per gel) reduces total gels and matching workload
- Improved quantification accuracy; detects expression changes & PTMs
Disadvantages
- Dye-labelling may reduce detection of very low-abundance proteins
- In-gel digestion + MS still limits ultra-low abundance IDs
- Reagents & scanners are relatively expensive
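
The role of the Cy2 internal standard can be illustrated with a short sketch. It assumes each spot has already been quantified as three intensities (Cy3, Cy5, Cy2) by the imaging software; the function names and the intensity values are hypothetical. Only the fold-change part of the threshold is shown, since a p-value needs replicate gels.

```python
import math


def standardised_abundance(sample_intensity, cy2_intensity):
    """Spot intensity relative to the pooled Cy2 standard on the same gel."""
    if sample_intensity <= 0 or cy2_intensity <= 0:
        raise ValueError("intensities must be positive")
    return sample_intensity / cy2_intensity


def is_changed(control, treated, threshold=1.5):
    """True if the spot changes by more than `threshold`-fold, up or down."""
    return abs(math.log2(treated / control)) > math.log2(threshold)


# Hypothetical spot: Cy3 = control, Cy5 = treated, Cy2 = pooled standard
control = standardised_abundance(1200.0, 1000.0)
treated = standardised_abundance(2400.0, 1000.0)
print(is_changed(control, treated))  # True (a two-fold increase)
```

Because every gel carries the same Cy2 pool, standardised abundances from different gels are directly comparable, which is the point of the design.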

Mass Spectrometry (MS) – Fundamentals

Analytical technique that measures mass-to-charge ratio (m/z) of ions in the gas phase
2002 Nobel Prize in Chemistry awarded for "soft ionisation" (MALDI & ESI) that allowed analysis of biomacromolecules

Ionisation & Charge

Organic molecules gain or lose protons ($\text{H}^+$) under controlled pH → become ions
- Acidic conditions: $\text{M} + \text{H}^+ \rightarrow [\text{M+H}]^{+}$ (cation)
- Basic conditions: $\text{M} - \text{H}^+ \rightarrow [\text{M-H}]^{-}$ (anion)
Charged species are manipulated by electromagnetic fields (acceleration, focusing, deflection)
- In liquid = electrophoresis; in vacuum = mass spectrometry

Calculating m/z

$m/z = \frac{m + z}{z}$ (m = neutral mass in Da, z = number of added protons; each proton taken as ≈ 1 Da)

Example for peptide of $m = 1200\,\text{Da}$
- z=1 → (1200+1)/1 = 1201
- z=2 → (1200+2)/2 = 601
- z=3 → (1200+3)/3 = 401
Multiple isotope peaks (e.g. $^{13}\text{C}$, 1.1 % natural abundance) create a pattern spaced ≈ 1 Da in mass, i.e. ≈ 1/z on the m/z axis (so the peak spacing reveals the charge state)
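
The worked example can be reproduced with a small helper. This is a sketch for positive-mode protonated ions only; the constant names are illustrative, and the default proton mass (1.007276 Da) is the more exact value that the notes round to 1.

```python
PROTON_MASS = 1.007276   # Da; rounded to 1 in the notes
C13_C12_DELTA = 1.003355  # Da; mass difference between 13C and 12C


def mz(neutral_mass, charge, proton_mass=PROTON_MASS):
    """m/z of an [M + zH]^z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * proton_mass) / charge


def isotope_spacing(charge):
    """Approximate gap between neighbouring isotope peaks on the m/z axis."""
    return C13_C12_DELTA / charge


for z in (1, 2, 3):
    rounded = mz(1200.0, z, proton_mass=1.0)  # reproduces 1201, 601, 401
    exact = mz(1200.0, z)
    print(f"z={z}: rounded={rounded:.1f}, exact={exact:.4f}, "
          f"isotope gap={isotope_spacing(z):.3f}")
```

The rounded and exact values differ only in the third decimal place here, but that difference matters on high-resolution instruments.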

Fragmentation (Tandem MS / MS–MS)

Selected parent ion accelerated into inert gas (e.g. N$_2$) → collision-induced dissociation (CID)
Breaks along peptide backbone → mainly b- and y-ions under CID (a-ions are minor; c- and z-ions are typical of electron-based fragmentation); pattern defines sequence
Example spectrum depicts y7, y8, etc.
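
How a b/y ladder encodes sequence can be sketched by computing the expected fragment masses. The code below assumes singly charged fragments, unmodified residues and standard monoisotopic residue masses; the function name and the test peptide "PEPTIDE" are illustrative choices, not from the lecture.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return ({i: m/z of b_i}, {i: m/z of y_i}) for singly charged ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions, y_ions = {}, {}
    for i in range(1, n):
        # b_i: first i residues (N-terminal side) plus one proton
        b_ions[i] = sum(masses[:i]) + PROTON
        # y_i: last i residues (C-terminal side) plus water plus one proton
        y_ions[i] = sum(masses[n - i:]) + WATER + PROTON
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
for i in sorted(b):
    print(f"b{i} = {b[i]:.4f}    y{i} = {y[i]:.4f}")
```

Neighbouring peaks in either series differ by exactly one residue mass, so reading the gaps along the ladder spells out the sequence (Leu and Ile cannot be told apart this way because their masses are identical).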

Peptide-Centric Strategy

Proteins are enzymatically digested (commonly with trypsin, which cleaves C-terminal to Lys (K) & Arg (R))
- Yields 7–20 residue peptides: soluble, singly/multiply charged, informative fragmentation
Peptide identification
- Compare the observed peptide m/z list against a database (in silico digested proteome) ⇒ “peptide mass fingerprinting” / “peptide mapping”; compare MS-MS fragment patterns against predicted fragment spectra ⇒ sequence-level identification
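
The "in silico digested proteome" can be sketched as a simple cleavage rule. This is a minimal version that assumes complete digestion (no missed cleavages) and applies the commonly used exception that trypsin does not cut before proline; the function name and the example sequence are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


print(tryptic_digest("AAKGGRPLKTT"))  # ['AAK', 'GGRPLK', 'TT']
```

Each predicted peptide would then be converted to a mass and compared with the observed peak list; real search engines also allow for missed cleavages and modifications.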

Signal Intensity & Quantification

Peak area or height ≈ ion count → correlates with peptide abundance (may be non-linear; matrix dependent)
Ionisation efficiency varies between molecules (matrix effects)
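
Peak area is usually obtained by numerical integration of the sampled signal. The sketch below uses the trapezoidal rule on a hypothetical peak; it assumes baseline-subtracted intensities and says nothing about how a given vendor's software integrates.

```python
def peak_area(positions, intensities):
    """Trapezoidal area under a sampled peak.

    `positions` may be m/z values or retention times; both lists must be
    the same length and ordered along the axis.
    """
    if len(positions) != len(intensities) or len(positions) < 2:
        raise ValueError("need two equally long lists with at least two points")
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


# Hypothetical triangular peak: base of 2 units, apex intensity 10
print(peak_area([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]))  # 10.0
```

Area is generally more robust than height when peak shape varies, but neither corrects for differences in ionisation efficiency between peptides.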

Core Components of an MS System

Ion source – generates ions
- Electrospray Ionisation (ESI): ions from solution; compatible with LC; handles complex mixtures
- Matrix-Assisted Laser Desorption/Ionisation (MALDI): laser pulses desorb and ionise analyte co-crystallised with a matrix; suits simple mixtures, imaging
Mass analyser – separates ions by m/z
- Quadrupole, Ion Trap, Time-of-Flight (TOF), Orbitrap, Fourier-Transform Ion Cyclotron Resonance (FT-ICR)
Detector – counts ions at each m/z
(Optional) Collision cell – fragmentation for MS-MS
Up-front separation – Liquid Chromatography (LC) or capillary electrophoresis for complex samples (LC-MS)

Bird’s-Eye Workflow Example

Source → Vacuum interface → Mass filter → Collision cell (on/off) → Mass analyser → Detector (output spectrum)

Integrating Gels & Mass Spectrometry

Spots from 2D-GE / 2D-DIGE excised → in-gel tryptic digestion → LC-MS-MS
Database search (Mascot, SEQUEST, etc.) matches spectra → identifies protein; relative spot intensity gives abundance
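
The idea behind matching spectra to a database can be illustrated with a toy mass-matching score. This is a deliberately simplified sketch: it counts observed peptide masses that fall within a ppm tolerance of each protein's theoretical peptides, whereas Mascot and SEQUEST use more sophisticated scoring. The protein names, masses and tolerance are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_masses, theoretical_masses, tol_ppm=10.0):
    """Number of observed masses explained by at least one theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tol_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )


def best_protein(observed_masses, database, tol_ppm=10.0):
    """Return (name of top-scoring protein, all scores) for a toy database."""
    scores = {
        name: count_matches(observed_masses, masses, tol_ppm)
        for name, masses in database.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical database: protein -> theoretical tryptic peptide masses (Da)
database = {
    "protA": [1000.5, 1500.75, 2000.0],
    "protB": [900.4, 1500.75],
}
observed = [1000.505, 2000.01, 1500.75]
print(best_protein(observed, database))
```

A simple count like this cannot say whether a match is significant; production search engines add statistical scoring and false-discovery-rate control for that reason.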

Proteomics Pipeline (Summary Workflow)

Sample collection / culture
Disruption & solubilisation (lysis buffers, detergents)
Complexity reduction (fractionation, 2D-GE/DIGE, LC)
Protease digestion (bottom-up) or top-down intact MS
Peptide clean-up (desalting, SPE)
Mass-spectrometric analysis (instrument run)
Data processing (peak picking, identification, quantification)
Biological interpretation (pathways, biomarkers, personalised medicine)

Current & Emerging Applications

Clinical biomarker discovery (blood, urine, tears)
Disease pathogenesis & personalised medicine
Host–pathogen interaction mapping
Drug-target identification & mode-of-action studies
Neuroproteomics; prediction of clinical outcomes
Competing/complementary technologies: deep RNA-Seq, single-molecule proteomics, “next-generation proteomics”

Learning Outcomes (Revisited)

Define proteome and contrast with genome / transcriptome
Explain biomedical importance of proteomics
Describe core experimental approaches: 2D-GE, 2D-DIGE, Mass Spectrometry
Outline fundamental MS principles: ionisation, m/z determination, fragmentation, instrumentation

Key Numerical / Statistical Facts & Equations

Current human protein-coding genes ≈ 21,306
$m/z = (m+z)/z$; 1 proton ≈ 1 Da of mass and carries +1 charge
Differential expression threshold example: fold-change > 1.5, p < 0.05 (Alasmari 2021)

Ethical / Practical Considerations

Data volume & privacy in personalised proteomics
Sample handling & standardisation critical; PTMs may change post-collection
Cost–benefit of high-end MS instruments vs clinical utility

References & Further Reading (selected from lecture)

Alberts B. “Molecular Biology of the Cell” (E-book)
Cooper A. “Biophysical Chemistry” Chapter 7 – Electrophoresis
Virág D. et al. 2020. Current Trends in PTM Analysis (Chromatographia 83)
HUPO & EBI on-line courses (What is Proteomics?)
Biomedical examples: Hariu 2017 (MALDI-TOF in blood cultures); Della Corte 2008 (2D-DIGE platelet secretome)