Proteomics and Mass Spectrometry – Lecture 1 (TAIBMS module)

Human Genome & Human Proteome

  • Genome = complete genetic material of an organism

    • Humans: nuclear genome (23 pairs of chromosomes) + mitochondrial genome

    • Contains both protein–coding genes and non-coding DNA

    • Landmark projects

      • Human Genome Project (draft sequence published 2001)

      • Human Genome Project–Write (2016)

  • Proteome = entire set of proteins that is or can be expressed by a cell, tissue or organism

    • Dynamic, varies with cell type, developmental stage, external stimuli, disease status

    • Human Protein Atlas resources

      • Organ & tissue proteomes

      • Cell-cycle–dependent proteome

      • Organelle proteome

  • Ongoing debate on gene/protein numbers

    • Tally reported in Nature (2018) ≈ 21,306 protein-coding genes

    • Historical estimates ranged from ≈ 6,700 to ≈ 100,000

Why Study Proteins? — The ’Omics Cascade

  • Flow of biological information

    • DNA → RNA → Protein → Metabolite

  • Each level has its own discipline/technology

    • Genome → Next-Generation Sequencing (NGS)

    • Transcriptome → RNA-Seq

    • Proteome → Mass Spectrometry-based proteomics

    • Metabolome → Metabolomics

  • Proteome adds post-transcriptional & post-translational complexity beyond the genome

    • Isoforms, splice variants, PTMs, turnover, cellular localisation, interactions, temporal regulation

Proteome Complexity

  • One gene ≠ one protein; multiple mechanisms increase proteome size relative to genome size

    • Alternative splicing

    • RNA editing

    • Post-translational modifications (PTMs): phosphorylation, glycosylation, ubiquitination, etc.

  • Illustration (Virág et al., 2020): complexity massively increases from genome → transcriptome → proteome

Information Gleaned from Proteomics

  • Protein isoform identification

  • Differential gene expression at the protein level

  • Spatiotemporal dynamics (where & when proteins appear)

  • PTM mapping & PTM dynamics

  • Protein–protein / protein–nucleic-acid / protein–metabolite interactions

  • Cellular localisation (sub-organellar resolution)

  • Protein stability & turnover rates

  • Insight into regulation beyond transcription (translational & post-translational control)

  • Health vs disease “snapshot” of the protein environment

Proteomics Techniques – Overview

  • Two-Dimensional Gel Electrophoresis (2D-GE)

  • Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)

  • Mass Spectrometry (MS)

Two-Dimensional Gel Electrophoresis (2D-GE)

  • 1st dimension: Isoelectric Focusing (IEF)

    • Proteins migrate in a pH gradient until net charge = 0 (their isoelectric point, pI)

  • 2nd dimension: SDS-PAGE (orthogonal separation)

    • Proteins separated by molecular weight (MW)

  • Visualisation: stain (e.g. Coomassie, SYPRO Ruby)

  • Each spot ≈ individual polypeptide chain (e.g. the classic figure showing the entire E. coli proteome)

  • Advantages

    • Resolves hundreds to thousands of proteins in one gel → suited to discovery and comparative profiling

    • Can estimate relative abundance by spot intensity

  • Disadvantages

    • Spot matching across gels is difficult; small positional shifts hinder quantification

    • Low-abundance, very acidic/basic, very hydrophobic or extremely large/small proteins may be under-represented

    • Extensive computer-based image analysis required

  • Biomedical example

    • Ott et al. (2001) used parallel 2-D-PAGE of paired colorectal tissue → patient-specific tumour profiling

  • Classroom exercise (illustrated)

    • Given MW & pI, locate spots A–G; reinforces concept of bi-dimensional separation
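The classroom exercise can be sketched in code. This is a minimal illustration, assuming an idealised gel with a linear pH 3–10 gradient and SDS-PAGE migration that is linear in log10(MW) between 10 and 200 kDa. The function name, the gel window and the spot values are hypothetical and do not come from the lecture figure.

```python
import math

def gel_position(pi, mw_kda, ph_range=(3.0, 10.0), mw_range_kda=(10.0, 200.0)):
    """Return (x, y) in the unit square for a spot on an idealised 2D gel.

    x: 0 = acidic end, 1 = basic end of a linear pH gradient (IEF).
    y: 0 = top (largest proteins), 1 = bottom (smallest proteins), assuming
       migration distance is linear in log10(MW) (an SDS-PAGE approximation).
    """
    ph_lo, ph_hi = ph_range
    mw_lo, mw_hi = mw_range_kda
    if not (ph_lo <= pi <= ph_hi and mw_lo <= mw_kda <= mw_hi):
        # Mirrors the listed disadvantage: extreme pI or MW falls off the gel.
        raise ValueError("protein falls outside the gel's pI/MW window")
    x = (pi - ph_lo) / (ph_hi - ph_lo)
    log_span = math.log10(mw_hi) - math.log10(mw_lo)
    y = (math.log10(mw_hi) - math.log10(mw_kda)) / log_span
    return x, y

# Hypothetical spots: label -> (pI, MW in kDa)
spots = {"A": (4.5, 66.0), "B": (6.5, 45.0), "C": (9.0, 14.0)}
for label, (pi, mw) in spots.items():
    x, y = gel_position(pi, mw)
    print(f"spot {label}: x={x:.2f}, y={y:.2f}")
```

Acidic proteins land on the left, basic ones on the right, and small proteins run further down the gel, which is the bi-dimensional logic the exercise reinforces.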

Two-Dimensional Difference Gel Electrophoresis (2D-DIGE)

  • Principle: direct fluorescent labelling of proteins before IEF

    • Cy2, Cy3, Cy5 dyes are matched for mass and charge but have distinct excitation/emission spectra

    • Each sample (≥2) labelled with a different dye (Cy3 or Cy5); pooled “internal standard” (Cy2) added to every gel

  • All labelled proteins co-migrate on one gel → eliminates gel-to-gel variability

  • Workflow

    1. Label extracts with CyDye DIGE Fluor minimal dyes

    2. Mix labelled samples + Cy2-labelled normalisation pool

    3. Perform 2-D separation

    4. Scan gel at dye-specific wavelengths

    5. Use software for image analysis, spot matching, statistics

  • Example: Alasmari et al. (2021)

    • Compared serum proteomes of cannabis users vs controls

    • >121 differentially expressed proteins (fold-change > 1.5, p<0.05) → identified by MS

  • Advantages

    • Internal Cy2 standard → rigorous normalisation

    • Multiplexing (≤3 samples per gel) reduces total gels and matching workload

    • Improved quantification accuracy; detects expression changes & PTMs

  • Disadvantages

    • Dye-labelling may reduce detection of very low-abundance proteins

    • In-gel digestion + MS still limits ultra-low abundance IDs

    • Reagents & scanners are relatively expensive
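The role of the Cy2 internal standard can be illustrated with a small calculation. This is a sketch under stated assumptions, not the algorithm of any particular DIGE analysis package: the function names and intensity values are hypothetical, the two samples are assumed to sit on different gels, and the statistical test behind the p < 0.05 criterion is omitted.

```python
def standardised_abundance(sample_intensity, cy2_intensity):
    """Spot intensity in the Cy3 or Cy5 channel relative to the Cy2 pool on the same gel."""
    if cy2_intensity <= 0:
        raise ValueError("Cy2 intensity must be positive")
    return sample_intensity / cy2_intensity

def fold_change(control, treated):
    """Signed fold-change: +x means x-fold up, -x means x-fold down."""
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio

# Hypothetical data: spot -> ((control, Cy2) on gel 1, (treated, Cy2) on gel 2).
# Gel 2 was scanned brighter overall, which the Cy2 ratio corrects for.
spots = {
    "spot_01": ((1000.0, 1000.0), (3000.0, 1500.0)),
    "spot_02": ((800.0, 800.0), (1230.0, 1200.0)),
    "spot_03": ((900.0, 600.0), (450.0, 900.0)),
}

changed = {}
for spot_id, ((ctrl, cy2_gel1), (trt, cy2_gel2)) in spots.items():
    fc = fold_change(
        standardised_abundance(ctrl, cy2_gel1),
        standardised_abundance(trt, cy2_gel2),
    )
    if abs(fc) > 1.5:  # fold-change threshold quoted in the example above
        changed[spot_id] = round(fc, 2)

print(changed)
```

Because every gel carries the same pooled standard, dividing by the Cy2 signal puts spots from different gels on a common scale before they are compared.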

Mass Spectrometry (MS) – Fundamentals

  • Analytical technique that measures mass-to-charge ratio (m/z) of ions in the gas phase

  • 2002 Nobel Prize in Chemistry awarded in part for "soft ionisation" methods (ESI and soft laser desorption, closely related to MALDI) that allowed analysis of biological macromolecules

Ionisation & Charge

  • Organic molecules gain or lose protons ($\text{H}^+$) under controlled pH → become ions

    • Acidic conditions: $\text{M} + \text{H}^+ \rightarrow [\text{M}+\text{H}]^{+}$ (cation)

    • Basic conditions: $\text{M} - \text{H}^+ \rightarrow [\text{M}-\text{H}]^{-}$ (anion)

  • Charged species are manipulated by electromagnetic fields (acceleration, focusing, deflection)

    • In liquid = electrophoresis; in vacuum = mass spectrometry

Calculating m/z

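$$\text{m/z} = \frac{m+z}{z}$$

  • m = neutral mass of the molecule (Da); z = number of protons added; each proton is taken as ≈ 1 Da (more precisely 1.00728 Da) and +1 charge

  • Example for a peptide of m = 1200 Da

    • z=1 → (1200+1)/1 = 1201

    • z=2 → (1200+2)/2 = 601

    • z=3 → (1200+3)/3 = 401

  • Isotopes (e.g. $^{13}\text{C}$, 1.1 % natural abundance) create a cluster of peaks spaced ≈ 1 Da apart in mass, i.e. ≈ 1/z apart on the m/z axis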
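The m/z arithmetic can be checked with a short script. This is a minimal sketch: the function names are hypothetical. Passing `proton_mass=1.0` reproduces the rounded lecture values, while the default uses the more precise proton mass.

```python
PROTON_MASS = 1.007276   # Da, mass of a proton
C13_C12_DIFF = 1.003355  # Da, mass difference between 13C and 12C

def mz(neutral_mass, charge, proton_mass=PROTON_MASS):
    """m/z of a positive ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * proton_mass) / charge

def isotope_spacing(charge):
    """Spacing between neighbouring isotope peaks on the m/z axis."""
    return C13_C12_DIFF / charge

for z in (1, 2, 3):
    print(
        f"z={z}: lecture approximation {mz(1200.0, z, proton_mass=1.0):.0f}, "
        f"precise {mz(1200.0, z):.4f}, isotope spacing {isotope_spacing(z):.3f}"
    )
```

The isotope spacing gives a practical way to read the charge state off a spectrum: peaks about 0.5 apart indicate z = 2, about 0.33 apart indicate z = 3.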

Fragmentation (Tandem MS / MS–MS)

  • Selected parent ion accelerated into inert gas (e.g. N$_2$) → collision-induced dissociation (CID)

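  • CID breaks mainly the amide bonds of the peptide backbone → series of b-ions (N-terminal fragments) and y-ions (C-terminal fragments); a-, c- and z-ions arise from other cleavage sites or fragmentation methods; mass differences between successive ions in a series define the sequence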

  • Example spectrum depicts y7, y8, etc.
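To make the b- and y-ion series concrete, the sketch below computes singly charged fragment m/z values from monoisotopic residue masses. The peptide "PEPTIDE" is an arbitrary illustration, not the spectrum shown in the lecture, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da

def fragment_ions(peptide):
    """Return (b, y) dicts of singly charged fragment m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus one proton
    b = {f"b{i}": sum(masses[:i]) + PROTON for i in range(1, n)}
    # y_i: last i residues plus water (free termini) plus one proton
    y = {f"y{i}": sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y

b_ions, y_ions = fragment_ions("PEPTIDE")
for name, value in list(b_ions.items()) + list(y_ions.items()):
    print(f"{name}: {value:.4f}")
```

Adjacent ions in one series differ by exactly one residue mass, which is how a sequence is read from the spectrum. Leucine and isoleucine have identical masses and cannot be told apart this way.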

Peptide-Centric Strategy

  • Proteins are enzymatically digested (commonly with trypsin, which cleaves C-terminal to Lys (K) & Arg (R))

    • Typically yields 7–20 residue peptides: soluble, singly/multiply charged, informative fragmentation

  • Peptide identification

    • Compare observed peptide m/z list against database (in silico digested proteome) ⇒ “peptide mass fingerprinting” / “peptide mapping”

    • Compare MS-MS fragment patterns against predicted fragments of database peptides ⇒ sequence-level identification
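An in silico tryptic digest, the database-side half of the comparison above, can be sketched as follows. The function name and the test sequence are hypothetical. The optional "no cleavage before proline" rule is a widely used convention that the lecture does not state, so it is exposed as a switch rather than assumed.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0, proline_rule=True):
    """Split a protein sequence C-terminal to K or R.

    proline_rule: if True, do not cleave when the next residue is P
                  (a common convention, not stated in the lecture).
    missed_cleavages: also return peptides spanning up to this many
                      uncut sites.
    """
    pattern = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    # Zero-width split (Python 3.7+); drop the empty trailing piece
    pieces = [p for p in re.split(pattern, sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

print(tryptic_digest("AAKRPGGRCCK"))
print(tryptic_digest("AAKRPGGRCCK", missed_cleavages=1))
print(tryptic_digest("AAKRPGGRCCK", proline_rule=False))
```

Search engines generally allow one or two missed cleavages because digestion is rarely complete.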

Signal Intensity & Quantification

  • Peak area or height ≈ ion count → correlates with peptide abundance (may be non-linear; matrix dependent)

  • Ionisation efficiency varies between molecules (matrix effects)
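Peak area as a proxy for ion count can be illustrated with a trapezoid-rule integration over sampled points. This is a bare sketch with hypothetical names and data. Real quantification software also handles baseline subtraction, smoothing and peak boundaries, all omitted here.

```python
def peak_area(x_values, intensities):
    """Trapezoid-rule area under a sampled peak (x may be m/z or retention time)."""
    if len(x_values) != len(intensities) or len(x_values) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    area = 0.0
    for i in range(len(x_values) - 1):
        width = x_values[i + 1] - x_values[i]
        area += 0.5 * (intensities[i] + intensities[i + 1]) * width
    return area

# Hypothetical peaks for the same peptide in two samples
time_s = [10.0, 11.0, 12.0, 13.0, 14.0]
sample_a = [0.0, 400.0, 1000.0, 400.0, 0.0]
sample_b = [0.0, 200.0, 500.0, 200.0, 0.0]

ratio = peak_area(time_s, sample_a) / peak_area(time_s, sample_b)
print(f"relative abundance A/B: {ratio:.2f}")
```

Comparing the same peptide across samples sidesteps much of the ionisation-efficiency problem, since the molecule-specific response is roughly constant. Comparing different peptides by raw area does not.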

Core Components of an MS System

  1. Ion source – generates ions

    • Electrospray Ionisation (ESI): ions from solution; compatible with LC; handles complex mixtures

    • Matrix-Assisted Laser Desorption/Ionisation (MALDI): laser pulses desorb and ionise analyte co-crystallised with a matrix; suits simple mixtures, imaging

  2. Mass analyser – separates ions by m/z

    • Quadrupole, Ion Trap, Time-of-Flight (TOF), Orbitrap, Fourier-Transform Ion Cyclotron Resonance (FT-ICR)

  3. Detector – counts ions at each m/z

  4. (Optional) Collision cell – fragmentation for MS-MS

  5. Up-front separation – Liquid Chromatography (LC) or capillary electrophoresis for complex samples (LC-MS)

Bird’s-Eye Workflow Example

  • Source → Vacuum interface → Mass filter → Collision cell (on/off) → Mass analyser → Detector (output spectrum)

Integrating Gels & Mass Spectrometry

  • Spots from 2D-GE / 2D-DIGE excised → in-gel tryptic digestion → LC-MS-MS

  • Database search (Mascot, SEQUEST, etc.) matches spectra → identifies protein; relative spot intensity gives abundance
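The idea of matching observed masses against a database can be sketched with a simple tolerance-based count. This is a toy scoring scheme for illustration only. It is not how Mascot or SEQUEST score matches (they use probabilistic and cross-correlation scoring respectively), and the protein names and masses are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def count_matches(observed_masses, theoretical_masses, tol_ppm=10.0):
    """Number of observed masses within tol_ppm of any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tol_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )

def rank_proteins(observed_masses, database, tol_ppm=10.0):
    """Rank database proteins by how many observed peptide masses they explain."""
    scores = {
        name: count_matches(observed_masses, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical in silico digest: protein -> theoretical peptide masses (Da)
database = {
    "protein_X": [800.0, 1200.5, 1500.75],
    "protein_Y": [800.0, 950.25, 2000.0],
}
observed = [800.004, 1200.506, 1500.751]

print(rank_proteins(observed, database))
```

The protein whose predicted peptides explain the most observed masses ranks first. Real search engines add statistics to estimate how likely such a match is by chance.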

Proteomics Pipeline (Summary Workflow)

  1. Sample collection / culture

  2. Disruption & solubilisation (lysis buffers, detergents)

  3. Complexity reduction (fractionation, 2D-GE/DIGE, LC)

  4. Protease digestion (bottom-up) or top-down intact MS

  5. Peptide clean-up (desalting, SPE)

  6. Mass-spectrometric analysis (instrument run)

  7. Data processing (peak picking, identification, quantification)

  8. Biological interpretation (pathways, biomarkers, personalised medicine)

Current & Emerging Applications

  • Clinical biomarker discovery (blood, urine, tears)

  • Disease pathogenesis & personalised medicine

  • Host–pathogen interaction mapping

  • Drug-target identification & mode-of-action studies

  • Neuroproteomics; prediction of clinical outcomes

  • Competing/complementary technologies: deep RNA-Seq, single-molecule proteomics, “next-generation proteomics”

Learning Outcomes (Revisited)

  • Define proteome and contrast with genome / transcriptome

  • Explain biomedical importance of proteomics

  • Describe core experimental approaches: 2D-GE, 2D-DIGE, Mass Spectrometry

  • Outline fundamental MS principles: ionisation, m/z determination, fragmentation, instrumentation

Key Numerical / Statistical Facts & Equations

  • Human protein-coding genes (2018 tally) ≈ 21,306

  • m/z = (m+z)/z; 1 proton ≈ 1 Da of mass and +1 charge

  • Differential expression threshold example: fold-change > 1.5, p < 0.05 (Alasmari 2021)

Ethical / Practical Considerations

  • Data volume & privacy in personalised proteomics

  • Sample handling & standardisation critical; PTMs may change post-collection

  • Cost–benefit of high-end MS instruments vs clinical utility

References & Further Reading (selected from lecture)

  • Alberts B. “Molecular Biology of the Cell” (E-book)

  • Cooper A. “Biophysical Chemistry” Chapter 7 – Electrophoresis

  • Virág D. et al. 2020. Current Trends in PTM Analysis (Chromatographia 83)

  • HUPO & EBI on-line courses (What is Proteomics?)

  • Biomedical examples: Hariu 2017 (MALDI-TOF in blood cultures); Della Corte 2008 (2D-DIGE platelet secretome)