Proteomics Lecture Notes
Introduction to Proteomics
Proteomics refers to the comprehensive study of proteomes, the entire set of proteins expressed in a biological organism under specific conditions.
This comprehensive study includes not only identifying all proteins but also understanding their quantities, structures, post-translational modifications, interactions, and localization. It aims to provide a snapshot of protein activity at a given time, reflecting cellular processes and responses to stimuli.
Key Concepts in Molecular Biology
The Central Dogma
DNA → RNA → Protein
This describes the fundamental flow of genetic information in biological systems, from the stable storage in DNA to its functional expression as proteins. Processes involved in this flow include:
Transcription: The enzymatic process of copying a segment of DNA (a gene) into an RNA molecule; for protein-coding genes, this is messenger RNA (mRNA). This process is catalyzed by RNA polymerase, which unwinds a portion of the DNA double helix and synthesizes a complementary RNA strand using one of the DNA strands as a template.
Translation: The intricate process where ribosomes in the cytoplasm synthesize proteins by decoding the mRNA sequence. Each three-nucleotide codon in the mRNA corresponds to a specific amino acid, which is brought to the ribosome by transfer RNA (tRNA) molecules. The ribosome then links these amino acids together to form a polypeptide chain.
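The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a biological model: it assumes the standard genetic code, a message with no introns, and translation that starts at the first AUG. The function names `transcribe` and `translate` are hypothetical.

```python
# Standard genetic code, listed in the conventional U, C, A, G order:
# codon index = 16 * first base + 4 * second base + third base.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5'->3') made from a DNA template strand written 5'->3'."""
    # RNA polymerase copies the template antiparallel, so the mRNA equals the
    # coding strand (reverse complement of the template) with U in place of T.
    coding_strand = "".join(DNA_COMPLEMENT[base] for base in reversed(template_strand))
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Decode codons from the first AUG until a stop codon ('*') is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is released
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("TCACTTGGCCAT")
print(mrna)             # AUGGCCAAGUGA
print(translate(mrna))  # MAK
```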
Measurement Techniques in Molecular Biology
Analysis of Gene Expression
Methods to measure/detect mRNA levels:
Northern Blot: A traditional molecular biology technique used to study gene expression by detecting the presence and quantifying the amount of specific RNA sequences (usually mRNA) in a sample. RNA is separated by gel electrophoresis, transferred to a membrane, and then probed with a labeled complementary DNA or RNA sequence.
RT-qPCR: Reverse Transcription quantitative Polymerase Chain Reaction. This highly sensitive method first reverse transcribes mRNA into complementary DNA (cDNA) and then quantifies specific mRNA levels by amplifying the cDNA using PCR, measuring the accumulation of fluorescent signal in real-time. It's often used for precise quantification of gene expression changes.
In Situ Hybridization (FISH, RNA-ISH): A technique that localizes specific nucleic acid targets (DNA or RNA) within fixed tissues, cells, or on chromosomes. Labeled probes that are complementary to the target sequence bind to it, allowing its visualization under a microscope, providing spatial and temporal expression information.
Microarray: A high-throughput technique that allows researchers to simultaneously measure the expression levels of thousands of genes (or even entire transcriptomes) in a single experiment. mRNA from a sample is reverse transcribed, labeled, and hybridized to a chip containing thousands of immobilized DNA probes, with signal intensity reflecting gene expression.
RNA Sequencing (RNA-Seq): A powerful next-generation sequencing technique that provides insight into the complete RNA population (the transcriptome) of a sample. It involves converting RNA to cDNA, sequencing the cDNA fragments, and then mapping these reads back to a reference genome to quantify gene expression, identify novel transcripts, and detect RNA variants.
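For RT-qPCR, one widely used way to turn threshold-cycle (Ct) values into a relative expression level is the 2^-ΔΔCt (Livak) method. It is not described in these notes, so treat the sketch below as an illustration under its key assumption: amplification efficiency near 100%, so the product doubles every cycle. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target mRNA (treated vs. control) by the 2^-ΔΔCt method.

    Assumes ~100% PCR efficiency for both target and reference genes.
    """
    # Normalize the target to a reference (housekeeping) gene in each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Each cycle earlier means twice as much starting template.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: after normalization, the target crosses the
# threshold 3 cycles earlier in the treated sample.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Normalizing to a reference gene first corrects for differences in how much RNA went into each reaction.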
Promoter Activity Measurement
Measures transcriptional activity driven by a specific promoter or regulatory element:
Reporter Assays: Molecular biology techniques used to assess the activity of a gene promoter, enhancer, or other regulatory sequence. A reporter gene whose product is easily detectable (e.g., luciferase, GFP) is placed under the control of the regulatory sequence of interest. The level of reporter expression then indicates the transcriptional activity of that sequence.
Protein Measurement
Techniques to measure/detect protein levels:
Western Blot: A widely used analytical technique in molecular biology and immunogenetics to detect specific proteins in a complex sample (e.g., cell lysate or tissue homogenate). Proteins are separated by size using SDS-PAGE, transferred to a membrane, and then visualized using specific antibodies that bind to the target protein.
Immunofluorescence (IF): A microscopy technique that uses fluorescently labeled antibodies to detect and visualize specific proteins or antigens in fixed cells or tissue sections. It provides information about protein localization, relative abundance, and, through co-localization, possible interactions within the cellular context.
Proteomic Methods: A broad category of advanced techniques, primarily relying on Mass Spectrometry, for analyzing the entire set of proteins (the proteome) in a sample. These methods allow for high-throughput identification, quantification, and characterization of proteins, including their post-translational modifications and interactions.
Smaller scale techniques may focus on specific proteins (e.g., Western Blot for one protein) while larger scale approaches (e.g., Mass Spectrometry-based proteomics) consider the entire proteome to gain a holistic view of protein dynamics.
The -omes Defined
Overview of Biological Macromolecules
Genome: The complete set of DNA sequences contained within a cell or organism, often referring primarily to the chromosomal DNA, but also including mitochondrial or chloroplast DNA. It is the organism's hereditary information.
Transcriptome: The complete set of RNA molecules, including mRNA, rRNA, tRNA, and non-coding RNAs, present in a cell, organism, or specific tissue type under particular conditions. In discussions of gene expression, the term most often refers to the messenger RNAs (mRNAs) transcribed from protein-coding genes.
Proteome: The entire set of proteins expressed by a cell, tissue, organism, or biological system at a given time or under specific conditions. Unlike the genome, the proteome is highly dynamic and can vary greatly depending on cellular state, developmental stage, and environmental factors.
Dynamics of Biomolecules
In biological systems (organelles, cells, tissues):
Genome: Generally considered a static representation of an organism's genetic blueprint, remaining largely constant across most cells within an individual, though somatic mutations can occur.
Transcriptome and Proteome: These are highly dynamic entities, characterized by rapid and significant variation in response to environmental stimuli, physiological changes, developmental cues, and disease states. Their dynamic nature reflects the cell's active adaptation.
Individual cell types, even within the same organism, can exhibit vastly different transcriptomes and proteomes due to differential gene expression and protein modification, which can further change dramatically over time to perform specialized functions or respond to signals.
Understanding the Proteome
Complexity of the Proteome
The proteome is inherently more complex than the genome or transcriptome due to various factors that multiply the potential forms and functions of proteins from a relatively fixed set of genes:
Post-translational modifications (PTMs): After translation, proteins can undergo numerous covalent modifications (e.g., phosphorylation, glycosylation, acetylation) on specific amino acid residues. These PTMs significantly alter protein structure, activity, stability, localization, and interaction partners, leading to a much larger functional proteome than predicted solely from gene number.
Rate of protein degradation: Proteins are constantly being synthesized and degraded. The half-life of proteins varies widely, influencing their steady-state levels. Differential degradation rates contribute to the dynamic composition of the proteome and its responsiveness.
Protein localization within the cell: The specific cellular compartment where a protein resides (e.g., nucleus, mitochondria, plasma membrane) dictates its function and interaction partners. A single protein can exist in different locations with distinct roles, further increasing proteomic complexity.
Alternative splicing: A single gene can often give rise to multiple mRNA isoforms through alternative splicing, each of which can be translated into a distinct protein isoform with potentially different functions.
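The multiplying effect of these factors can be made concrete with a toy count. The sketch below is a deliberate simplification: it assumes every isoform carries the same modification sites and that each site switches independently between a fixed number of states (2 = unmodified or phosphorylated). The function name and numbers are hypothetical.

```python
def count_proteoforms(num_isoforms: int, num_modifiable_sites: int,
                      states_per_site: int = 2) -> int:
    """Upper bound on distinct protein forms arising from a single gene.

    Hypothetical simplification: sites are independent and shared by all isoforms.
    """
    return num_isoforms * states_per_site ** num_modifiable_sites


# One gene, 3 splice isoforms, 4 phosphorylation sites:
print(count_proteoforms(3, 4))  # 48
```

Adding one more two-state site doubles the count, which is why PTMs expand the proteome far beyond the number of genes.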
Characteristics of Proteins in Cells
Typical mammalian cells contain approximately 10 billion (10^10) protein molecules in total.
Many proteins are present at relatively low levels (as few as 10^1–10^2 copies/cell), consistent with specialized regulatory roles, whereas highly abundant housekeeping proteins such as actin and tubulin reach 10^4–10^6 copies/cell, reflecting their ubiquitous and critical functions. This wide dynamic range in protein abundance poses significant challenges for comprehensive proteomic analysis, because signals from abundant proteins can mask those from rare ones.
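Taking the copy numbers quoted above as rough lower and upper bounds, the dynamic range works out as:

```latex
\text{dynamic range} = \log_{10}\!\left(\frac{10^{6}\ \text{copies/cell}}{10^{1}\ \text{copies/cell}}\right) = 5\ \text{orders of magnitude}
```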
Protein Localization and Function
Proteins synthesized in the cytosol may subsequently be targeted to, or assume various roles in, different cellular compartments to fulfill diverse cellular functions:
Nucleus: Proteins here are crucial for synthesizing and processing DNA (replication, repair) and RNA (transcription, splicing, rRNA synthesis), involving enzymes like DNA polymerase, RNA polymerase, and histone proteins.
Mitochondria: These organelles house enzymes related to ATP production through cellular respiration, including components of the electron transport chain (e.g., cytochrome c oxidase) and enzymes of the Krebs cycle.
Peroxisomes: Contain enzymes responsible for degrading fatty acids and other organic compounds, producing hydrogen peroxide as a byproduct, and then breaking it down (e.g., catalase).
Endoplasmic Reticulum (ER): A network involved in the synthesis, folding, modification, and transport of proteins destined for secretion, insertion into membranes, or delivery to other organelles (e.g., chaperones like BiP, protein disulfide isomerase).
Plasma Membrane: Contains a diverse array of transmembrane proteins, including receptors for cell signaling, transporters for nutrient uptake and waste removal, and ion channels and pumps crucial for nerve impulses (e.g., GPCRs, Na+/K+-ATPase).
Extracellular Matrix (ECM): Composed of secreted proteins and carbohydrates that provide structural support and biochemical cues to surrounding cells. It contains structural proteins such as collagen and elastin, proteases that remodel the ECM, and signaling molecules such as hormones.
Post-Translational Modifications (PTMs)
PTMs are covalent modifications, most of them enzyme-catalyzed, of one or more amino acid residues within proteins. They significantly affect protein structure, stability, activity, localization, and interaction partners, and are a major source of functional diversity in the proteome.
Phosphorylation:
Involves the reversible addition of a phosphate group ($\mathrm{PO_4^{3-}}$) from ATP to the hydroxyl group of specific serine (Ser), threonine (Thr), or tyrosine (Tyr) residues. Protein kinases catalyze the addition, and protein phosphatases remove the phosphate.
This is often a critical regulatory mechanism in cellular processes, playing key roles in cell cycle progression, cell growth, apoptosis, metabolic pathways, and especially in signal transduction cascades where it acts as a molecular switch.
Glycosylation:
The enzymatic attachment of carbohydrate side chains (glycans) to specific amino acid residues. This can be N-linked (to the amide nitrogen of asparagine residues in the sequence Asn-X-Ser/Thr, where X is any amino acid except proline) or O-linked (to the hydroxyl group of serine or threonine residues, or to hydroxylysine).
Glycosylation can promote proper protein folding, increase protein stability, mediate critical cell-cell adhesion events (e.g., immune cell recognition), and act as a sorting and targeting signal for proteins within the secretory pathway.
Ubiquitination:
Involves the covalent attachment of a small regulatory protein, ubiquitin (8.5 kDa), to the epsilon-amino group of lysine residues on target proteins through a cascade of E1, E2, and E3 enzymes.
Polyubiquitination (attachment of a chain of ubiquitin molecules, classically linked through Lys48) typically serves as a degradation signal that targets the protein to the 26S proteasome. Monoubiquitination or short ubiquitin chains can also regulate protein trafficking, DNA repair, and gene expression.
Other important covalent PTMs include:
Methylation: Addition of methyl groups, often to lysine or arginine, regulating gene expression (histone methylation) or protein function.
Acetylation: Addition of acetyl groups, commonly to lysine (e.g., histone acetylation activating transcription) or the N-terminus, affecting protein stability or interactions.
Lipidation: Covalent attachment of lipid moieties (e.g., palmitoylation, myristoylation, farnesylation) that can anchor proteins to membranes, influence protein-protein interactions, or modulate protein localization.
Mass Spectrometry in Proteomics
Overview and Applications
Mass Spectrometry (MS):
An analytical technique that measures the mass-to-charge ratio ($m/z$) of ions in the gas phase. It is indispensable for peptide and protein analysis because of its high sensitivity, mass accuracy, and ability to characterize complex mixtures.
It can determine the molecular weight of proteins and peptides, deduce amino acid sequences, characterize PTMs by detecting the precise mass shifts they cause, and quantify protein levels.
Key Applications:
Facilitates the identification and quantification of thousands of proteins from complex biological samples, which often contain hundreds to tens of thousands of different proteins, enabling in-depth studies of proteome composition and dynamics.
Used for biomarker discovery, drug target identification, and understanding disease mechanisms.
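Detecting a PTM by its mass shift amounts to simple addition. The sketch below computes a peptide's neutral monoisotopic mass and the change caused by a modification; the residue masses and shifts are standard monoisotopic values, while the function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids (unmodified Cys).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O completing the free N- and C-termini

# Monoisotopic mass shifts (Da) added by a few common PTMs.
PTM_SHIFT = {
    "phosphorylation": 79.966331,  # +HPO3 on Ser, Thr, or Tyr
    "acetylation": 42.010565,      # +C2H2O on Lys or the N-terminus
    "methylation": 14.015650,      # +CH2 per methyl group on Lys or Arg
}


def peptide_mass(sequence: str, modifications: tuple[str, ...] = ()) -> float:
    """Neutral monoisotopic mass of a peptide plus any listed modifications."""
    mass = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    return mass + sum(PTM_SHIFT[mod] for mod in modifications)


unmodified = peptide_mass("PEPTIDE")
phospho = peptide_mass("PEPTIDE", ("phosphorylation",))
print(f"{unmodified:.4f}")            # 799.3599
print(f"{phospho - unmodified:.4f}")  # 79.9663
```

In practice, a peak appearing about 80 Da above the expected peptide mass is a hint (not proof) of phosphorylation; fragment ions are then needed to locate the modified residue.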
MS Sampling and Analysis
Biological samples for MS are usually complex protein mixtures, so they often undergo fractionation before mass spectrometric analysis. Simplifying the mixture improves detection sensitivity for less abundant proteins:
SDS-PAGE (Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis): A widely used technique in which the detergent SDS denatures proteins (unfolding them and coating them with negative charge roughly in proportion to their length), so that electrophoresis through the polyacrylamide gel separates them primarily by molecular weight.
After separation by SDS-PAGE, protein bands or regions corresponding to specific molecular weight ranges can be cut from the gel. Proteins within each gel slice are then enzymatically digested into smaller, more manageable peptides for subsequent MS analysis.
Trypsin: The most common protease used in proteomics. It is highly specific, cleaving peptide bonds at the C-terminal side of arginine (Arg) and lysine (Lys) residues, except when the next residue is proline. Because the cleavage sites are predictable, the expected peptides and their masses can be calculated in advance for every protein in a sequence database, which makes database searching more efficient.
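The trypsin rule translates directly into an in silico digest. This minimal sketch assumes complete digestion (no missed cleavages, which real digests do contain) and uses a hypothetical function name and sequence.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(sequence[start:i + 1])  # peptide ends in K or R
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide lacking K/R
    return peptides


print(tryptic_digest("MAKPLRGSTKEYR"))  # ['MAKPLR', 'GSTK', 'EYR']
```

Note that the K in "KP" is skipped, exactly as the proline exception predicts.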
Two-Dimensional Liquid Chromatography (2D-LC)
Process Description
Utilizes two distinct modes of liquid chromatography performed sequentially to provide orthogonal (independent) separations, which greatly increases the number of peptides that can be resolved (the peak capacity), especially in highly complex peptide mixtures:
Ion Exchange Chromatography (1st dimension): Peptides are separated primarily based on their charge (and hydrophilicity). Peptides with different charges bind differentially to the stationary phase of the ion exchange column and are eluted using a salt gradient.
Reversed-phase Liquid Chromatography (2nd dimension): Fractions from the first dimension (ion exchange) are then separated based on their hydrophobicity. Peptides bind to a hydrophobic stationary phase and are subsequently eluted using an organic solvent gradient (e.g., acetonitrile), with more hydrophobic peptides eluting later. This two-dimensional separation greatly enhances the ability to resolve and detect a vast number of peptides.
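Why orthogonality matters can be seen from the idealized product rule for two-dimensional peak capacity, which is not stated in these notes and should be read as an upper bound: if the two dimensions are fully independent, the number of resolvable peaks is roughly the product of each dimension's peak capacity. The values below (20 ion-exchange fractions, a reversed-phase gradient resolving about 300 peaks) are hypothetical; partial correlation between the dimensions lowers the real figure.

```latex
n_{c,\mathrm{2D}} \approx n_{c,1} \times n_{c,2}
\qquad\text{e.g.}\quad 20 \times 300 = 6000
```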
Mass Spectrometer Components
Key Components Explained
Ion Source: The initial component responsible for converting neutral molecules from the sample into gas-phase ions. Common ionization techniques in proteomics include Electrospray Ionization (ESI) and Matrix-Assisted Laser Desorption/Ionization (MALDI), which generate charged species suitable for mass analysis.
Mass Analyzer: This is the heart of the mass spectrometer, where ions are separated based on their mass-to-charge ratio ($m/z$). Different types of mass analyzers exist, such as Quadrupoles, Time-of-Flight (TOF), Orbitraps, and Fourier Transform Ion Cyclotron Resonance (FT-ICR), each offering varying levels of resolution, mass accuracy, and speed.
Ion Detector: Measures the abundance of each separated ion. When ions hit the detector, they generate an electrical signal proportional to their quantity. This signal is then processed to create a mass spectrum. Common detectors include electron multipliers and microchannel plates.
Understanding Mass Spectra
Mass Spectrum Overview
A mass spectrum is a plot of the relative abundance (intensity) of each detected ion against its mass-to-charge ratio ($m/z$). Each peak corresponds to an ion species with a particular $m/z$: an intact peptide or protein at a given charge state, or a fragment of one. Analyzing the positions, spacing, and intensities of the peaks reveals the presence, molecular weight, isotopic composition, and quantity of specific proteins or peptides in a sample.
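Because the x-axis is $m/z$ rather than mass, the same peptide can appear at several positions depending on how many protons it carries. The sketch below assumes protonated [M + zH]^z+ ions, as typically produced by positive-mode ESI, and uses the fact that isotope peaks (mainly 13C versus 12C) are about 1.00336 Da apart, so they sit 1.00336/z apart on the $m/z$ axis. The function names and the 2000 Da example are hypothetical.

```python
PROTON_MASS = 1.007276   # Da
ISOTOPE_SPACING = 1.003355  # Da, 13C minus 12C


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]^z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass."""
    return mz * charge - charge * PROTON_MASS


def charge_from_isotope_spacing(spacing_mz: float) -> int:
    """Estimate z from the m/z gap between neighboring isotope peaks."""
    return round(ISOTOPE_SPACING / spacing_mz)


# The same hypothetical 2000 Da peptide observed at three charge states.
for z in (1, 2, 3):
    print(z, round(mz_from_mass(2000.0, z), 4))
```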
Tandem Mass Spectrometry (MS/MS)
Process and Functionality
Tandem Mass Spectrometry (MS/MS or MS2) chains two or more mass spectrometers (or performs sequential stages of mass analysis within a single instrument) to provide a powerful tool for sequencing peptides and identifying proteins:
Selection of Precursor Ion (MS1): The first mass analyzer isolates a specific peptide ion (the "precursor ion" or "parent ion") from the mixture according to its mass-to-charge ratio, using a narrow isolation window.
Fragmentation (Collision Cell): The isolated precursor ion is then directed into a collision cell (e.g., containing an inert gas like argon or nitrogen). Here, it collides with gas molecules, gaining internal energy that leads to its fragmentation. This process, often called Collision-Induced Dissociation (CID) or Higher-Energy Collisional Dissociation (HCD), cleaves the peptide backbone mainly at the amide (peptide) bonds, at different positions in different molecules, producing N-terminal b ions and C-terminal y ions.
Analysis of Fragments (MS2): The second mass analyzer then separates and measures the mass-to-charge ratios of these resulting fragment ions (or "product ions"). The fragmentation pattern is characteristic of the peptide's amino acid sequence. By analyzing the mass differences between adjacent fragment ions, the amino acid sequence can be deduced.
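Reading a sequence from fragment masses can be sketched as follows. The code assumes singly charged b and y ions and standard monoisotopic residue masses; it computes the theoretical ladders for a peptide and then recovers residues from the mass gaps between consecutive b ions. Function names are hypothetical, and real de novo tools must also handle missing peaks, noise, and higher charge states.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da, charge carrier of a singly charged fragment
WATER = 18.010565  # Da, retained by C-terminal (y) fragments


def b_ions(peptide: str) -> list[float]:
    """m/z of singly charged b ions b1..b(n-1) (N-terminal fragments)."""
    ions, prefix = [], 0.0
    for residue in peptide[:-1]:
        prefix += RESIDUE_MASS[residue]
        ions.append(prefix + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """m/z of singly charged y ions y1..y(n-1) (C-terminal fragments)."""
    ions, suffix = [], WATER
    for residue in reversed(peptide[1:]):
        suffix += RESIDUE_MASS[residue]
        ions.append(suffix + PROTON)
    return ions


def residues_from_b_ladder(ions: list[float]) -> str:
    """Assign each gap between consecutive b ions to the closest residue mass.

    Leu and Ile have identical masses, so both are reported as L here.
    """
    residues, previous = [], PROTON  # start from an empty, protonated "b0"
    for ion in ions:
        gap = ion - previous
        residues.append(min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap)))
        previous = ion
    return "".join(residues)


ladder = b_ions("PEPTIDE")
print([round(m, 3) for m in ladder])
print(residues_from_b_ladder(ladder))  # PEPTLD (I and L are indistinguishable)
```

The final residue is not covered by the b ladder; it follows from the precursor mass or from the y ladder.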
Protein Identification Techniques
Database Approach
Proteins can be identified efficiently without de novo sequencing every peptide, by comparing experimental data with theoretical data derived from known protein sequences:
Peptide mass spectra (specifically the fragment ion spectra from MS/MS) are compared against a vast database of predicted mass spectra generated from known protein sequences (e.g., from genomic or EST databases). Algorithms calculate scores based on how well the observed fragmentation pattern matches theoretical patterns, identifying the most probable peptide and, consequently, the protein from which it originated.
Because enzymatic digestion is highly specific and peptides fragment in characteristic patterns, a protein can often be identified from only one or two uniquely mapped peptides, especially when those peptides give high-quality spectra that match the database strongly. Many studies nonetheless require at least two unique peptides for a confident protein identification.
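The matching step can be illustrated with a deliberately simple score: count how many observed fragment peaks fall within a tolerance of a candidate's theoretical b and y ions. Production search engines (e.g., SEQUEST, Mascot) use more elaborate scoring and statistics, so this is only a sketch of the idea. The tolerance, function names, and simulated spectrum are hypothetical; the candidates are permutations of one sequence, so they share the same precursor mass and only fragments can tell them apart.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565


def fragment_ions(peptide: str) -> list[float]:
    """Theoretical singly charged b and y ions for a candidate peptide."""
    ions, prefix, suffix = [], 0.0, WATER
    for residue in peptide[:-1]:
        prefix += RESIDUE_MASS[residue]
        ions.append(prefix + PROTON)  # b ions
    for residue in reversed(peptide[1:]):
        suffix += RESIDUE_MASS[residue]
        ions.append(suffix + PROTON)  # y ions
    return ions


def shared_peak_count(observed: list[float], peptide: str, tol: float = 0.02) -> int:
    """Number of observed peaks within `tol` (m/z) of any theoretical fragment."""
    theoretical = fragment_ions(peptide)
    return sum(any(abs(peak - t) <= tol for t in theoretical) for peak in observed)


def best_match(observed: list[float], candidates: list[str]) -> tuple[str, dict[str, int]]:
    """Score every candidate and return the top one with all scores."""
    scores = {pep: shared_peak_count(observed, pep) for pep in candidates}
    return max(scores, key=scores.get), scores


# Simulated spectrum: rounded fragments of PEPTIDE plus one noise peak.
observed = [round(m, 3) for m in fragment_ions("PEPTIDE")] + [500.0]
best, scores = best_match(observed, ["PEPTIDE", "PETPIDE", "EDITPEP"])
print(best, scores)
```

The near-miss PETPIDE still matches most peaks, which is why scores are usually judged against decoy or statistical thresholds rather than taken at face value.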
Challenges in Protein Mapping
Occasionally, a single peptide sequence obtained from MS/MS may correspond to multiple proteins within the database. This can occur due to:
Shared sequence homology: Short, common peptide sequences might be present in functionally related proteins or protein families.
Alternative splicing: Different protein isoforms originating from the same gene via alternative splicing might share significant portions of their sequence.
This often results in grouping related proteins or protein families together ("protein groups") rather than identifying a single, unambiguous protein, posing a challenge for precise identification and quantification of individual isoforms. Further validation steps (e.g., using multiple unique peptides or orthogonal assays) may be required to resolve ambiguities.
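Protein grouping can be sketched from a peptide-to-protein mapping: peptides that map to a single protein are unique evidence, and proteins supported by exactly the same peptides cannot be told apart and form one group. The mapping and protein names below are hypothetical, and real inference tools add further rules.

```python
from collections import defaultdict

# Hypothetical peptide-to-protein mapping produced by a database search.
PEPTIDE_MAP = {
    "AAK": {"ProtA"},
    "GGR": {"ProtA", "ProtB"},       # shared by two related proteins
    "LLK": {"ProtC", "ProtC_iso2"},  # shared by two splice isoforms
}


def peptides_by_protein(peptide_map: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the mapping: protein -> set of peptides that point to it."""
    proteins: dict[str, set[str]] = defaultdict(set)
    for peptide, owners in peptide_map.items():
        for protein in owners:
            proteins[protein].add(peptide)
    return proteins


def unique_peptides(peptide_map: dict[str, set[str]]) -> dict[str, list[str]]:
    """Peptides that map to exactly one protein, keyed by that protein."""
    unique: dict[str, list[str]] = defaultdict(list)
    for peptide, owners in peptide_map.items():
        if len(owners) == 1:
            (protein,) = owners
            unique[protein].append(peptide)
    return dict(unique)


def indistinguishable_groups(peptide_map: dict[str, set[str]]) -> list[list[str]]:
    """Group proteins that are supported by exactly the same peptide set."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for protein, peptides in peptides_by_protein(peptide_map).items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


print(unique_peptides(PEPTIDE_MAP))           # {'ProtA': ['AAK']}
print(indistinguishable_groups(PEPTIDE_MAP))  # [['ProtA'], ['ProtB'], ['ProtC', 'ProtC_iso2']]
```

ProtB's only peptide is also explained by ProtA, so a parsimony-based approach would typically report ProtA alone; ProtC and its isoform remain an unresolved group without an isoform-specific peptide.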
Types of MS-based Proteomic Experiments
Expression Proteomics: Focuses on changes in protein presence and abundance, usually by relative quantification, across different biological states (e.g., disease vs. healthy, treated vs. untreated). It aims to identify proteins whose levels change under specific conditions, which often leads to biomarker discovery.
Post-Translational Modification (PTM) Studies: Dedicated to investigating specific covalent modifications on proteins (e.g., phosphorylation, glycosylation, ubiquitination). These studies typically involve enrichment strategies to isolate modified peptides before MS analysis to characterize the type, site, and extent of PTMs, which are crucial regulators of protein function.
Affinity Proteomics (Interactomics): Explores the physical interactions of proteins with other biological molecules (e.g., other proteins, DNA, RNA, small molecules). Techniques like co-immunoprecipitation coupled with MS are used to identify protein-protein interaction networks, protein complexes, and targets of specific proteins, providing insights into cellular pathways and molecular mechanisms.
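For expression proteomics, relative quantification is commonly reported as a log2 fold change, which makes increases and decreases symmetric (2-fold up = +1, 2-fold down = −1). The sketch below assumes already normalized intensities and omits the statistical testing a real analysis needs; the function name and values are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(treated: list[float], control: list[float]) -> float:
    """log2 ratio of mean intensities; positive means higher in the treated state."""
    return math.log2(mean(treated) / mean(control))


# Hypothetical normalized intensities of one protein, three replicates per state.
print(log2_fold_change([400.0, 380.0, 420.0], [100.0, 95.0, 105.0]))  # 2.0
```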
Conclusion
Proteomics, leveraging advanced mass spectrometry techniques, provides significant, in-depth insights into the biological roles of proteins, their dynamic modifications, and complex interactions. This comprehensive understanding is fundamentally contributing to our knowledge of intricate cellular processes, the molecular basis of health, and the development of new diagnostic and therapeutic strategies for various disease states.