Proteomics
Munster Technological University (MTU)
Course: Proteomics - BIOL8023
Instructor: Dr. Saravana Sivagnanam
Multi Omics
Website: www.mtu.ie
What is the Proteome?
Proteins are biological molecules composed of building blocks known as amino acids.
Proteins are essential to life, serving a wide variety of functions including:
Structural
Metabolic
Transport
Immune response
Signaling and regulatory roles
The term "proteome" was introduced by Australian Ph.D. student Marc Wilkins during a symposium in 1994 in Siena, Italy.
What is Proteomics?
Proteomics is the study of the proteome, focusing on how different proteins interact with one another and their roles within the cell.
Key Concepts in Proteomics
Protein Expression:
It is essential to recognize that mRNA expression levels do not always correlate well with protein expression levels.
The study of mRNA alone fails to account for the following, all of which affect protein function:
Posttranslational modifications
Protein cleavage
Formation of complexes
Variant mRNA transcripts
Historical Context:
The first proteomic studies began in 1975 with the development of two-dimensional (2D) protein electrophoresis.
Applications of Proteomics
Case Study: Hemoglobin
Hemoglobin plays a crucial role in picking up oxygen in the lungs, transporting it through the blood, and delivering it to the cells.
Example of a disease related to protein mutation:
Sickle Cell Disease: Caused by a single amino acid change in the hemoglobin protein.
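The single amino acid change can be traced back to a single-nucleotide substitution. A minimal Python sketch, assuming the commonly cited codon change GAG to GUG in the β-globin mRNA (the codon is not given in these notes) and a deliberately tiny two-entry excerpt of the genetic code; `point_mutation` is a hypothetical helper name:

```python
# Two-entry excerpt of the standard genetic code; enough for this example only.
CODON_EXCERPT = {"GAG": "Glu", "GUG": "Val"}

def point_mutation(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]

normal_codon = "GAG"                                  # assumed wild-type codon
sickle_codon = point_mutation(normal_codon, 1, "U")   # A -> U at the middle base

print(normal_codon, "->", CODON_EXCERPT[normal_codon])
print(sickle_codon, "->", CODON_EXCERPT[sickle_codon])
```

One changed base swaps a charged glutamate for a hydrophobic valine, which is the kind of change that mRNA abundance measurements alone would not reveal.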
Tools of Proteomics
1. Protein Separation Technology
Simplifies complex protein mixtures and targets specific proteins for analysis.
2. Mass Spectrometry (MS)
Provides accurate molecular mass measurements of intact proteins and peptides.
3. Database Resources
Access to protein databases, expressed sequence tags (EST), and complete genome sequence databases.
4. Software Collection
Software is used to match MS data with specific protein sequences in databases.
Difference Between Genomics and Proteomics
| Aspect | Genomics | Proteomics |
|---|---|---|
| Study Focus | Study of genomes and their functions | Study of proteomes and their functions |
| Methods | Genome sequence mapping, variant analysis | Protein sequence mapping, 3D structure modeling, protein-protein interactions |
| Sequencing | Utilizes Sanger sequencing and next-generation sequencing methods | Employs mass spectrometry, affinity proteomics, and protein microarray methods |
| Interpretation | Indirectly suggests physiological states | Directly specifies physiological states with spatio-temporal resolution |
What are Proteins?
Definition and Function
A protein is a macromolecule composed of one or more chains of amino acids.
Examples of protein functions include:
Catalyzing metabolic processes (e.g., the digestive enzyme pepsin)
Facilitating replication processes (e.g., DNA polymerase)
Maintaining cell structure (e.g., keratin)
Regulating cell signaling (e.g., hormones such as insulin)
Enabling transport (e.g., hemoglobin)
Functioning in storage (e.g., ferritin)
Protecting through cell defense (e.g., immunoglobulins)
Assisting in cell movement (e.g., actin, myosin)
Structural Organization of Proteins
Proteins have several structural levels:
Primary Structure
Secondary Structure
Tertiary Structure
Quaternary Structure
The native structure of a protein is essential for its biological function; any loss of structure can lead to a loss of function.
Primary Structure of Proteins
Definition
The primary structure of a protein is a linear polypeptide chain consisting of amino acids linked by peptide bonds.
Classification of Peptides
Peptides:
Dipeptides: 2 amino acids
Tripeptides: 3 amino acids
Tetrapeptides: 4 amino acids
Oligopeptides: up to 20 amino acids
Polypeptides: 20 to 50 amino acids
Proteins: more than 50 amino acids
Genetic Encoding
The amino acid sequence is primarily dictated by the DNA sequence of the corresponding gene (genetic code).
Codon Definition
A codon is a sequence of three nucleotides in DNA or RNA that corresponds to a specific amino acid.
There are 64 possible codons formed from combinations of the four nitrogenous bases found in DNA/RNA.
Twenty standard amino acids are encoded by nearly all organisms, and most of them are specified by more than one codon (referred to as degeneracy).
Each codon specifies only one amino acid, and the code is nearly universal across organisms, with a few exceptions such as mitochondrial genomes.
Further Codon Analysis
There are a total of 61 codons that code for individual amino acids, while 3 act as stop codons.
Example: The codon ACU codes for the amino acid Threonine.
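The counts above can be checked directly. A short Python sketch, assuming the standard genetic code written as a 64-letter string in UCAG order (one-letter amino acid codes, `*` for stop); the variable names are illustrative:

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third base fastest, matching the order of AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                     # total number of codons
print(stop_codons)                          # codons that terminate translation
print(sum(codons_per_amino_acid.values()))  # codons that specify an amino acid
print(codons_per_amino_acid["L"])           # degeneracy of leucine
print(CODON_TABLE["ACU"])                   # one-letter code for the ACU example
```

Only methionine (M) and tryptophan (W) have a single codon in this table; every other amino acid is degenerate.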
Example Codon Tables
Example sequence:
CAAUGCGACCUAAGAUCUAA
Codon-table translations are central to protein synthesis: each triplet corresponds to a particular amino acid, such as Phenylalanine (Phe) or Leucine (Leu), or to a stop signal.
Knowing which amino acid sequence a given nucleotide sequence is expected to produce is important for functional proteomics.
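A short Python sketch that translates the sequence CAAUGCGACCUAAGAUCUAA from the example above in each of its three forward reading frames. It assumes the standard genetic code; treating the example as a reading-frame exercise is an assumption, since the notes do not say how the sequence should be read. `translate` is a hypothetical helper name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna: str, frame: int = 0) -> str:
    """Translate `rna` starting at offset `frame`, ignoring a trailing partial codon."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

sequence = "CAAUGCGACCUAAGAUCUAA"
frames = [translate(sequence, frame) for frame in range(3)]
for number, peptide in enumerate(frames, start=1):
    print(f"Frame {number}: {peptide}")
```

Under these assumptions only the third frame begins with methionine (AUG) and runs to a stop codon (UAA), which illustrates why the choice of reading frame determines the protein product.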
Secondary Structure of Proteins
The secondary structure refers to the local folding of the polypeptide backbone into 3-D configurations.
Stabilized through hydrogen bonding between backbone N-H groups and C=O groups, resulting in:
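α-helix: Formed by hydrogen bonds between residues close together in the same stretch of polypeptide chain.
β-sheets: Formed by hydrogen bonds between adjacent strands, which may run parallel or antiparallel and may belong to the same or different polypeptide chains.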
Tertiary Structure of Proteins
Represents the final 3-D conformation of a protein resulting from the folding of various secondary structures.
Stabilization occurs through interactions:
Hydrophobic interactions
Hydrophilic interactions
Ionic interactions (salt bridges)
Disulfide bridges (cysteine residues)
Hydrogen bonding
Example: Myoglobin, found primarily in striated muscle, is a single-chain protein whose function depends on its tertiary structure.
Amino Acid Properties
Amino acids are categorized based on various properties, such as hydropathy, volume, chemical properties, charge, and polarity.
Each amino acid has a unique abbreviation and a set of characteristics that determine its behavior in a biological context.
Example properties:
Alanine (Ala): Hydrophobic, small, aliphatic.
Arginine (Arg): Hydrophilic, basic, and positively charged.
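Hydropathy can be made quantitative. A minimal Python sketch, assuming the Kyte-Doolittle hydropathy scale (one of several published scales; the notes do not name one), that averages the values over a sequence to give a GRAVY score. `gravy` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in sequence.upper()) / len(sequence)

print(gravy("A"))      # alanine alone: hydrophobic (positive score)
print(gravy("R"))      # arginine alone: hydrophilic (negative score)
print(gravy("AAARR"))  # a mixed peptide falls in between
```

On this scale alanine scores positive and arginine strongly negative, consistent with the two example properties listed above.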
Quaternary Structure of Proteins
The quaternary structure describes the arrangement of multiple protein subunits.
Stabilization results from:
Hydrogen bonding
Van der Waals forces
Disulfide bridges (cysteine)
Example: Hemoglobin consists of four subunits.
Summary of Protein Structure
Primary: Sequence of amino acids in a polypeptide chain.
Secondary: Localized folding through hydrogen bonds (α-helices and β-sheets).
Tertiary: Overall 3-D shape formed by interactions between secondary structures.
Quaternary: Combination of multiple polypeptide chains into a single functional protein.
Translation Process
Overview
Translation is the process in which the nucleotide sequence of an mRNA is decoded into a sequence of amino acids during protein synthesis; it is essential for all living organisms.
Mechanism
The mRNA sequence directs protein synthesis: its codons are read in order and translated into the corresponding amino acid sequence.
Translation takes place in the cytoplasm.
Ribosomes, composed of rRNA and proteins, perform the translation process.
Ribosome structure:
Prokaryotes have 70S ribosomes (30S + 50S).
Eukaryotes have 80S ribosomes (40S + 60S).
rRNA catalyzes the addition of amino acids through peptide bond formation.
tRNA delivers appropriate amino acids to the ribosome through mRNA codon-anticodon complementarity.
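Codon-anticodon complementarity can be illustrated in a few lines of Python. This sketch assumes strict Watson-Crick pairing and ignores wobble pairing and modified tRNA bases, so it is a simplification; `anticodon` is a hypothetical helper name.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Return the perfectly complementary anticodon, written 5'->3'."""
    # Pairing is antiparallel, so complement each base and reverse the order.
    return "".join(COMPLEMENT[base] for base in reversed(codon))

print(anticodon("AUG"))  # anticodon pairing with the start codon
print(anticodon("ACU"))  # anticodon pairing with the threonine codon ACU
```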
Post-Translational Modifications (PTM)
Definition
PTMs are covalent modifications that proteins undergo during or after synthesis, often before they reach their final functional form.
Characteristics of PTMs
Modifications may be irreversible or reversible.
Examples include:
Enzymatic cleavage of peptide bonds, such as the processing of proinsulin into mature insulin.
Addition of chemical groups to amino acid side chains (e.g., phosphorylation).
PTMs may occur at any point in the protein lifecycle.
Importance of Post-Translational Modifications
PTMs expand the coding capacity of the genome: although DNA encodes only 20 standard amino acids, modifications of these residues produce a far more diverse proteome.
As a result, proteins can contain many residue types beyond the 20 encoded amino acids.
Functions of Specific PTMs
| PTM Type | Function Example |
|---|---|
| Proteolysis | Activation |
| Phosphorylation | Activation |
| Glycosylation | Secretion |
| Methylation | Modulating protein function |
| Hydroxylation | Modulating structure |
| Ubiquitination | Degradation |
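
Each chemical-group PTM adds a characteristic mass to a protein or peptide, which is how such modifications become visible to mass spectrometry. A minimal Python sketch using standard monoisotopic mass shifts for three of the modifications in the table; the 1000.0 Da starting mass is a hypothetical value and `modified_mass` is a hypothetical helper name.

```python
# Monoisotopic mass changes (Da) for a few common modifications.
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,  # adds HPO3
    "methylation": 14.01565,      # adds CH2
    "hydroxylation": 15.99491,    # adds O
}

def modified_mass(unmodified_mass: float, ptms: list) -> float:
    """Return the mass after applying each named PTM once."""
    return unmodified_mass + sum(PTM_MASS_SHIFT[name] for name in ptms)

peptide_mass = 1000.0  # hypothetical unmodified peptide mass in Da
print(modified_mass(peptide_mass, ["phosphorylation"]))
print(modified_mass(peptide_mass, ["phosphorylation", "methylation"]))
```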
Why Study Proteomics?
Protein diversity cannot be solely predicted from genetic code or gene expression studies:
Multiple mRNA variants can arise from a single gene (alternative splicing).
Protein abundance is not reliably predicted by mRNA levels because translation rates and protein degradation rates vary and are often unknown.
PTMs cannot be detected from mRNA.
Protein localization and interactions are not predictable from mRNA data.
Conclusion: Proteomics offers techniques to capture protein diversity effectively.
Proteomics Techniques
Proteomics encompasses various technologies for identifying and quantifying proteins present in a specific cell, tissue, or organism:
Separation
Identification
Quantification
Functional Analysis
Structural Analysis
Considerations in Studying the Proteome
Sample Preparation
Precise conditions needed:
Cold environments, protease inhibitors, and organelle isolation where appropriate.
Proteomic Workflow
Pre-preparation Steps
Sample Separation:
1D and 2D gel electrophoresis
Reverse Phase HPLC
Strong cation exchange (SCX HPLC)
Structure or Mass Information via Mass Spectrometry:
Protein identification through database searching against resources such as SWISS-PROT, TrEMBL, RefSeq (XP protein records), and Ensembl.
Protein Separation Techniques
Need for Separation: Since the proteome consists of complex mixtures of proteins, separation is critical for identification and characterization.
Methods for Separation:
1-D gel electrophoresis (based on molecular mass, e.g., SDS-PAGE)
2-D gel electrophoresis (based on net charge and mass, e.g., Isoelectric focusing + SDS-PAGE)
Liquid chromatography (LC, separating based on interactions with liquid and stationary phases)
2-D Gel Electrophoresis
Process:
In isoelectric focusing, proteins migrate through a pH gradient under an electric field until they reach their isoelectric point (the pH at which their net charge is zero); they are then separated by mass using SDS-PAGE.
2D-DIGE: Fluorescent protein labeling allows multiple samples to co-electrophorese on one gel for enhanced analysis.
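The isoelectric point used in the first dimension can be estimated from sequence. A rough Python sketch, assuming one illustrative set of pKa values (published sets differ, so the resulting pI is approximate) and the Henderson-Hasselbalch relation; `net_charge` and `isoelectric_point` are hypothetical helper names.

```python
# Illustrative pKa values for ionizable groups; published sets differ slightly.
POSITIVE_PKA = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        count = 1 if group == "N_term" else sequence.count(group)
        charge += count / (1 + 10 ** (ph - pka))   # fraction still protonated
    for group, pka in NEGATIVE_PKA.items():
        count = 1 if group == "C_term" else sequence.count(group)
        charge -= count / (1 + 10 ** (pka - ph))   # fraction deprotonated
    return charge

def isoelectric_point(sequence: str, tolerance: float = 0.001) -> float:
    """Bisect between pH 0 and 14 for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2

print(round(isoelectric_point("DDDD"), 2))  # acidic peptide: low pI
print(round(isoelectric_point("KKKK"), 2))  # basic peptide: high pI
```

Bisection works here because the estimated net charge decreases steadily as pH rises, so it crosses zero exactly once.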
Fluorescence Spectroscopy
Configuration:
Uses a xenon lamp as the excitation source, a monochromator to select wavelengths, and a lens to collect the photons emitted by the sample.
Critical in detecting light emission post-excitation.
Chromatography Techniques
Gel Filtration: Separates proteins based solely on molecular size.
Hydrophobic Interaction Chromatography (HIC): Separates proteins based on their hydrophobic interactions with ligands.
Ion Exchange (IE): Separates proteins based on net charge.
Reverse Phase Chromatography: Relies on hydrophobic interactions between molecules in the mobile phase and a nonpolar stationary phase.
Affinity Chromatography: Utilizes specific binding interactions between an immobilized ligand and its target binding partner.
Protein Identification Techniques
Following separation, identification is necessary through:
Immunoassays:
Based on specific antibody reactions (ELISA, Western blotting, protein microarrays).
Mass Spectrometry (MS):
Measures mass-to-charge ratios of ions for detection.
Can quantify proteins and detect interactions and PTMs.
High-throughput and suited for proteome characterization.
Western Blotting
Detects specific proteins in mixtures using antibodies.
Capable of monitoring expression changes and PTMs.
Can quantify proteins (semi-quantitatively) and is typically performed after separation by SDS-PAGE.
Protein Microarrays
Enables simultaneous detection of numerous proteins using antibodies and labeled probes.
Very sensitive; can quantify proteins and analyze PTMs and interactions.
Mass Spectrometry in Protein Analysis
MS measures mass-to-charge ratios of ions to create spectra specific for protein identification.
Highly accurate and sensitive; can combine with LC for automation.
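Tandem MS (MS/MS): Selects precursor ions, fragments them, and measures the fragment ions, providing sequence information that improves identification.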
Key ionization techniques:
Electrospray Ionization (ESI)
Matrix-Assisted Laser Desorption/Ionization (MALDI)
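The mass-to-charge ratio of a protonated peptide follows directly from its sequence. A minimal Python sketch using standard monoisotopic residue masses; it assumes an unmodified peptide ionized by adding protons, and `peptide_mass` and `mz` are hypothetical helper names.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # one water for the free N- and C-termini
PROTON = 1.007276   # mass added per positive charge

def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[res] for res in sequence) + WATER

def mz(sequence: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

for z in (1, 2, 3):
    print(z, round(mz("MRPKI", z), 4))
```

The same peptide appears at a lower m/z as its charge increases, which is why ESI, which commonly produces multiply charged ions, can bring large molecules into the measurable m/z range.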
Example Applications of Mass Spectrometry
Rapid and affordable screening of blood abnormalities such as haemoglobinopathies and pre-diabetes, demonstrating value in clinical laboratories.
Protein Quantification Techniques
ICAT (Isotope Coded Affinity Tag):
Identifies and quantifies protein mixtures using chemical labels that label cysteine residues.
Enables analysis of low-abundance proteins and direct testing of mixed samples, and allows comparison of protein expression changes under different conditions.
Limitations of ICAT:
Database limitations and specific labeling constraints (only cysteine).
Potential errors in quantification.
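The cysteine-only constraint and the ratio-based quantification can be sketched in Python. The peptide sequences and peak intensities below are hypothetical, and `icat_log2_ratios` is an illustrative helper rather than part of any ICAT software.

```python
import math

def icat_log2_ratios(peptides: dict) -> dict:
    """Return log2(heavy/light) for cysteine-containing peptides only."""
    ratios = {}
    for sequence, (light, heavy) in peptides.items():
        if "C" not in sequence:
            continue  # the tag reacts with cysteine, so these peptides are unlabeled
        if light <= 0 or heavy <= 0:
            continue  # a missing partner peak gives no usable ratio
        ratios[sequence] = math.log2(heavy / light)
    return ratios

# Hypothetical peak intensities: (light-labeled sample, heavy-labeled sample)
example = {
    "ACDK": (1000.0, 2000.0),
    "GLSK": (500.0, 500.0),
    "MCTR": (800.0, 200.0),
}
ratios = icat_log2_ratios(example)
print(ratios)
```

The peptide without cysteine drops out of the result entirely, which illustrates the labeling limitation listed above: proteins lacking cysteine cannot be quantified this way.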
Proteome Analysis Approaches:
Top-Down: Intact proteins analyzed for isoforms and PTMs.
Bottom-Up (Shotgun): Proteins digested into peptides for analysis.
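The digestion step of the bottom-up approach can be simulated in silico. A minimal Python sketch, assuming trypsin as the protease (the notes do not name one) and the common simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); `tryptic_digest` is a hypothetical helper name.

```python
def tryptic_digest(protein: str) -> list:
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R at its end
    return peptides

print(tryptic_digest("AKRPGKL"))
```

Real digests also contain missed cleavages, which this sketch ignores.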
Functional Proteomics
Driven by genome sequencing efforts and aims to:
Determine biological functions of unknown proteins.
Investigate cellular mechanisms and signaling pathways.
Structural Proteomics
Determines the three-dimensional structure of proteins, crucial for understanding biochemical function and protein interaction mechanisms.
Techniques employed include:
X-ray Crystallography
Nuclear Magnetic Resonance (NMR) Spectroscopy
X-ray Crystallography
Determines atomic and molecular structure at atomic (ångström-scale) resolution, allowing protein structures to be visualized in detail.
NMR Spectroscopy
Studies molecular interactions via radiofrequency electromagnetic radiation in strong magnetic fields.
Effective for proteins of up to about 350 amino acids and does not require crystallization.
Summary of Proteomic Technologies
Purification: Chromatographic techniques
Analysis: ELISA, Western blotting, protein microarray
Characterization: Gel-based approaches, mass spectrometry
Structural Analysis: X-ray crystallography, NMR spectroscopy
Quantification: ICAT, SILAC, iTRAQ
Bioinformatics analysis integrates the data produced by these technologies for comprehensive data management.
Proteome Database Utilization
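Public databases have been established to store the large volumes of data generated in these studies and to make them accessible. Examples include:
GenBank: Nucleotide sequence database, including annotated protein translations.
RefSeq: Curated reference sequence database covering genomes, transcripts, and proteins.
UniProt: Protein sequence and functional information database.
CATH: Hierarchical classification of protein domain structures, reflecting evolutionary relationships.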