Proteomics

Munster Technological University (MTU)

Course: Proteomics - BIOL8023

Instructor: Dr. Saravana Sivagnanam

Multi Omics

Website: www.mtu.ie

What is the Proteome?

Proteins are biological molecules composed of building blocks known as amino acids.
Proteins are essential to life, serving a wide variety of functions including:
- Structural
- Metabolic
- Transport
- Immune response
- Signaling and regulatory roles
The term "proteome" was introduced by Australian Ph.D. student Marc Wilkins during a symposium in 1994 in Siena, Italy.

What is Proteomics?

Proteomics is the large-scale study of the proteome. It covers protein abundance, structure, modifications, and interactions, and the roles proteins play within the cell.

Key Concepts in Proteomics

Protein Expression:
- mRNA expression levels do not always correlate well with protein expression levels.
- Studying mRNA alone does not account for the following, all of which are crucial for protein function:
  - Posttranslational modifications
  - Protein cleavage
  - Formation of complexes
  - Variant mRNA transcripts
Historical Context:
- The first proteomic studies began in 1975 with the development of two-dimensional (2D) protein electrophoresis.

Applications of Proteomics

Case Study: Hemoglobin

Hemoglobin plays a crucial role in picking up oxygen in the lungs, transporting it through the blood, and delivering it to the cells.
Example of a disease related to protein mutation:
- Sickle Cell Disease: Caused by a single amino acid change in the hemoglobin β-chain, where valine replaces glutamic acid at position 6 (Glu6Val).

Tools of Proteomics

1. Protein Separation Technology

Simplifies complex protein mixtures and targets specific proteins for analysis.

2. Mass Spectrometry (MS)

Provides accurate molecular mass measurements of intact proteins and peptides.

3. Database Resources

Access to protein databases, expressed sequence tags (EST), and complete genome sequence databases.

4. Software Collection

Software is used to match MS data with specific protein sequences in databases.

Difference Between Genomics and Proteomics

Aspect	Genomics	Proteomics
Study Focus	Study of genomes and their functions	Study of proteomes and their functions
Methods	Genome sequence mapping, variant analysis	Protein sequence mapping, 3D structure modeling, protein-protein interactions
Technologies	Sanger sequencing and next-generation sequencing	Mass spectrometry, affinity proteomics, and protein microarrays
Interpretation	Indirectly suggests physiological states	More directly reflects physiological states, with spatial and temporal resolution

What are Proteins?

Definition and Function

A protein is a macromolecule composed of one or more chains of amino acids.
Examples of protein functions include:
- Catalyzing metabolic processes (e.g., pepsin)
- Facilitating replication processes (e.g., DNA polymerase)
- Maintaining cell structure (e.g., keratin)
- Regulating cell signaling (e.g., hormones such as insulin)
- Enabling transport (e.g., hemoglobin)
- Functioning in storage (e.g., ferritin)
- Protecting through cell defense (e.g., immunoglobulins)
- Assisting in cell movement (e.g., actin, myosin)

Structural Organization of Proteins

Proteins have several structural levels:

Primary Structure
Secondary Structure
Tertiary Structure
Quaternary Structure

The native structure of a protein is essential for its biological function; any loss of structure can lead to a loss of function.

Primary Structure of Proteins

Definition

The primary structure of a protein is a linear polypeptide chain consisting of amino acids linked by peptide bonds.

Classification of Peptides

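Peptides are classified by the number of amino acids they contain. The cut-offs below are conventional and approximate:
- Dipeptides: 2 amino acids
- Tripeptides: 3 amino acids
- Tetrapeptides: 4 amino acids
- Oligopeptides: up to 20 amino acids
- Polypeptides: 20 to 50 amino acids
- Proteins: more than 50 amino acids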
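The length-based classification above can be written as a small lookup. This is a minimal sketch. The notes' ranges overlap at exactly 20 and 50 residues, so the sketch assumes that a boundary value belongs to the smaller class. The function name `classify_by_length` is illustrative only.

```python
def classify_by_length(n_residues: int) -> str:
    """Name a peptide or protein by its number of amino acids.

    Cut-offs follow the notes; at the shared boundaries (exactly 20 or 50
    residues) this sketch assumes the smaller class.
    """
    if n_residues < 2:
        raise ValueError("a peptide contains at least 2 amino acids")
    small = {2: "dipeptide", 3: "tripeptide", 4: "tetrapeptide"}
    if n_residues in small:
        return small[n_residues]
    if n_residues <= 20:
        return "oligopeptide"
    if n_residues <= 50:
        return "polypeptide"
    return "protein"


for length in (2, 4, 12, 35, 120):
    print(length, classify_by_length(length))
```

The loop prints "dipeptide", "tetrapeptide", "oligopeptide", "polypeptide", and "protein" for the five lengths.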

Genetic Encoding

The amino acid sequence is primarily dictated by the DNA sequence of the corresponding gene (genetic code).

Codon Definition

A codon is a sequence of three nucleotides in DNA or RNA that corresponds to a specific amino acid.
- There are 64 possible codons formed from combinations of the four nitrogenous bases found in DNA/RNA.
- Most organisms encode the same 20 standard amino acids. Because there are more codons than amino acids, most amino acids are specified by more than one codon (referred to as degeneracy).
- Each codon specifies only one amino acid, and the code is nearly universal across organisms (a few exceptions exist, e.g., in mitochondria).

Further Codon Analysis

There are a total of 61 codons that code for individual amino acids, while 3 act as stop codons.
Example: The codon ACU codes for the amino acid Threonine.
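The codon counts can be checked directly from the standard genetic code. This sketch assumes NCBI translation table 1 (the standard code). It stores the table as one 64-letter string of one-letter amino acid codes, with codons enumerated in the base order U, C, A, G. From that string it counts the sense codons, stop codons, and degeneracy.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1) as a 64-letter string.
# Codons are enumerated with bases in the order U, C, A, G (first base varies
# slowest, third base fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid (degeneracy); the "*" entry counts stop codons.
codons_per_aa = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                        # 64 codons = 4 ** 3
print(64 - codons_per_aa["*"])                 # 61 sense codons
print(codons_per_aa["*"])                      # 3 stop codons
print(CODON_TABLE["ACU"])                      # T (threonine)
print(codons_per_aa["L"], codons_per_aa["M"])  # 6 1: leucine vs. methionine
```

Leucine is encoded by six codons and methionine by only one (AUG), which illustrates how unevenly degeneracy is spread across amino acids.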

Example Codon Tables

Sequence Options:
- Option 1: CAAUGCGACCUAAGAUCUAA
- Option 2: CAAUGCGACCUAAGAUCUAA
- Option 3: CAAUGCGACCUAAGAUCUAA
Analysis: Translation is read from the codon table, in which each triplet corresponds to a particular amino acid, such as Phenylalanine (Phe) or Leucine (Leu), or to a stop codon.
Knowing where to expect start and stop codons, and which reading frame to use, is vital when predicting protein sequences for functional proteomics.
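The three options contain the same sequence. A plausible reading is that each should be translated in a different reading frame, starting at nucleotide 1, 2, and 3; this sketch assumes so. It uses the standard genetic code (NCBI table 1) and reports every complete codon in each frame without stopping at stop codons. The helper `translate_frame` is a hypothetical name.

```python
from itertools import product

# Standard genetic code (NCBI table 1); bases ordered U, C, A, G; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}


def translate_frame(mrna: str, frame: int) -> list[str]:
    """Translate every complete codon starting at offset `frame` (0, 1 or 2).

    Translation does not halt at stop codons here, so the whole frame is shown;
    a trailing partial codon is ignored.
    """
    mrna = mrna.upper()
    return [
        THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]]
        for i in range(frame, len(mrna) - 2, 3)
    ]


sequence = "CAAUGCGACCUAAGAUCUAA"
for frame in range(3):
    print(f"Frame {frame + 1}:", "-".join(translate_frame(sequence, frame)))
```

Frame 1 gives Gln-Cys-Asp-Leu-Arg-Ser, and frame 2 gives Asn-Ala-Thr-Stop-Asp-Leu. Only frame 3 starts with the start codon AUG and ends at the stop codon UAA, giving Met-Arg-Pro-Lys-Ile followed by a stop.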

Secondary Structure of Proteins

The secondary structure refers to the local folding of the polypeptide backbone into 3-D configurations.
Stabilized through hydrogen bonding between backbone N-H groups and C=O groups, resulting in:
- α-helix: Formed by interactions within the same polypeptide chain.
- β-sheets: Formed by hydrogen bonding between adjacent β-strands, which may run parallel or antiparallel and may come from the same or different polypeptide chains.

Tertiary Structure of Proteins

Represents the overall 3-D conformation of a single polypeptide chain, resulting from the folding and packing of its secondary structure elements.
Stabilization occurs through interactions:
- Hydrophobic interactions
- Hydrophilic interactions
- Ionic interactions (salt bridges)
- Disulfide bridges (cysteine residues)
- Hydrogen bonding
Example: Myoglobin, found mainly in striated muscle, is a single-chain oxygen-binding protein whose function depends on its tertiary structure.

Amino Acid Properties

Amino acids are categorized based on various properties, such as hydropathy, volume, chemical properties, charge, and polarity.
Each amino acid has a three-letter and a one-letter abbreviation, and characteristic properties that determine its behavior in a biological context.
Example properties:
- Alanine (Ala, A): Hydrophobic, small, aliphatic.
- Arginine (Arg, R): Hydrophilic, basic, and positively charged.
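Hydropathy can be turned into a single number per sequence. The notes do not name a scale, so this sketch assumes the Kyte-Doolittle hydropathy scale, in which positive values mean hydrophobic. It computes the grand average of hydropathy (GRAVY), which is simply the mean value over all residues.

```python
# Kyte-Doolittle hydropathy scale (assumed here): positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


print(round(gravy("AAAA"), 2))  # 1.8: alanine is hydrophobic
print(round(gravy("RRRR"), 2))  # -4.5: arginine is hydrophilic
```

Other hydropathy scales exist and give different absolute numbers, though hydrophobic residues such as alanine score above hydrophilic ones such as arginine on common scales.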

Quaternary Structure of Proteins

The quaternary structure describes the arrangement of multiple protein subunits.
Stabilization results from:
- Hydrogen bonding
- Van der Waals forces
- Disulfide bridges (cysteine)
Example: Hemoglobin consists of four subunits (two α and two β chains).

Summary of Protein Structure

Primary: Sequence of amino acids in a polypeptide chain.
Secondary: Localized folding through hydrogen bonds (α-helices and β-sheets).
Tertiary: Overall 3-D shape formed by interactions between secondary structures.
Quaternary: Combination of multiple polypeptide chains into a single functional protein.

Translation Process

Overview

Translation refers to the process where mRNA is converted into a sequence of amino acids during protein synthesis, essential for all living organisms.

Mechanism

The mRNA sequence directs the creation of proteins through converting genetic code (codons) to amino acid sequences.
Translation takes place in the cytoplasm.
Ribosomes, composed of rRNA and proteins, perform the translation process.
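- Ribosome structure:
  - Prokaryotes have 70S ribosomes (30S + 50S subunits).
  - Eukaryotes have 80S ribosomes (40S + 60S subunits).
  - Svedberg (S) values reflect sedimentation rate, so subunit values are not additive.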
rRNA catalyzes the addition of amino acids through peptide bond formation.
tRNA delivers appropriate amino acids to the ribosome through mRNA codon-anticodon complementarity.
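Codon-anticodon complementarity can be expressed as a reverse complement, because the mRNA and tRNA strands pair antiparallel. This sketch assumes strict Watson-Crick pairing and deliberately ignores wobble pairing at the codon's third position. Both sequences are written 5' to 3'. The function name `anticodon` is illustrative.

```python
# Watson-Crick base pairs in RNA (wobble pairs such as G-U are ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon (5'->3') that pairs with an mRNA codon (5'->3').

    The strands pair antiparallel, so the anticodon is the reverse complement.
    """
    return "".join(PAIR[base] for base in reversed(codon.upper()))


print(anticodon("ACU"))  # AGU: pairs with the threonine codon ACU
print(anticodon("AUG"))  # CAU: anticodon of the initiator methionine tRNA
```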

Post-Translational Modifications (PTM)

Definition

PTMs are covalent modifications that proteins undergo after synthesis, often producing their final functional form.

Characteristics of PTMs

Modifications may be irreversible or reversible.
Examples include:
- Enzymatic cleavage of peptide bonds, such as removal of the C-peptide from proinsulin to produce active insulin.
- Addition of chemical groups to amino acid side chains (e.g., phosphorylation).
PTMs may occur at any point in the protein lifecycle.

Importance of Post-Translational Modifications

PTMs expand the functional capacity of the genome. DNA typically encodes only 20 standard amino acids, but modification of these residues produces a far more diverse proteome.
As a result, proteins can contain many modified residues (e.g., phosphoserine, hydroxyproline) in addition to the 20 standard amino acids.

Functions of Specific PTMs

PTM Type	Function Example
Proteolysis	Activation
Phosphorylation	Activation
Glycosylation	Secretion
Methylation	Modulating protein function
Hydroxylation	Modulating structure
Ubiquitination	Degradation
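Many of the PTMs in the table add a fixed chemical group, so in mass spectrometry they appear as a predictable mass shift on a peptide. This sketch uses standard monoisotopic residue masses and commonly tabulated PTM mass shifts. Ubiquitination is represented by the Gly-Gly remnant that stays on lysine after trypsin digestion. Proteolysis and glycosylation are omitted because they do not add a single fixed mass. The function name `peptide_mass` is illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O added once for the free N- and C-termini

# Monoisotopic mass shifts (Da) for PTMs that add a fixed chemical group.
PTM_SHIFT = {
    "phosphorylation": 79.966331,  # + HPO3 (on Ser, Thr or Tyr)
    "methylation": 14.015650,      # + CH2 (e.g., on Lys or Arg)
    "hydroxylation": 15.994915,    # + O (e.g., on Pro or Lys)
    "ubiquitination": 114.042927,  # Gly-Gly remnant left on Lys after trypsin
}


def peptide_mass(sequence: str, ptms: tuple = ()) -> float:
    """Neutral monoisotopic mass of a peptide carrying the listed modifications."""
    unmodified = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    return unmodified + sum(PTM_SHIFT[name] for name in ptms)


peptide = "LRRASLG"  # example peptide; any sequence works
shift = peptide_mass(peptide, ("phosphorylation",)) - peptide_mass(peptide)
print(round(shift, 5))  # 79.96633
```

A database search looks for exactly these offsets between an observed peptide mass and its unmodified theoretical mass.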

Why Study Proteomics?

Protein diversity cannot be solely predicted from genetic code or gene expression studies:
- Several mRNA variants can arise from a single gene (e.g., through alternative splicing).
- Protein abundance is not reliably predicted by mRNA levels due to unknown translation rates and degradation.
- PTMs cannot be detected from mRNA.
- Protein localization and interactions are not predictable from mRNA data.
Conclusion: Proteomics offers techniques to capture protein diversity effectively.

Proteomics Techniques

Proteomics encompasses various technologies for identifying and quantifying proteins present in a specific cell, tissue, or organism:

Separation
Identification
Quantification
Functional Analysis
Structural Analysis

Considerations in Studying the Proteome

Sample Preparation

Conditions must be carefully controlled to prevent protein degradation:
- Keep samples cold, add protease inhibitors, and isolate organelles where subcellular fractions are required.

Proteomic Workflow

Sample preparation
Sample separation:
- 1D and 2D gel electrophoresis
- Reverse-phase HPLC
- Strong cation exchange HPLC (SCX-HPLC)
Structural or mass information via mass spectrometry
Protein identification by searching databases such as SWISS-PROT, TrEMBL, RefSeq XPs, and Ensembl.

Protein Separation Techniques

Need for Separation: Since the proteome consists of complex mixtures of proteins, separation is critical for identification and characterization.
Methods for Separation:
- 1-D gel electrophoresis (based on molecular mass, e.g., SDS-PAGE)
- 2-D gel electrophoresis (based on net charge and mass, e.g., Isoelectric focusing + SDS-PAGE)
- Liquid chromatography (LC, separating based on interactions with liquid and stationary phases)

2-D Gel Electrophoresis

Process:
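- In the first dimension (isoelectric focusing), proteins migrate through a pH gradient under an electric field until they reach their isoelectric point (pI), where their net charge is zero. In the second dimension, they are separated by mass using SDS-PAGE.
2D-DIGE (difference gel electrophoresis): Samples are labeled with different fluorescent dyes (e.g., Cy3 and Cy5), so several samples can be run together on one gel, reducing gel-to-gel variation.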
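Isoelectric focusing depends on the pH at which a protein's net charge is zero. This sketch estimates that pH (the pI) with the Henderson-Hasselbalch equation and a bisection search. The pKa values below are one commonly used set and are assumptions; different tools use different scales and give slightly different pI estimates. Function names are illustrative.

```python
# Assumed pKa values for ionizable groups; other scales give different pIs.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    residues = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else residues.count(group)
        charge += count / (1 + 10 ** (ph - pka))   # fraction still protonated
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else residues.count(group)
        charge -= count / (1 + 10 ** (pka - ph))   # fraction deprotonated
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls steadily as pH rises, so bracket the zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


for name, seq in (("HbA", "VHLTPEEK"), ("HbS", "VHLTPVEK")):
    print(name, round(isoelectric_point(seq), 2))
```

The usage compares the N-terminal β-globin peptide of normal (HbA) and sickle (HbS) hemoglobin. Replacing a glutamate with valine removes a negative charge, so the HbS peptide always gets the higher estimated pI. This difference is why the two forms separate on charge-based methods.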

Fluorescence Spectroscopy

Configuration:
- A xenon lamp provides excitation light, a monochromator selects the wavelength, and a lens collects the photons emitted by the sample for detection.
- Used to measure the light a sample emits after it has been excited.

Chromatography Techniques

Gel Filtration: Separates proteins based solely on molecular size.
Hydrophobic Interaction Chromatography (HIC): Separates proteins based on their hydrophobic interactions with ligands.
Ion Exchange (IE): Separates proteins based on net charge.
Reverse Phase Chromatography: Separates molecules by their hydrophobic interactions with a nonpolar stationary phase; they are eluted by a polar mobile phase containing an increasing proportion of organic solvent.
Affinity Chromatography: Utilizes specific binding interactions between an immobilized ligand and its target binding partner.

Protein Identification Techniques

Following separation, identification is necessary through:

Immunoassays:
- Based on specific antibody reactions (ELISA, Western blotting, protein microarrays).
Mass Spectrometry (MS):
- Measures mass-to-charge ratios of ions for detection.
- Can quantify proteins and detect interactions and PTMs.
- High-throughput and suited for proteome characterization.
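Mass spectrometry measures mass-to-charge ratio (m/z), not mass directly. For a positive ion formed by adding z protons to a neutral molecule of mass M, m/z = (M + z × 1.007276) / z. This sketch implements that relation and its inverse. It assumes protonation is the only source of charge. Function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion formed when a neutral molecule gains z protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above: recover the neutral mass from an observed m/z."""
    return mz * charge - charge * PROTON_MASS


# A hypothetical 10,000 Da protein observed at several charge states.
for z in (1, 5, 10, 20):
    print(z, round(mz_from_mass(10_000.0, z), 3))
```

The loop prints m/z values of about 10001.007, 2001.007, 1001.007, and 501.007. This shows how multiple charging brings large proteins into the m/z range of the analyzer.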

Western Blotting

Detects specific proteins in mixtures using antibodies.
Capable of monitoring expression changes and PTMs.
Usually performed after separation by SDS-PAGE; provides semi-quantitative protein measurements.

Protein Microarrays

Enables simultaneous detection of numerous proteins using antibodies and labeled probes.
Highly sensitive; can be used to quantify proteins and to study PTMs and protein interactions.

Mass Spectrometry in Protein Analysis

MS measures mass-to-charge ratios of ions to create spectra specific for protein identification.
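Highly accurate and sensitive; can be coupled with LC (LC-MS) for automated analysis.
Tandem MS (MS/MS): Selected ions are fragmented and the fragments analyzed, providing peptide sequence information and increasing specificity.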
Key ionization techniques:
- Electrospray Ionization (ESI)
- Matrix-Assisted Laser Desorption/Ionization (MALDI)
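In tandem MS, peptides commonly break along the backbone into N-terminal b ions and C-terminal y ions. The mass ladders of these ions spell out the sequence. This sketch assumes collision-type fragmentation, singly charged fragments, and an unmodified peptide. It uses standard monoisotopic residue masses, and the function name `b_y_ions` is illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON_MASS = 1.007276
WATER = 18.010565


def b_y_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b and y fragment-ion m/z values for an unmodified peptide.

    b_i holds the first i residues (N-terminal fragment) plus a proton;
    y_i holds the last i residues plus water (the intact C-terminus) and a proton.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions = [sum(masses[:i]) + PROTON_MASS for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON_MASS for i in range(1, len(masses))]
    return b_ions, y_ions


b, y = b_y_ions("VHLTPEEK")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:.4f}   y{i} {y_mz:.4f}")
```

Consecutive b (or y) ions differ by exactly one residue mass. Software uses this to read the sequence from the spectrum. Leucine and isoleucine have identical masses and cannot be told apart this way.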

Example Applications of Mass Spectrometry

Rapid and affordable screening of blood abnormalities such as haemoglobinopathies and pre-diabetes, demonstrating value in clinical laboratories.
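As an illustration tied to the hemoglobin case study, the sickle-cell substitution (Glu→Val at β6) changes the first tryptic peptide of β-globin from VHLTPEEK (HbA) to VHLTPVEK (HbS). This sketch finds the substituted position and computes the mass shift an MS screen would detect. The residue-mass table covers only the two residues involved. The helper `substitutions` is a hypothetical name.

```python
# Monoisotopic residue masses (Da) for the residues involved in this example only.
RESIDUE_MASS = {"E": 129.04259, "V": 99.06841}


def substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


hba_t1 = "VHLTPEEK"  # beta-globin residues 1-8, normal adult haemoglobin
hbs_t1 = "VHLTPVEK"  # same peptide carrying the sickle-cell Glu6Val change

for pos, ref, var in substitutions(hba_t1, hbs_t1):
    shift = RESIDUE_MASS[var] - RESIDUE_MASS[ref]
    print(f"{ref}{pos}{var}: mass shift {shift:+.5f} Da")  # E6V: mass shift -29.97418 Da
```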

Protein Quantification Techniques

ICAT (Isotope Coded Affinity Tag):
- Uses isotopically light and heavy chemical tags that label cysteine residues, so proteins from two samples can be identified and relatively quantified in one analysis.
- Enables analysis of low-abundance proteins and direct analysis of mixed samples, and allows comparison of protein expression under different conditions.
Limitations of ICAT:
- Depends on database coverage for identification, and only cysteine-containing peptides are labeled.
- Potential errors in quantification.
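In ICAT, the light- and heavy-tagged forms of a peptide appear as a pair of peaks, and their intensity ratio gives relative abundance. This sketch assumes the original ICAT design: one tag per cysteine, with the heavy tag carrying 8 deuterium atoms in place of hydrogen. It predicts the spacing between the paired peaks and shows why cysteine-free peptides are missed. Function names are illustrative.

```python
# Mass difference between deuterium (2H) and hydrogen (1H) atoms, in Da.
DEUTERIUM_SHIFT = 1.006277


def icat_spacing(peptide: str, charge: int, heavy_atoms: int = 8) -> float:
    """Expected m/z gap between the light- and heavy-tagged forms of a peptide.

    Assumes one tag per cysteine and a heavy tag carrying `heavy_atoms`
    deuterium atoms in place of hydrogen (8 in the original d0/d8 reagent).
    Peptides without cysteine return 0.0 because they are not tagged at all.
    """
    n_cys = peptide.upper().count("C")
    return n_cys * heavy_atoms * DEUTERIUM_SHIFT / charge


def relative_abundance(heavy_intensity: float, light_intensity: float) -> float:
    """Ratio of sample B (heavy tag) to sample A (light tag) from paired peak intensities."""
    return heavy_intensity / light_intensity


print(icat_spacing("ACDK", charge=1))   # about 8.05: one cysteine, singly charged
print(icat_spacing("ACDCK", charge=2))  # also about 8.05: two cysteines, doubly charged
print(icat_spacing("ADEK", charge=1))   # 0.0: no cysteine, so no ICAT pair
```

The last case illustrates the cysteine-only limitation listed above.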
Proteome Analysis Approaches:
- Top-Down: Intact proteins analyzed for isoforms and PTMs.
- Bottom-Up (Shotgun): Proteins digested into peptides for analysis.
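Bottom-up workflows usually digest proteins with trypsin before MS, and search software generates the same peptides in silico. This sketch uses the simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). An optional count of missed cleavages models incomplete digestion. The input sequence is hypothetical.

```python
import re


def trypsin_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In silico trypsin digest: cut after K or R, except before P."""
    # Zero-width split at positions preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` neighbouring fragments (incomplete digestion).
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


protein = "AGKPLSRDEKVW"  # hypothetical sequence
print(trypsin_digest(protein))  # ['AGKPLSR', 'DEK', 'VW']: no cut after K3, since P follows
print(trypsin_digest(protein, missed_cleavages=1))
```

Splitting on a zero-width regular expression requires Python 3.7 or later.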

Functional Proteomics

Driven by genome sequencing efforts and aims to:
- Determine biological functions of unknown proteins.
- Investigate cellular mechanisms and signaling pathways.

Structural Proteomics

Determines the three-dimensional structure of proteins, crucial for understanding biochemical function and protein interaction mechanisms.
Techniques employed include:
- X-ray Crystallography
- Nuclear Magnetic Resonance (NMR) Spectroscopy

X-ray Crystallography

Determines atomic and molecular structure at atomic or near-atomic (ångström-scale) resolution, allowing detailed visualization of protein structures.

NMR Spectroscopy

Studies molecular structure and interactions using radiofrequency electromagnetic radiation in a strong magnetic field.
- Effective for proteins of up to about 350 amino acids, and does not require crystallization.

Summary of Proteomic Technologies

Purification: Chromatographic techniques
Analysis: ELISA, Western blotting, protein microarray
Characterization: Gel-based approaches, mass spectrometry
Structural Analysis: X-ray crystallography, NMR spectroscopy
Quantification: ICAT, SILAC, iTRAQ
Bioinformatics Analysis requires integration of technologies for comprehensive data management.

Proteome Database Utilization

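Public databases store the large volumes of data generated by proteomic studies, for example:
- GenBank: Nucleotide sequence database (including translated protein sequences).
- RefSeq: Curated, non-redundant reference sequences (genomic, transcript, and protein).
- UniProt: Protein sequence and functional information database.
- CATH: Hierarchical classification of protein domain structures, reflecting their evolution.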