Protein Purification
The Peptide Bond and Primary Structure:
Proteins are linear polymers of amino acids linked by peptide bonds. Understanding the nature of this bond is crucial for predicting protein folding and stability.
Planarity and Resonance:
The peptide bond (C-N) has partial double-bond character because the lone pair on the amide nitrogen is delocalised into the carbonyl group (resonance between the C=O and C-N bonds).
This resonance restricts rotation around the C-N bond, meaning the six atoms of the peptide group (Cα, C, O, N, H, Cα) lie in a single plane.
Bond Conformations:
Trans Conformation: In almost all peptide bonds, the trans configuration is heavily favoured over cis (ratio approximately 1000:1). This is because trans minimises steric repulsion between the side chains (R groups) attached to the alpha carbons (Cα).
Proline Exception: Proline is a unique case where the cis and trans configurations have similar energies, resulting in a much higher frequency of cis peptide bonds (roughly 10%) compared to other amino acids.
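The population ratios quoted above can be turned into approximate free-energy differences. The sketch below assumes the ratios reflect equilibrium populations at 25 °C and applies ΔG = RT ln K; the function name is illustrative only.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def delta_g_kj(ratio_trans_to_cis, temperature_k=298.15):
    """Free energy (kJ/mol) by which trans is favoured, from dG = RT ln K.

    Assumes the ratio is an equilibrium population ratio.
    """
    return R * temperature_k * math.log(ratio_trans_to_cis) / 1000.0

typical = delta_g_kj(1000)          # ~1000:1 trans:cis for most peptide bonds
proline = delta_g_kj(0.90 / 0.10)   # ~10% cis for bonds preceding proline

print(f"typical peptide bond: {typical:.1f} kJ/mol")
print(f"X-Pro peptide bond:   {proline:.1f} kJ/mol")
```

Under these assumptions the trans preference is about 17 kJ/mol for a typical peptide bond and only about 5 kJ/mol for proline, which is why cis bonds are so much more common there.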
Principles of Protein Purification:
The Rationale for Purification:
Purification is the process of isolating a specific protein from a complex mixture (the proteome) to study its individual properties.
Key Objectives:
Characterisation: To determine the protein's biochemical properties, such as enzymatic activity, kinetic constants, or ligand binding.
Structural Biology: Purified protein is essential for techniques like X-ray crystallography or Cryo-EM to determine three-dimensional structures.
Industrial and Clinical Applications: High-purity proteins are required for the production of therapeutics (e.g., insulin, antibodies) and industrial enzymes.
Selecting a Protein Source:
The choice of source material significantly impacts the ease and yield of purification.
Endogenous Sources:
Proteins are isolated directly from the original organism or tissue (e.g., liver, muscle).
Advantage: The protein is likely to have correct post-translational modifications and folding.
Disadvantage: Yields are often low, and the source material may be difficult to obtain in large quantities.
Recombinant Sources:
The gene of interest is cloned into an expression vector and produced in a host cell (e.g., E. coli, yeast, or mammalian cells).
Advantage: High levels of protein expression can be achieved (overexpression), and specific "tags" (e.g., His-tag) can be added to simplify purification.
Disadvantage: Incorrect folding or lack of necessary modifications can occur, particularly in bacterial systems.
Extraction and Initial Fractionation:
Once a source is chosen, the protein must be released from the cells and separated from bulk contaminants.
Cell Lysis Techniques:
Mechanical: Using high-pressure homogenisers or "bead mills" to physically break cell walls.
Sonication: Applying high-frequency sound waves to disrupt membranes.
Osmotic Shock: Placing cells in a hypotonic solution to cause bursting.
Fractionation by Solubility:
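Ammonium Sulphate Precipitation: Ammonium sulphate is added to the protein mixture in increasing concentrations. As the salt competes with the proteins for water, each protein precipitates ("salts out") at a characteristic salt concentration that depends on its solubility, which largely reflects its surface hydrophobicity.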
Dialysis: Often used after precipitation to remove excess salts. The protein solution is placed in a semi-permeable membrane bag; salt ions diffuse out into a large volume of buffer while the large protein molecules remain inside.
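The effect of buffer volume and buffer changes on residual salt can be estimated with simple dilution arithmetic. This is a minimal sketch that assumes each round reaches full equilibrium, that the bag volume stays constant, and that the fresh buffer contains none of the salt; the volumes and starting concentration are made-up values.

```python
def salt_after_dialysis(c0_molar, sample_ml, buffer_ml, changes=1):
    """Salt concentration inside the bag after `changes` equilibrations.

    Each round dilutes the salt into the combined sample + buffer volume,
    then the buffer is replaced with fresh, salt-free buffer.
    """
    conc = c0_molar
    for _ in range(changes):
        conc *= sample_ml / (sample_ml + buffer_ml)
    return conc

# Hypothetical example: 10 mL of sample in 2 M salt, dialysed against 1 L.
one_change = salt_after_dialysis(2.0, 10, 1000, changes=1)
three_changes = salt_after_dialysis(2.0, 10, 1000, changes=3)

print(f"after 1 change:  {one_change * 1000:.1f} mM")
print(f"after 3 changes: {three_changes * 1e6:.1f} uM")
```

Several changes of a moderate buffer volume remove far more salt than a single equilibration, since each round multiplies the dilution factor.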
Chromatographic Techniques:
Chromatography is the primary method for high-resolution protein purification, relying on a stationary phase (matrix/beads) and a mobile phase (buffer).
Gel Filtration (Size Exclusion Chromatography):
This technique separates proteins based on their size and hydrodynamic volume.
Mechanism:
The column is packed with porous beads. Small proteins can enter the pores and are delayed as they travel through the internal volume of the beads.
Large proteins are "excluded" from the pores and travel only through the space between the beads, resulting in their earlier elution.
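Elution position in gel filtration is often expressed as a partition coefficient, Kav = (Ve - V0) / (Vt - V0), where Ve is the elution volume, V0 the void volume (space between beads) and Vt the total column volume. The sketch below uses this standard definition with hypothetical volumes and protein names.

```python
def k_av(elution_ml, void_ml, total_ml):
    """Partition coefficient: 0 = fully excluded, 1 = fully included."""
    return (elution_ml - void_ml) / (total_ml - void_ml)

# Hypothetical column: void volume 8 mL, total volume 24 mL.
VOID_ML, TOTAL_ML = 8.0, 24.0

# Hypothetical elution volumes (mL) for three proteins.
elution_volumes = {"large": 9.0, "medium": 14.0, "small": 20.0}

coefficients = {
    name: k_av(ve, VOID_ML, TOTAL_ML) for name, ve in elution_volumes.items()
}

# Proteins elute in order of increasing Kav, i.e. largest first.
for name, kav in sorted(coefficients.items(), key=lambda item: item[1]):
    print(f"{name:>6}: Kav = {kav:.3f}")
```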
Ion Exchange Chromatography (IEX):
IEX separates proteins based on their net surface charge at a specific pH.
Anion Exchange: The matrix is positively charged (e.g., DEAE) and binds negatively charged proteins (anions).
Cation Exchange: The matrix is negatively charged (e.g., CM) and binds positively charged proteins (cations).
Elution: Proteins are typically eluted by increasing the salt concentration (competing for charge sites) or by altering the pH of the buffer.
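Whether a protein binds an anion or a cation exchanger depends on how the buffer pH compares with its isoelectric point (pI): above the pI the protein carries a net negative charge, below it a net positive charge. A rough decision helper might look like the following; the 0.5 pH unit margin is an arbitrary illustrative threshold, not a recommendation from the text.

```python
def choose_exchanger(pi, buffer_ph, margin=0.5):
    """Suggest an ion exchanger from the protein's pI and the buffer pH.

    `margin` is a hypothetical safety gap: close to the pI the net
    charge is small and binding is likely to be weak.
    """
    if buffer_ph >= pi + margin:
        return "anion exchange (e.g., DEAE): protein is net negative"
    if buffer_ph <= pi - margin:
        return "cation exchange (e.g., CM): protein is net positive"
    return "net charge near zero: weak binding to either matrix expected"

print(choose_exchanger(pi=5.0, buffer_ph=8.0))
print(choose_exchanger(pi=9.5, buffer_ph=6.0))
print(choose_exchanger(pi=7.0, buffer_ph=7.2))
```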
Affinity Chromatography:
This is the most specific form of chromatography, exploiting the unique biological affinity of a protein for a specific ligand.
Principle: A ligand (e.g., a substrate analogue, an antibody, or a metal ion) is covalently attached to the column beads. Only the protein of interest binds to the ligand, while all other proteins wash through.
Tags: In recombinant proteins, specific sequences like the poly-histidine tag (His-tag) allow for easy purification using Immobilised Metal Affinity Chromatography (IMAC), where the tag binds to nickel or cobalt ions on the matrix.
Quantitative Monitoring of Protein Purification:
To ensure a purification strategy is effective, scientists must track the protein of interest at every step. This involves quantifying both the total amount of protein and the amount of the specific target protein.
Activity and Specific Activity:
For enzymes, the "amount" is often measured by their catalytic activity rather than just mass.
Enzyme Activity: This is the total units of enzyme present in a solution. One unit (U) is typically defined as the amount of enzyme that transforms 1 micromole of substrate per minute under standard conditions.
Specific Activity: This is the ratio of enzyme activity to the total amount of protein in the mixture (Units/mg).
As purification progresses, the specific activity should increase, as the "contaminating" proteins are removed while the target enzyme is retained.
A constant specific activity across two different purification steps suggests that the protein is likely pure.
Yield and Fold Purification:
Yield: The percentage of the initial activity retained after a step.
Fold Purification: The increase in specific activity relative to the crude extract.
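Specific activity, yield and fold purification are usually collected in a purification table. The sketch below computes them from the definitions above; the step names and all numbers are invented for illustration.

```python
# (step name, total protein in mg, total activity in units) - hypothetical data
steps = [
    ("Crude extract",              1000.0, 10000.0),
    ("Ammonium sulphate fraction",  300.0,  9000.0),
    ("Ion exchange",                 40.0,  6000.0),
    ("Affinity",                      4.0,  5000.0),
]

def purification_table(steps):
    """Return specific activity (U/mg), yield (%) and fold purification per step."""
    _, crude_protein, crude_activity = steps[0]
    crude_specific = crude_activity / crude_protein
    rows = []
    for name, protein_mg, activity_u in steps:
        specific = activity_u / protein_mg
        rows.append({
            "step": name,
            "specific_activity": specific,
            "yield_pct": 100.0 * activity_u / crude_activity,
            "fold": specific / crude_specific,
        })
    return rows

table = purification_table(steps)
for row in table:
    print(f"{row['step']:<28} {row['specific_activity']:>8.1f} U/mg "
          f"{row['yield_pct']:>6.1f} % {row['fold']:>7.1f}-fold")
```

In this invented example, yield falls at every step while specific activity rises, which is the usual trade-off between recovery and purity.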
Total Protein Quantification:
Total protein concentration is commonly measured using colorimetric assays.
Bradford Assay: Uses Coomassie Brilliant Blue G-250 dye, which shifts from red to blue (absorbance maximum moving from about 465 nm to 595 nm) upon binding to proteins, mainly at basic and aromatic residues.
BCA Assay: Based on the reduction of Cu2+ to Cu+ by proteins in an alkaline medium; the Cu+ is then chelated by bicinchoninic acid to form a purple complex.
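With either assay, an unknown concentration is read off a standard curve prepared from a protein of known concentration. The following is a sketch only: it assumes blank-subtracted absorbances, a response that is linear over the range used, and made-up standard readings.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical standards (mg/mL) and their blank-subtracted absorbances.
standards_mg_ml = [0.0, 0.25, 0.5, 0.75, 1.0]
absorbances = [0.00, 0.13, 0.25, 0.38, 0.50]

slope, intercept = fit_line(standards_mg_ml, absorbances)

def concentration(absorbance, dilution_factor=1.0):
    """Interpolate a sample concentration (mg/mL) from the standard curve."""
    return (absorbance - intercept) / slope * dilution_factor

unknown = concentration(0.30)
print(f"estimated concentration: {unknown:.3f} mg/mL")
```

Readings that fall outside the range of the standards should be diluted and re-measured rather than extrapolated.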
Assessing Purity via Gel Electrophoresis:
SDS-PAGE (Sodium Dodecyl Sulphate Polyacrylamide Gel Electrophoresis):
This technique separates proteins primarily by molecular mass, largely independent of their native charge or shape.
Denaturation: Proteins are heated with SDS, an anionic detergent. SDS disrupts non-covalent interactions, unfolding the protein into a linear chain; a reducing agent is usually included to break disulphide bonds.
Charge Masking: SDS molecules bind to the protein backbone at a roughly constant ratio (about 1.4 g SDS per 1 g of protein). This imparts a uniform negative charge-to-mass ratio, so that the velocity of the protein through the gel depends essentially only on its size.
The Sieve Effect: The polyacrylamide gel acts as a molecular sieve. When an electric field is applied, smaller proteins move faster and further through the pores than larger proteins.
Visualisation: After the run, proteins are stained (e.g., with Coomassie Blue or Silver Stain). A single, sharp band indicates a high degree of purity.
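Because migration depends on size, the mass of an unknown band is commonly estimated from marker proteins run on the same gel, using the approximately linear relationship between log10(mass) and relative mobility (Rf). The sketch below assumes that relationship holds over the marker range; the marker masses and Rf values are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical markers: (mass in kDa, relative mobility Rf).
markers = [(100.0, 0.2), (50.0, 0.4), (25.0, 0.6), (12.5, 0.8)]

rf_values = [rf for _, rf in markers]
log_masses = [math.log10(mass) for mass, _ in markers]
slope, intercept = fit_line(rf_values, log_masses)

def estimate_mass_kda(rf):
    """Estimate mass from Rf using the fitted log10(mass) vs Rf line."""
    return 10 ** (slope * rf + intercept)

unknown_mass = estimate_mass_kda(0.5)
print(f"estimated mass: {unknown_mass:.1f} kDa")
```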
Two-Dimensional (2D) Gel Electrophoresis:
For complex mixtures (such as total cell lysates), 1D SDS-PAGE is often insufficient because multiple proteins may share the same molecular weight. 2D electrophoresis provides a higher resolution by separating proteins by two independent properties.
First Dimension: Isoelectric Focusing (IEF):
Proteins are separated in a pH gradient based on their isoelectric point (pI), the pH at which the protein has no net charge and ceases to migrate in an electric field.
Second Dimension: SDS-PAGE:
The IEF strip is placed atop an SDS-PAGE gel. The proteins are then separated perpendicularly based on their denatured molecular weight.
High-Resolution Output:
This technique can resolve thousands of individual protein "spots" from a single sample (e.g., ~1000 spots from 200mg of CHO cell protein).
Spot Intensity: The intensity of each silver-stained spot is directly related to the abundance of that specific protein in the cell.
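The isoelectric point used in the first dimension can be estimated from sequence by finding the pH at which the summed Henderson-Hasselbalch charges of the ionisable groups equal zero. This is a rough sketch: the pKa values are approximate textbook figures (published scales differ), the example sequences are invented, and effects of folding or modifications are ignored.

```python
# Approximate pKa values; assumptions for illustration only.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.0, 3.1

def net_charge(sequence, ph):
    """Net charge of a peptide at the given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))     # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))    # free C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge

def isoelectric_point(sequence, tolerance=1e-4):
    """Bisection search for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0

acidic_pi = isoelectric_point("DDDD")   # hypothetical acidic peptide
basic_pi = isoelectric_point("KKKK")    # hypothetical basic peptide
print(f"acidic peptide pI ~ {acidic_pi:.2f}, basic peptide pI ~ {basic_pi:.2f}")
```

Bisection works here because the net charge decreases monotonically as pH increases.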
Modular Protein Domains and Signalling:
Modern proteomics has revealed that many proteins are "modular," consisting of distinct, independently folding domains that recognise specific motifs in other proteins.
Common Modular Domains:
These domains are essential for the formation of multi-protein complexes in cellular signalling.
SH2 (Src Homology 2) Domains: Specifically recognise and bind to phosphotyrosine residues. They allow proteins to be recruited to activated receptor tyrosine kinases.
SH3 (Src Homology 3) Domains: Recognise and bind to proline-rich sequences (often with the motif P-X-X-P).
PH (Pleckstrin Homology) Domains: Bind to specific phosphoinositides (phospholipids) in the cell membrane, allowing proteins to be localised to the membrane surface.
PTB (Phosphotyrosine Binding) Domains: Similar to SH2, but they recognise phosphotyrosine within a specific N-P-X-Y motif.
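Candidate binding sites for these domains can be located by scanning a sequence for the motifs named above. The sketch below uses regular expressions for P-X-X-P and N-P-X-Y only; the test sequence is invented, and a sequence match says nothing about whether the tyrosine is actually phosphorylated or whether a domain binds in vivo.

```python
import re

# Lookahead patterns so that overlapping matches are all reported.
MOTIFS = {
    "SH3 ligand (P-X-X-P)": re.compile(r"(?=(P..P))"),
    "PTB ligand (N-P-X-Y)": re.compile(r"(?=(NP.Y))"),
}

def find_motifs(sequence):
    """Return (motif name, 1-based start position, matched residues)."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits

# Hypothetical sequence, one-letter amino acid codes.
hits = find_motifs("MAPPLPPRNPEYGS")
for name, position, residues in hits:
    print(f"{name}: {residues} at position {position}")
```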
Evolutionary and Functional Importance:
Domain Shuffling: Throughout evolution, these modules have been "shuffled" into different combinations to create proteins with diverse functions.
Scaffolding: Some proteins consist almost entirely of these domains and act as "scaffolds," bringing multiple signalling components together in space and time.
Advanced Analytical Characterisation:
After purification, further techniques are used to confirm the identity and structural integrity of the protein.
Mass Spectrometry (MS):
Mass spectrometry is the gold standard for protein identification and the detection of post-translational modifications (PTMs).
Ionisation: The protein is converted into gas-phase ions (using techniques like MALDI or Electrospray Ionisation).
Mass-to-Charge Ratio (m/z): The ions are accelerated through a vacuum, and their m/z is measured. This allows highly precise determination of molecular weight.
Tandem MS (MS/MS): Individual peptides are fragmented to determine their amino acid sequence, which can then be matched against protein sequence databases (derived from genomic data) to identify the protein.
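For positive-mode electrospray, where each charge is assumed to come from an added proton, the relationship between neutral mass M and the observed peak is m/z = (M + z × 1.007276) / z. The sketch below applies this and recovers the charge from two adjacent peaks; the 17000 Da protein is hypothetical.

```python
PROTON_MASS = 1.007276  # Da

def mz(neutral_mass_da, charge):
    """m/z of an ion carrying `charge` extra protons."""
    return (neutral_mass_da + charge * PROTON_MASS) / charge

def neutral_mass(mz_value, charge):
    """Neutral mass recovered from an observed m/z and its charge state."""
    return charge * (mz_value - PROTON_MASS)

def charge_from_adjacent(mz_lower_charge, mz_higher_charge):
    """Charge of the first peak, given the neighbouring peak with one more proton."""
    return round((mz_higher_charge - PROTON_MASS)
                 / (mz_lower_charge - mz_higher_charge))

protein_mass = 17000.0                 # hypothetical protein, Da
peak_a = mz(protein_mass, 10)          # charge state 10+
peak_b = mz(protein_mass, 11)          # charge state 11+ (lower m/z)

z = charge_from_adjacent(peak_a, peak_b)
print(f"peaks at {peak_a:.3f} and {peak_b:.3f}; charge of first peak = {z}")
print(f"recovered mass = {neutral_mass(peak_a, z):.1f} Da")
```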
N-terminal Sequencing (Edman Degradation):
Although largely superseded by MS, Edman degradation is still used to determine the sequence of amino acids at the N-terminus.
Mechanism: The N-terminal residue is chemically labelled, cleaved, and identified by chromatography. The process is then repeated for the next residue.
Limitations: It cannot be performed if the N-terminus is chemically "blocked" (e.g., acetylated), which is common in eukaryotic proteins.
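The cyclic nature of Edman degradation, and the effect of a blocked N-terminus, can be illustrated with a toy simulation. This models only the logic of the cycle (identify the first residue, remove it, repeat), not the chemistry; the peptide and function name are made up.

```python
def edman_sequence(peptide, cycles, n_terminus_blocked=False):
    """Return the residues identified after a number of Edman cycles.

    A blocked N-terminus (e.g., acetylated) cannot be labelled, so no
    residues are identified at all.
    """
    if n_terminus_blocked:
        return ""
    identified = []
    remaining = peptide
    for _ in range(min(cycles, len(peptide))):
        identified.append(remaining[0])   # label, cleave and identify residue 1
        remaining = remaining[1:]         # shortened peptide enters next cycle
    return "".join(identified)

free = edman_sequence("MKTAYIAK", cycles=5)
blocked = edman_sequence("MKTAYIAK", cycles=5, n_terminus_blocked=True)
print(f"free N-terminus: {free!r}; blocked N-terminus: {blocked!r}")
```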