Deep Dive: Amino Acids, Peptides, and Proteins
Introduction to Proteins: The Molecular Workhorses
Life at the cellular level is incredibly complex, with a multitude of processes occurring molecule by molecule.
Proteins are the "workhorses" or "molecular machines" behind almost every cellular process, exhibiting vast functional diversity.
Understanding proteins is foundational to comprehending molecular mechanisms in biology and medicine.
This deep dive draws heavily from Chapter 3 of "Lehninger Principles of Biochemistry," focusing on how these molecular machines are built, how they are studied, and what their structure reveals about life.
Amino Acids: The Fundamental Building Blocks
Proteins are constructed from a common set of just 20 standard alpha-amino acids, despite the immense diversity of protein functions (e.g., enzymes, structural proteins like keratin).
Each of these 20 amino acids has a distinctive side chain, or "R group," that dictates its chemical properties; together, the 20 act like an alphabet from which protein structures are written.
Common Features of Alpha-Amino Acids
All 20 standard amino acids are "alpha-amino acids," meaning each has a central carbon atom (the alpha carbon) bonded to four groups:
A carboxyl group ($\text{COO}^-$ or $\text{COOH}$)
An amino group ($\text{NH}_3^+$ or $\text{NH}_2$)
A hydrogen atom
The unique R group.
The R group varies in structure, size, and electrical charge, influencing the amino acid's behavior, particularly its solubility in water.
Proline: The Exception
Proline is structurally unique because its side chain loops back and bonds to the nitrogen of its own alpha-amino group, forming a rigid ring structure (so its amino group is secondary rather than primary).
This rigidity makes any protein region containing proline less flexible, acting as a structural "kink" or stiffener, which is critical for specific protein shapes.
Stereoisomerism and Chirality
Except for glycine (whose R group is a simple hydrogen atom, making its alpha carbon achiral), all other 19 amino acids have a chiral alpha carbon.
Because of this chiral center, each of these amino acids can exist in two non-superimposable mirror-image forms called enantiomers (like left and right hands).
Biological Significance: The amino acid residues in proteins are almost exclusively L-stereoisomers (L-forms).
This is not random: cells can synthesize L-amino acids specifically because enzyme active sites are asymmetric, making the reactions they catalyze stereospecific, and the machinery that builds proteins likewise works only with the L-form.
This ensures stereospecificity, which is fundamental to protein structure and function.
Classification of Amino Acids (Based on R Groups)
Amino acids are typically grouped into five main classes based on the polarity and charge of their R groups at physiological pH (about pH 7).
Nonpolar, Aliphatic R Groups (Hydrophobic):
These R groups are "oily" or "greasy" and tend to exclude water. They cluster in the interior of proteins, driven by the "hydrophobic effect," a major force shaping 3D protein structure.
Examples:
Glycine: Smallest R group (just hydrogen), allows for significant flexibility.
Proline: Introduces rigidity due to its unique ring structure.
Methionine: One of the two sulfur-containing amino acids (the other is cysteine).
Aromatic R Groups:
Phenylalanine, Tyrosine, and Tryptophan.
Contain bulky ring structures and are relatively nonpolar, contributing to the hydrophobic core of proteins.
Tyrosine (hydroxyl group) and tryptophan (nitrogen in its indole ring) are more polar than phenylalanine and can participate in hydrogen bonding.
Laboratory Utility: Tryptophan and tyrosine (and, to a much lesser extent, phenylalanine) strongly absorb ultraviolet (UV) light near 280 nm. By the Lambert-Beer law, absorbance is proportional to concentration (for a fixed path length), so absorbance at 280 nm is widely used to estimate protein concentration.
Polar, Uncharged R Groups (Hydrophilic):
Serine, Threonine, Asparagine, and Glutamine.
Their R groups (e.g., hydroxyls on serine/threonine, amide groups on asparagine/glutamine) can form hydrogen bonds with water molecules.
Often found on the protein surface, interacting with surrounding water.
Cysteine: Often grouped here, but unique due to its sulfhydryl (-SH) group. Two cysteine residues can be oxidized to form a covalent "disulfide bond" or "disulfide bridge" (-S-S-). These bonds act like molecular staples, providing significant stability and structural integrity to proteins, particularly extracellular ones.
Positively Charged (Basic) R Groups:
Lysine, Arginine, and Histidine.
At physiological pH, the side chains of lysine and arginine carry a full positive charge (from their nitrogen-containing amino and guanidinium groups), making them very hydrophilic; histidine's side chain is only partly protonated at this pH.
Often found on the protein surface, interacting with water or negatively charged molecules (e.g., DNA).
Histidine's Special Role: Its imidazole R group has a pKa of about 6.0, near neutral pH, so it readily gains or loses a proton. This makes histidine crucial in enzyme active sites, where it can act as a "proton shuttle" for catalysis.
Negatively Charged (Acidic) R Groups:
Aspartate and Glutamate.
Each has a second carboxyl group in its side chain, which is typically deprotonated at pH 7, giving the side chain a net negative charge.
Like basic amino acids, they are hydrophilic and often involved in binding positive ions or molecules.
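The Lambert-Beer calculation mentioned for the aromatic residues can be sketched in a few lines of Python. Under the law, A = ε·c·l, so c = A / (ε·l). The function names are illustrative, and the per-residue coefficients (about 5,500 M⁻¹cm⁻¹ for Trp and 1,490 M⁻¹cm⁻¹ for Tyr at 280 nm) are a commonly used approximation assumed here, not values from the source; a measured coefficient for the specific protein is preferable when available.

```python
# Sketch: estimating protein concentration from absorbance at 280 nm
# using the Lambert-Beer law, A = epsilon * c * l.

# Assumed per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1);
# a common approximation, not taken from the source text.
EPS_TRP = 5500.0
EPS_TYR = 1490.0


def epsilon_from_composition(n_trp: int, n_tyr: int) -> float:
    """Approximate a protein's molar extinction coefficient at 280 nm from
    its tryptophan and tyrosine counts. Phenylalanine (weak at 280 nm) and
    the small cystine contribution are ignored in this sketch."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR


def concentration_from_a280(absorbance: float, epsilon: float,
                            path_length_cm: float = 1.0) -> float:
    """Return molar concentration c = A / (epsilon * l)."""
    if epsilon <= 0 or path_length_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_length_cm)


# Hypothetical protein with 2 Trp and 5 Tyr, A280 = 0.369 in a 1 cm cuvette.
eps = epsilon_from_composition(n_trp=2, n_tyr=5)  # 2*5500 + 5*1490 = 18450
conc = concentration_from_a280(0.369, eps)
print(f"epsilon = {eps:.0f} M^-1 cm^-1, c = {conc * 1e6:.1f} uM")
```

Because absorbance scales with the number of Trp and Tyr residues, two proteins at the same mass concentration can give quite different A280 readings.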
Acid-Base Properties of Amino Acids
Amino acids are "ampholytes" or "amphoteric," meaning they can act as both weak acids and weak bases because they contain both an acidic carboxyl group and a basic amino group.
Zwitterions: At neutral pH (around pH 7), amino acids exist primarily as zwitterions (German for "hybrid ions").
A zwitterion has a positive charge on the protonated amino group ($\text{NH}_3^+$) and a negative charge on the deprotonated carboxyl group ($\text{COO}^-$), so its net charge is zero.
Titration Curves: Titration curves show the sequential deprotonation of an amino acid's ionizable groups as pH increases. Each pKa is the pH at the midpoint of a titration stage, where that group is half protonated and half deprotonated (the region of best buffering).
Isoelectric Point (pI): The pI is the specific pH at which an amino acid (or protein) has a net electric charge of exactly zero.
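For simple amino acids like glycine (no ionizable R group), $\text{pI} = (\text{p}K_1 + \text{p}K_2)/2$, where $\text{p}K_1$ belongs to the alpha-carboxyl group and $\text{p}K_2$ to the alpha-amino group. For glycine, $(2.34 + 9.60)/2 = 5.97$.
For amino acids with ionizable R groups, the same principle applies: the pI is the average of the two pKa values that flank the net-zero (zwitterionic) form, i.e., $\text{p}K_1$ and $\text{p}K_R$ for acidic amino acids, and $\text{p}K_R$ and $\text{p}K_2$ for basic ones.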
Practical Implications: The charge of a protein, which depends on the solution pH relative to its pI, is a critical property used in protein purification techniques like ion-exchange chromatography.
If pH > pI, the protein has a net negative charge.
If pH < pI, the protein has a net positive charge.
Local pKa Shifts: Within a folded protein, the local environment (e.g., proximity to other charged groups) can shift the pKa values of individual amino acid residues, which is often exploited in enzyme active sites for efficient catalysis.
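The titration and pI ideas above can be made concrete with the Henderson-Hasselbalch equation: each ionizable group contributes a fractional charge that depends on pH - pKa, and the pI is the pH at which these contributions sum to zero. The sketch below is a simplified model for free amino acids, assuming independent groups and using commonly tabulated free-amino-acid pKa values for glycine and histidine (residues inside folded proteins can have shifted pKa values, as noted above). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    pka: float
    acidic: bool  # True: COOH-like (0 or -1); False: NH3+-like (+1 or 0)


def group_charge(group: Group, ph: float) -> float:
    """Average charge of one ionizable group via Henderson-Hasselbalch."""
    if group.acidic:
        # fraction deprotonated (negative form) = 1 / (1 + 10^(pKa - pH))
        return -1.0 / (1.0 + 10 ** (group.pka - ph))
    # fraction protonated (positive form) = 1 / (1 + 10^(pH - pKa))
    return 1.0 / (1.0 + 10 ** (ph - group.pka))


def net_charge(groups: list[Group], ph: float) -> float:
    return sum(group_charge(g, ph) for g in groups)


def isoelectric_point(groups: list[Group], lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-6) -> float:
    """Bisection: net charge falls steadily as pH rises, so find its zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(groups, mid) > 0:
            lo = mid  # still net positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Free amino acid pKa values: alpha-COOH, alpha-NH3+, and R group where present.
GLYCINE = [Group(2.34, acidic=True), Group(9.60, acidic=False)]
HISTIDINE = [Group(1.82, acidic=True), Group(6.00, acidic=False),
             Group(9.17, acidic=False)]

print(f"glycine pI   ~ {isoelectric_point(GLYCINE):.2f}")
print(f"histidine pI ~ {isoelectric_point(HISTIDINE):.2f}")
# Fraction of histidine imidazole groups protonated at pH 7 (pKa 6.0):
print(f"{group_charge(Group(6.00, acidic=False), 7.0):.3f}")
# Above its pI, glycine carries a net negative charge:
print(net_charge(GLYCINE, 9.0) < 0)
```

For histidine the numerical pI agrees with the shortcut $(\text{p}K_R + \text{p}K_2)/2 = (6.00 + 9.17)/2$, and the protonated fraction at pH 7 (about 1 in 11) shows why its side chain is only partly charged at physiological pH.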
Uncommon Amino Acids
Besides the 20 standard amino acids, others play important roles:
Post-translational Modifications: Some are formed by modifying a standard amino acid after protein synthesis. Example: 4-hydroxyproline, essential for collagen stability.
Direct Incorporation: A few rare amino acids, like selenocysteine, are incorporated directly during protein synthesis through special mechanisms; selenocysteine, for example, is inserted at particular UGA codons that would otherwise signal "stop."
Metabolic Intermediates: Others exist freely in cells but are not typically part of proteins. Example: Ornithine and citrulline in the urea cycle for nitrogen waste removal.
Peptides and Proteins: Linking Amino Acids
Amino acids link together via a peptide bond, a specific type of covalent bond.
Peptide Bond Formation: A condensation reaction where the carboxyl group of one amino acid reacts with the amino group of the next, releasing a molecule of water.
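Thermodynamically, the reverse reaction (hydrolysis) is favored, although it is very slow without a catalyst. To make peptide bond formation efficient during protein synthesis, cells first activate the carboxyl group (by attaching the amino acid to a tRNA, using energy from ATP), and ribosomes (molecular machines) then catalyze bond formation.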
Naming Conventions:
Dipeptide: Two amino acids linked.
Tripeptide: Three amino acids linked.
Oligopeptide: A short chain (up to about 20 residues).
Polypeptide: Longer chains.
Polypeptide vs. Protein:
Polypeptides generally have molecular weights below about 10,000.
Proteins are typically larger, often composed of one or more polypeptide chains, and possess a well-defined 3D structure.
Directionality of Chains:
N-terminus (amino terminus): The end with a free amino group.
C-terminus (carboxyl terminus): The end with a free carboxyl group.
By convention, sequences are read from N-terminus to C-terminus.
Protein Size: Can vary enormously:
Human cytochrome c: approx. 104 residues.
Titin (muscle protein): Almost 27,000 residues.
Multi-subunit Proteins: Many proteins consist of two or more polypeptide chains (subunits), which can be identical or different.
Held together by noncovalent interactions, sometimes by disulfide bonds.
Example: Hemoglobin has four subunits (two alpha, two beta).
Conjugated Proteins: Contain permanently associated non-amino acid chemical components called prosthetic groups, which are essential for the protein's function.
Examples: Metal ions, heme (in hemoglobin for oxygen binding), lipids (in lipoproteins).
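The arithmetic behind peptide bond formation and the polypeptide/protein distinction can be sketched as follows. A linear chain of n residues has n - 1 peptide bonds, each formed with the release of one water molecule. The ~110 Da average residue mass is a common rule of thumb (roughly 128 Da average amino acid mass minus 18 Da for the water lost per bond), assumed here rather than taken from the source; the 10,000 cutoff mirrors the rough threshold stated above, and all function names are hypothetical.

```python
# Sketch: bookkeeping for a linear chain written N-terminus -> C-terminus.

AVG_RESIDUE_MASS_DA = 110.0   # assumed rule-of-thumb average residue mass
POLYPEPTIDE_CUTOFF = 10_000   # rough Mr boundary between "polypeptide" and "protein"


def peptide_bonds(n_residues: int) -> int:
    """n residues in one linear chain are joined by n - 1 peptide bonds;
    each condensation releases one water, so this also counts waters released."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one residue")
    return n_residues - 1


def estimated_mr(n_residues: int) -> float:
    """Very rough molecular weight estimate from residue count alone."""
    return n_residues * AVG_RESIDUE_MASS_DA


def size_label(n_residues: int) -> str:
    return "protein" if estimated_mr(n_residues) >= POLYPEPTIDE_CUTOFF else "polypeptide"


def termini(sequence: str) -> tuple[str, str]:
    """Sequences are written N -> C, so the first letter is the N-terminal
    residue and the last letter is the C-terminal residue."""
    return sequence[0], sequence[-1]


for n in (3, 104, 27_000):  # tripeptide, cytochrome c-sized, titin-sized
    print(n, peptide_bonds(n), estimated_mr(n), size_label(n))
print(termini("GAS"))  # hypothetical tripeptide Gly-Ala-Ser
```

The residue-count estimate ignores the actual composition and any prosthetic groups, so it only gives an order of magnitude.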
Protein Purification: The "Needle in a Haystack" Problem
Purifying a specific protein from a complex mixture (the "ultimate needle in a haystack problem") is a fundamental challenge in biochemistry.
Strategy: Exploit differences in protein physical and chemical properties (size, charge, binding affinity, solubility).
General Steps:
Crude Extract Preparation: Break open cells/tissues to release all proteins, creating a messy mixture.
Fractionation (Initial Separation):
Salting Out: Gradually increase salt concentration (e.g., ammonium sulfate) to selectively precipitate proteins based on their solubility differences. Collect fractions containing the protein of interest.
Dialysis: A semipermeable membrane bag containing the protein solution is placed in a buffer. Small molecules (salts) pass out, while larger proteins remain inside, allowing buffer exchange or salt removal.
Column Chromatography (Powerful Separation Techniques):
Involves a column packed with a solid stationary phase: the protein mixture is applied, and a mobile phase (buffer) flows through.
Proteins interact differently with the stationary phase, separating based on their properties.
Types of Chromatography:
Ion-Exchange Chromatography: Stationary phase has charged groups.
Separates proteins based on their charges.
Cation-exchange resin: negatively charged; binds positively charged proteins. Proteins that are negatively charged or neutral at the buffer pH flow through first.
Bound proteins are eluted by changing the pH or increasing the salt concentration.
Because a protein's net charge depends on the buffer pH relative to its pI, the pI predicts whether the protein will bind.
Size-Exclusion Chromatography (Gel Filtration): Stationary phase has porous beads.
Separates proteins based on size and shape.
Counterintuitively, larger proteins elute first: they are too big to enter the pores and take a direct route through the spaces between beads, while smaller proteins enter the pores and take a longer, more convoluted path.
Affinity Chromatography: Stationary phase has a covalently attached ligand that specifically and tightly binds to the protein of interest.
Highly specific: Only the target protein binds; others wash through.
Elution: Add high concentration of free ligand to compete for binding, or change conditions (pH, salt) to disrupt interaction.
HPLC (High-Performance Liquid Chromatography): Uses very fine stationary phase materials and high-pressure pumps for enhanced resolution and faster separation times.
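Whether a protein is retained on an ion-exchange column can be predicted, to a first approximation, from its pI and the buffer pH. The sketch below assumes binding depends only on the sign of the net charge, which ignores how charge is distributed over the protein surface; the protein names and pI values are hypothetical.

```python
# Sketch: first-approximation prediction of ion-exchange binding from pI.


def net_charge_sign(pi: float, buffer_ph: float) -> int:
    """+1 if net positive (pH < pI), -1 if net negative (pH > pI), 0 at the pI."""
    if buffer_ph < pi:
        return 1
    if buffer_ph > pi:
        return -1
    return 0


def binds(pi: float, buffer_ph: float, resin: str) -> bool:
    """Cation exchangers (negative resin) bind net-positive proteins;
    anion exchangers (positive resin) bind net-negative proteins."""
    sign = net_charge_sign(pi, buffer_ph)
    if resin == "cation":
        return sign > 0
    if resin == "anion":
        return sign < 0
    raise ValueError("resin must be 'cation' or 'anion'")


def split_fractions(proteins: dict[str, float], buffer_ph: float, resin: str):
    """Return (flow-through, bound) protein names for one loading condition."""
    flow_through = [name for name, pi in proteins.items()
                    if not binds(pi, buffer_ph, resin)]
    bound = [name for name, pi in proteins.items() if binds(pi, buffer_ph, resin)]
    return flow_through, bound


# Hypothetical mixture: name -> pI
mixture = {"protein_A": 4.8, "protein_B": 7.0, "protein_C": 9.5}
print(split_fractions(mixture, buffer_ph=7.0, resin="cation"))
print(split_fractions(mixture, buffer_ph=7.0, resin="anion"))
```

Here protein_B, loaded at its own pI, has no net charge and is not retained by either resin.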
Monitoring Purification:
Quantitative Tracking:
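Measure total enzyme activity, if the protein is an enzyme (in units, where one unit is the amount of enzyme that converts a defined amount of substrate per unit time, e.g., 1 µmol per minute).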
Measure total protein amount (mg).
Calculate specific activity (enzyme activity units / mg total protein).
Interpretation: As purification progresses, total activity might decrease slightly (due to losses), but total protein decreases significantly.
Therefore, specific activity should increase with each successful step.
When specific activity reaches a maximum and constant value, the protein is likely pure.
Results are often summarized in a purification table.
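The bookkeeping for a purification table can be sketched as below. Specific activity is total activity divided by total protein; yield and fold purification are expressed relative to the crude extract. The step names and numbers are invented for illustration only and are not data from the source.

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    total_protein_mg: float
    total_activity_units: float


class Row(NamedTuple):
    name: str
    specific_activity: float  # units per mg protein
    yield_pct: float          # % of crude-extract activity remaining
    fold: float               # specific activity relative to crude extract


def purification_table(steps: list[Step]) -> list[Row]:
    crude = steps[0]
    sa_crude = crude.total_activity_units / crude.total_protein_mg
    rows = []
    for s in steps:
        sa = s.total_activity_units / s.total_protein_mg
        rows.append(Row(
            name=s.name,
            specific_activity=sa,
            yield_pct=100.0 * s.total_activity_units / crude.total_activity_units,
            fold=sa / sa_crude,
        ))
    return rows


# Invented numbers for illustration only.
STEPS = [
    Step("Crude extract", 10_000.0, 50_000.0),
    Step("Ammonium sulfate", 2_000.0, 40_000.0),
    Step("Ion exchange", 200.0, 30_000.0),
    Step("Affinity", 20.0, 20_000.0),
]

print(f"{'Step':<18}{'Sp. act. (U/mg)':>17}{'Yield (%)':>11}{'Fold':>8}")
for r in purification_table(STEPS):
    print(f"{r.name:<18}{r.specific_activity:>17.1f}{r.yield_pct:>11.0f}{r.fold:>8.0f}")
```

The pattern matches the interpretation above: total activity (and thus yield) drops step by step, while specific activity and fold purification rise.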
Protein Analysis: Electrophoresis
Gel Electrophoresis: Applies an electric field across a gel matrix (polyacrylamide); proteins migrate based on charge and size.
SDS-PAGE (Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis): The most common type.
SDS (a detergent) performs two crucial functions:
Denatures proteins: Unfolds them into linear chains.
Coats with negative charges: Overwhelms the protein's intrinsic charge, making all proteins negatively charged rods.
Separation Mechanism: Proteins migrate towards the positive electrode, with separation based almost entirely on size/molecular weight.
Smaller proteins move faster and further down the gel.
After running, gels are stained to visualize protein bands; purity is indicated by a single major band at the expected molecular weight.
Isoelectric Focusing (IEF):
Separates proteins based on their isoelectric point (pI).
A stable pH gradient is established across the gel. Proteins migrate until they reach the point where the pH equals their pI (net charge is zero), at which point they stop moving.
2D Gel Electrophoresis (Two-Dimensional Electrophoresis): Combines IEF and SDS-PAGE for high-resolution separation.
First Dimension: Proteins separated by pI using IEF (e.g., in a thin tube gel).
Second Dimension: The IEF gel is placed sideways on an SDS-PAGE slab gel, and proteins are separated by size.
Result: Thousands of different proteins from a complex mixture (e.g., a cell's proteome) can be resolved on a single gel, providing a snapshot of the entire protein complement.
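In SDS-PAGE, the logarithm of molecular weight is, over a useful range, approximately linear in relative migration distance (Rf), so an unknown band's size can be read from a standard curve. The sketch below fits that line by ordinary least squares; the marker values are hypothetical, chosen to be exactly linear for illustration, and the function names are likewise illustrative.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf: float, standards: list[tuple[float, float]]) -> float:
    """Estimate molecular weight from relative mobility using a standard
    curve of log10(MW) versus Rf built from (Rf, MW) marker pairs."""
    slope, intercept = fit_line([r for r, _ in standards],
                                [math.log10(mw) for _, mw in standards])
    return 10 ** (slope * rf + intercept)


# Hypothetical markers: (relative mobility Rf, molecular weight).
# Smaller proteins migrate further, so Rf rises as MW falls.
STANDARDS = [(0.2, 100_000), (0.4, 50_000), (0.6, 25_000), (0.8, 12_500)]
print(f"Unknown band at Rf 0.5: ~{estimate_mw(0.5, STANDARDS):,.0f}")
```

Estimates are only reliable within the range spanned by the markers; extrapolating beyond them assumes a linearity the gel may not provide.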
Primary Structure: The Blueprint of Life
Primary Structure: The linear sequence of amino acid residues, read from the N-terminus to the C-terminus, including the locations of any disulfide bonds.
Core Principle: The primary structure (amino acid sequence) largely dictates how the protein folds into its specific functional 3D shape.
Sequence determines fold, and fold determines function.
Significance: Changes in sequence often alter the fold and thus the function.
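Disease: Many genetic diseases result from changes in the sequence of a critical protein, sometimes as small as a single amino acid: sickle cell anemia is caused by one substitution in the β chain of hemoglobin, while Duchenne muscular dystrophy usually results from deletions in the gene encoding dystrophin.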
Evolutionary Conservation: Functionally important regions of proteins tend to be highly conserved across species during evolution, reflecting shared ancestry and functional constraints.
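Comparing a reference sequence with a variant, position by position from the N-terminus, is the simplest way to locate a substitution such as the one behind sickle cell anemia (Glu to Val at position 6 of the hemoglobin β chain). The sketch below uses only the first nine residues of the mature β chain and assumes equal-length, ungapped sequences; insertions and deletions would require an alignment. The function name is hypothetical.

```python
def substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions as <ref><position><new>, numbering residues from 1
    at the N-terminus. Assumes equal-length, ungapped sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (no indels)")
    return [f"{r}{pos}{v}"
            for pos, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]


HBB_A_START = "VHLTPEEKS"  # first 9 residues of the normal (HbA) beta chain
HBB_S_START = "VHLTPVEKS"  # the same region in the sickle-cell (HbS) beta chain
print(substitutions(HBB_A_START, HBB_S_START))
```

The single reported change, E6V, replaces a charged glutamate with a hydrophobic valine on the protein surface.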
Protein Sequencing
Historical Context: Frederick Sanger first sequenced a protein (insulin) in 1953, proving proteins have defined sequences (Nobel Prize).
Modern Approaches:
Indirect Sequencing (Genomics): Most commonly, protein sequences are deduced indirectly by sequencing the encoding gene (DNA) and translating it using the genetic code.
Direct Protein Sequencing (Mass Spectrometry): Still crucial for identifying post-translational modifications or verifying predicted sequences. Mass spectrometry has revolutionized proteomics.
Tandem MS (MS/MS):
Protein is broken into smaller peptides (e.g., using trypsin).
In the mass spectrometer, a specific peptide ion is selected based on its mass.
The selected ion is sent to a collision cell, where it fragments (typically at peptide bonds).
A second stage of mass analysis measures the masses of these fragments.
By analyzing the mass differences between successive fragment ions, the amino acid sequence of the original peptide can be pieced together like a chemical barcode.
LC-MS/MS: Coupling tandem MS with liquid chromatography allows analysis of thousands of peptides from complex mixtures, enabling identification, sequencing, quantification, and modification detection of entire proteomes in a single experiment.
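The tandem-MS logic can be illustrated with an idealized fragment ladder. This sketch assumes trypsin cuts after Lys (K) or Arg (R) except before Pro (P), and that each singly charged b-type fragment has the mass of its N-terminal residues plus a proton; successive mass differences then spell out the sequence. The monoisotopic residue masses are standard values, but the ladder is idealized (real spectra have missing peaks, y-ions, and noise), the protein sequence is hypothetical, and all function names are illustrative. Leucine and isoleucine have identical masses and cannot be told apart this way.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def trypsin_digest(sequence: str) -> list[str]:
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def prefix_ladder(peptide: str) -> list[float]:
    """Idealized singly charged b-type ladder: cumulative residue masses plus
    a proton. For simplicity the full-length prefix is included."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_sequence(ladder: list[float], tol: float = 0.005) -> list[str]:
    """Infer residues from mass differences between successive fragments."""
    residues, previous = [], PROTON
    for mass in ladder:
        diff = mass - previous
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol)
        residues.append("/".join(hits) if hits else "?")
        previous = mass
    return residues


for pep in trypsin_digest("AGKPLEVRGWSK"):  # hypothetical protein sequence
    print(pep, read_sequence(prefix_ladder(pep)))
```

Note that the K in "AGKP" is not cleaved because it is followed by proline, and that the tight mass tolerance is what separates Lys from the nearly isobaric Gln.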
Peptide Synthesis
Chemical synthesis of peptides is a vital tool, especially for smaller peptides (up to about 100 residues), used for drugs, antigens, and research.
Solid-Phase Peptide Synthesis (Merrifield): A major breakthrough (Nobel Prize to R. Bruce Merrifield).
Peptide chain is built step-by-step while chemically attached to an insoluble solid support (e.g., plastic beads).
Protected amino acids are added one at a time: each is coupled to the growing chain, excess reagents are washed away, and the chain's new amino-terminal end is deprotected before the next amino acid is added.
Limitations: Even with high efficiency (e.g., 99% per step), chemically synthesizing long peptides (e.g., 100 residues) produces a significant fraction of incorrect or truncated sequences, because the small failure rate of each step compounds over many cycles.
This highlights the remarkable speed and accuracy of biological protein synthesis within cells (e.g., a bacterium can make a 100-residue protein with near-perfect fidelity in about 5 seconds).
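The cost of small per-step losses compounds geometrically. Assuming each of the n - 1 coupling cycles for an n-residue peptide succeeds independently with the same efficiency (a simplification that lumps coupling and deprotection together), the fraction of full-length, correct product is efficiency^(n - 1). The function name is hypothetical.

```python
def full_length_fraction(n_residues: int, step_efficiency: float) -> float:
    """Fraction of chains that are full-length and correct, assuming the
    first residue is anchored to the resin and each of the remaining
    n - 1 coupling cycles succeeds independently with step_efficiency."""
    if n_residues < 1 or not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("need n >= 1 and 0 <= efficiency <= 1")
    return step_efficiency ** (n_residues - 1)


for eff in (0.96, 0.99, 0.998):
    frac = full_length_fraction(100, eff)
    print(f"{eff:.1%} per cycle -> {frac:.1%} full-length 100-residue product")
```

At 99% per cycle, only about 37% of the chains come out full-length and correct; the rest carry deletions or truncations that must be removed by purification.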
Protein Sequences and Evolution
Protein sequences serve as "molecular fossils," containing vast information about evolutionary history.
Evolutionary Relatedness:
Closely related organisms (recent common ancestor) have very similar amino acid sequences for functionally equivalent proteins.
As evolutionary distance increases, differences in corresponding protein sequences tend to increase.
Conserved vs. Variable Residues:
Conserved residues: Critical for protein function, change rarely over time.
Variable positions: Less critical, tolerate substitutions more readily, accumulating changes over time.
Homologs: Related proteins derived from a common ancestral gene.
Paralogs: Homologous proteins within the same species (often from gene duplication).
Orthologs: Homologous proteins in different species (evolved from a common ancestral gene after a speciation event); used to build evolutionary trees.
Sequence Alignment: Computer algorithms align sequences, introducing gaps where needed to maximize the alignment score, using scoring systems that reward matches and penalize mismatches and gaps.
Alignment reveals relationships; similarity reflects evolutionary relatedness.
Signature sequences: Short, characteristic amino acid stretches unique to specific organism groups, providing strong evidence for evolutionary links (e.g., a 12-amino acid insertion in EF-1 found in all Archaea and Eukaryotes but not Bacteria, suggesting a closer link between Archaea and Eukaryotes).
Phylogenetic Trees: By comparing sequences of multiple proteins (especially universal ones like ribosomal proteins), researchers construct detailed evolutionary (phylogenetic) trees, mapping the history and connections between species. This is an ongoing project to reconstruct the entire tree of life.
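The gapped alignment described above can be sketched with the classic Needleman-Wunsch dynamic-programming algorithm for global alignment. The scoring scheme here (match +1, mismatch -1, linear gap penalty -2) is a hypothetical choice for illustration; practical protein aligners typically use substitution matrices such as BLOSUM62 and affine gap penalties. The sequences are toy examples.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, preferring diagonal moves.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap positions as a percentage of alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, top, bottom = global_alignment("ACDEFG", "ACEFG")
print(top)
print(bottom)
print(score, round(percent_identity(top, bottom), 1))
```

A single gap opposite D lets the remaining residues line up, which scores better than forcing mismatches; higher percent identity between orthologs generally indicates closer evolutionary relatedness.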
Concluding Thoughts
Understanding protein structure, function, study methods, and evolution is foundational to biochemistry, molecular biology, and all life sciences.
The ability to purify, analyze, sequence, and synthesize these molecules has transformed our understanding of life's core mechanisms.
The exclusive use of L-amino acids in nearly all proteins suggests that a single chirality was selected very early in the origin of life. Considering what would have followed had D-amino acids been chosen instead highlights how fundamentally this single choice shaped biology and biochemistry.