The Three-Dimensional Structure of Proteins
Learning Goals (Chapter Overview)
Structure and properties of the peptide bond.
Structural hierarchy in proteins.
Structure and activity of fibrous proteins.
Structure analysis of globular proteins.
Protein folding and denaturation.
Fundamental Principles of Protein Structure
Specific Conformation: Unlike most organic polymers, protein molecules adopt a specific three-dimensional conformation called the native fold.
Functionality: This specific structure is essential for fulfilling a specific biological function or activity. Changing the structure generally changes the function.
Favorable Interactions: The native fold is stabilized by a large number of favorable interactions within the protein, though protein folding is not "free." There is an entropy cost to folding a disordered molecule into a specific, ordered native fold.
Favorable Interactions in Proteins
Hydrophobic Effect:
The primary driving force for protein folding.
Involves the release of water molecules from the structured solvation layer around nonpolar molecules as the protein folds.
This release increases the net entropy of water solvent, making the folding process thermodynamically favorable.
Hydrogen Bonds:
Interactions between the N−H (amide proton) and C=O (carbonyl oxygen) of the peptide bond backbone.
Crucial for forming local regular structures such as alpha (\alpha) helices and beta (\beta) sheets.
London Dispersion (van der Waals) Forces:
Medium-range weak attractions between all atoms.
Contribute significantly to the stability, especially in the densely packed interior of the protein.
Electrostatic Interactions:
Long-range strong interactions between permanently charged groups (e.g., charged amino acid side chains).
Salt bridges, particularly those buried within a hydrophobic environment, strongly stabilize the protein structure.
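The balance described above, between the entropy cost of ordering the chain and the favorable interactions and solvent entropy gained on folding, can be summarized with the standard free-energy relation. The subscripted symbols below are labels introduced here for illustration and do not appear in the original notes.

```latex
\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\,\Delta S_{\text{total}},
\qquad
\Delta S_{\text{total}} = \Delta S_{\text{chain}} + \Delta S_{\text{solvent}},
\qquad
\Delta S_{\text{chain}} < 0,\quad \Delta S_{\text{solvent}} > 0
```

Folding is spontaneous when \Delta G_{\text{fold}} < 0, that is, when the favorable interactions (\Delta H_{\text{fold}}) and the entropy gained by released water (\Delta S_{\text{solvent}}) outweigh the entropy lost by the chain (\Delta S_{\text{chain}}).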
Four Levels of Protein Structure
Primary Structure:
The linear sequence of amino acids linked together by peptide bonds.
Includes any disulfide bonds (covalent linkages between cysteine residues).
This sequence is the fundamental determinant of the higher-order structures.
Secondary Structure:
Local spatial arrangements of the polypeptide backbone.
Stabilized by hydrogen bonds between backbone atoms of nearby residues.
Common elements include the \alpha helix and the \beta sheet.
Tertiary Structure:
The overall three-dimensional spatial arrangement of all atoms in a single polypeptide chain.
Results from interactions between the R groups (side chains) of amino acids.
Interacting amino acids are not necessarily adjacent in the primary sequence.
Quaternary Structure:
Formed by the assembly of multiple individual polypeptide subunits (each with its own tertiary structure) into a larger functional protein complex.
Example: Hemoglobin, composed of four subunits.
Primary Structure: The Peptide Bond
Properties: The specific structure of the protein is partly dictated by the properties of the peptide bond.
Resonance Hybrid: The peptide bond is a resonance hybrid of two canonical structures, giving it partial double-bond character (approximately 40\%). This resonance is depicted as delocalization of electrons between the carbonyl oxygen, carbonyl carbon, and amide nitrogen.
Consequences of Resonance:
Reduced Reactivity: Less reactive compared with esters, for example.
Rigidity and Planarity: The peptide bond is quite rigid and nearly planar. This means the carbonyl carbon, peptide nitrogen, and the two \alpha carbons are held in a rigid plane.
Dipole Moment: Exhibits a large dipole moment in the favored trans configuration.
Restricted Rotation: A crucial implication is that rotation around the peptide bond is not permitted due to its partial double-bond character. This rigidity contributes significantly to the stability of protein structure.
Rotations in the Polypeptide Backbone
While rotation around the peptide bond itself is restricted, rotation is permitted around the bonds connected to the \alpha carbon:
\phi (phi): The dihedral angle around the bond between the amide nitrogen and the \alpha carbon (N-C\alpha).
\psi (psi): The dihedral angle around the bond between the \alpha carbon and the carbonyl carbon (C\alpha-C).
In a fully extended polypeptide chain, both \phi and \psi are approximately 180^{\circ}.
The specific combinations of \phi and \psi angles, combined with the identity and steric hindrance of the R groups, determine the secondary structure of the protein.
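As a rough illustration of how \phi and \psi could be obtained from atomic coordinates, the sketch below computes a dihedral angle from four points. The function name `dihedral` and the toy coordinates in the usage note are assumptions made for illustration, not data from a real structure. For \phi the four atoms would be C(i-1), N(i), C\alpha(i), C(i); for \psi they would be N(i), C\alpha(i), C(i), N(i+1).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # vector from the second point back to the first
    b1 = sub(p2, p1)          # central bond, the rotation axis
    b2 = sub(p3, p2)          # vector from the third point to the fourth

    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of b0 and b2 perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, four points arranged in a planar zigzag, such as (0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0), give a magnitude of 180 degrees, matching the fully extended (trans) arrangement described above.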
Distribution of \phi and \psi Dihedral Angles (Ramachandran Plot)
Not all \phi and \psi combinations are sterically possible due to crowding of backbone atoms with other backbone atoms or side chains.
Some combinations are more favorable as they allow for beneficial hydrogen-bonding interactions along the backbone.
A Ramachandran plot is a graphical representation that shows:
The distribution of experimentally observed \phi and \psi dihedral angles in protein structures.
The allowed regions of conformational space, revealing common secondary structure elements (like \alpha helices and \beta sheets).
Regions that contain unusual backbone structures.
Glycine residues often fall outside the expected ranges due to their small, flexible R-group (hydrogen atom).
Left-handed \alpha helices, while theoretically possible for a few residues, have not been observed in proteins over extended segments.
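To make the idea of regions on a Ramachandran plot more concrete, the sketch below labels a (\phi, \psi) pair by the nearest idealized conformation. The idealized angles are the values commonly tabulated in textbooks; the nearest-point rule and the names `IDEAL_ANGLES` and `nearest_ideal` are simplifying assumptions for illustration. Real plots use contoured allowed regions, not single points.

```python
import math

# Idealized (phi, psi) values in degrees; real residues scatter around them.
IDEAL_ANGLES = {
    "alpha helix (right-handed)": (-57.0, -47.0),
    "beta strand (antiparallel)": (-139.0, 135.0),
    "beta strand (parallel)": (-119.0, 113.0),
    "collagen triple helix": (-51.0, 153.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_ideal(phi, psi):
    """Return the label whose idealized (phi, psi) lies closest to the input."""
    def distance(label):
        ideal_phi, ideal_psi = IDEAL_ANGLES[label]
        return math.hypot(angular_difference(phi, ideal_phi),
                          angular_difference(psi, ideal_psi))
    return min(IDEAL_ANGLES, key=distance)
```

A residue with (\phi, \psi) near (-60, -45) is labeled as right-handed \alpha helix, while one near (-135, 130) is labeled as antiparallel \beta strand. Glycine residues, as noted above, may sit far from every idealized point, so a label from this rule should be read with caution.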
Secondary Structures
Refer to local spatial arrangements of the polypeptide backbone.
Two major regular arrangements:
The \alpha Helix:
A right-handed helical structure.
Stabilized by hydrogen bonds between the backbone carbonyl oxygen (C=O) of residue n and the backbone amide proton (N-H) of residue n+4.
Contains 3.6 amino acid residues per turn, spanning 5.4 \, \text{Å} along the helical axis.
Peptide bonds are aligned roughly parallel with the helical axis.
Side chains (R groups) point outward, roughly perpendicular to the helical axis.
Dimensions:
Inner diameter (excluding side chains): approximately 4\text{–}5 \, \text{Å}. This space is too small for anything to fit inside.
Outer diameter (including side chains): approximately 10\text{–}12 \, \text{Å}. This size conveniently fits well into the major groove of double-stranded DNA.
Amino acids #1 and #8 align nicely on top of each other in a top-down view.
Sequence Affects Stability:
Small hydrophobic residues like Alanine (Ala) and Leucine (Leu) are strong helix formers.
Proline (Pro) acts as a helix breaker because its cyclic structure restricts rotation around the N-C\alpha (\phi-angle) bond, making it incompatible with the helical geometry.
Glycine (Gly) acts as a helix breaker because its tiny R group (hydrogen) provides too much flexibility, allowing for many other conformations and making it less suitable for the rigid \alpha helical structure.
Attractive or repulsive interactions between side chains that are 3 to 4 amino acids apart can significantly affect helix formation and stability.
Helix Dipole:
Each peptide bond has a strong dipole moment (C=O is partially negative, N-H is partially positive).
In an \alpha helix, all peptide bonds are oriented in the same direction.
This creates a large macroscopic dipole moment for the entire helix, with the N-terminus having a partial positive charge and the C-terminus having a partial negative charge.
Negatively charged amino acid residues are often found near the N-terminal (positive) end of the helix dipole, providing electrostatic stabilization.
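The helix dimensions quoted above (3.6 residues per turn, 5.4 Å per turn) imply a rise of about 1.5 Å and a rotation of about 100 degrees per residue. The sketch below works through that arithmetic for an idealized helix; the function names are illustrative assumptions.

```python
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4

RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN  # about 1.5 angstroms
TURN_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN           # about 100 degrees

def helix_length(n_residues):
    """Approximate length along the axis (angstroms) of an ideal alpha helix."""
    return n_residues * RISE_PER_RESIDUE

def angular_offset(i, j):
    """Angle in degrees between side chains i and j viewed down the helix axis,
    folded into the range [-180, 180)."""
    raw = ((j - i) * TURN_PER_RESIDUE) % 360.0
    return raw - 360.0 if raw >= 180.0 else raw
```

Under these assumptions, residues 1 and 8 differ by only about 20 degrees, which is why they nearly stack in a top-down view. Residues 3 or 4 positions apart (offsets of about 60 and 40 degrees in magnitude) lie on the same face of the helix, consistent with the side-chain interactions noted above.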
The \beta Sheet (or \beta-Pleated Sheet):
Characterized by a pleated, sheet-like structure due to the planarity of the peptide bond and the tetrahedral geometry of the \alpha carbon.
Stabilized by hydrogen bonds between the backbone amide and carbonyl groups of the peptide bonds in different, adjacent segments (strands) of the polypeptide chain. These segments may not be close in primary sequence.
Side chains protrude from the sheet, alternating in an up-and-down direction.
Parallel and Antiparallel Arrangements:
Antiparallel \beta sheets: Adjacent strands run in opposite N-to-C terminal directions. The hydrogen bonds between strands are linear and therefore stronger.
Parallel \beta sheets: Adjacent strands run in the same N-to-C terminal direction. The hydrogen bonds between strands are bent and therefore weaker.
\beta Turns (Reverse Turns):
Occur frequently, allowing polypeptide chains to make abrupt 180^{\circ} changes in direction, effectively connecting adjacent antiparallel \beta strands.
Typically accomplished over four amino acid residues.
Stabilized by a hydrogen bond between the carbonyl oxygen of the first residue and the amide proton of the fourth residue (i.e., three residues down the sequence).
Proline in position 2 and Glycine in position 3 are common in \beta turns due to their conformational properties.
Type I turns are more than twice as frequent as Type II turns. Type II turns often have Glycine at the third position.
Random Coil:
Describes irregular or undefined arrangements of the polypeptide chain that do not conform to regular secondary structures like \alpha helices or \beta sheets.
Proline Isomers
Most peptide bonds not involving Proline exist in the trans configuration (>99.95\%).
For peptide bonds involving Proline, approximately 6\% can exist in the cis configuration, significantly higher than for other amino acids. Many of these cis Proline bonds are found in \beta turns.
Proline isomerization can be catalyzed by specific enzymes called proline isomerases.
Protein Tertiary Structure
Refers to the overall spatial arrangement of all atoms in a protein's single polypeptide chain.
Stabilized primarily by numerous weak interactions between amino acid side chains, including:
Hydrophobic interactions (core).
Polar interactions (surface, solvent-exposed).
Salt bridges.
Hydrogen bonds (side chain-side chain, side chain-backbone).
Can also be stabilized by disulfide bonds (covalent linkages between cysteine residues), which significantly enhance structural stability.
Interacting amino acids are not necessarily close to each other in the primary sequence.
Proteins are broadly classified into two major classes based on tertiary structure and solubility:
Fibrous proteins: Generally elongated, insoluble, structural roles (e.g., keratin, collagen).
Globular proteins: Generally compact, roughly spherical, water- or lipid-soluble, dynamic roles (e.g., enzymes, transporters, myoglobin).
Examples of Fibrous Proteins
\alpha-Keratin (Hair):
An important component of hair, wool, nails, horns, and skin.
Structure consists of elongated \alpha helices.
Two \alpha helices are interwound in a left-handed coiled-coil arrangement.
These two-chain coiled-coils then assemble into higher-order structures: protofilaments, protofibrils, and finally intermediate filaments (composed of approximately 32 \alpha-keratin strands).
The strength and insolubility are largely due to numerous disulfide bonds between adjacent polypeptides.
Permanent Waving Chemistry: Involves reducing the disulfide bonds in keratin (-S-S- \to -SH \text{ HS-}), physically reshaping the hair, and then oxidizing the thiols back into new disulfide bonds (-SH \text{ HS-} \to -S-S-) to maintain the new curl.
Collagen:
A major constituent of connective tissues like tendons, cartilage, bones, and the cornea of the eye.
Each individual collagen chain is a long, Glycine- and Proline-rich left-handed helix.
Collagen Triple Helix: Three of these left-handed collagen chains intertwine to form a characteristic right-handed superhelical triple helix.
This triple helix possesses exceptionally high tensile strength, exceeding that of a steel wire of equal cross section.
Glycine Requirement: Glycine, with its uniquely small side chain, is required at every third position (Gly-X-Pro or Gly-X-Hydroxyproline repeats) to allow the three chains to pack tightly together at the central core of the triple helix.
4-Hydroxyproline (4-Hyp):
A post-translationally modified amino acid found mainly in collagen.
Incorporation of 4-Hyp forces the proline ring into a favorable pucker, which stiffens the collagen chain.
Crucially, the hydroxyl group of 4-Hyp forms additional hydrogen bonds between the three strands of the triple helix, significantly increasing its stability.
The hydroxylation reaction is catalyzed by prolyl hydroxylase, which requires \alpha-ketoglutarate, molecular oxygen, and ascorbate (Vitamin C).
Scurvy: A deficiency of Vitamin C.
Symptoms: Bleeding gums, tooth loss, bruising, coiled hair, fatigue, and muscle pain.
Cause: Vitamin C is essential to reduce the iron in prolyl hydroxylase back to its active ferrous state (Fe^{2+}) after it has been oxidized to the inactive ferric state (Fe^{3+}). Without Vitamin C, the enzyme becomes inactive, leading to insufficient formation of 4-hydroxyproline. This results in the synthesis of unstable collagen with fewer inter-strand hydrogen bonds, leading to weakened connective tissues throughout the body.
Collagen Fibrils: Collagen triple-helices assemble in a staggered fashion and become covalently cross-linked (between Lys, Hydroxylysine, or Histidine residues) to form robust collagen fibrils, which exhibit characteristic striations (e.g., 640 \, \text{Å} apart).
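The glycine requirement described above (Gly at every third position) can be checked on a one-letter sequence with a few lines of code. This is a minimal sketch; the function name and the example sequences in the usage note are hypothetical and are not real collagen sequences.

```python
def glycine_every_third(sequence):
    """Return True if some reading frame has Gly (G) at every third position."""
    seq = sequence.upper()
    if len(seq) < 3:
        return False
    for frame in range(3):
        # Positions frame, frame + 3, frame + 6, ... must all be glycine.
        if all(seq[i] == "G" for i in range(frame, len(seq), 3)):
            return True
    return False
```

A made-up repeat such as "GPAGPPGAP" passes the check, while "GPAAPPGAP", where one glycine is replaced by alanine, does not. Such a replacement would place a bulkier side chain in the tightly packed core of the triple helix.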
Silk Fibroin (Silk):
The main protein component of silk produced by moths and spiders.
Primarily composed of an antiparallel \beta sheet structure.
Rich in small amino acid residues like Alanine (Ala) and Glycine (Gly).
The small side chains allow for very close packing and interdigitation of the flat \beta sheets.
Stabilized by extensive hydrogen bonding within the sheets and strong London dispersion (van der Waals) interactions between the stacked sheets.
Spider Silk: Known for its extreme strength (stronger than steel) and extensibility before breaking. It is a composite material with crystalline (fibroin-rich) and rubber-like stretchy parts.
Water-Soluble Globular Proteins
Typically compact, folded structures with a hydrophobic core and a hydrophilic surface, allowing them to be soluble in aqueous environments.
Often depicted using ribbon representations (highlighting secondary structures) or surface contour images (to visualize binding pockets).
Examples: Myoglobin, human serum albumin.
Motifs (Folds) and Domains
Motif (Fold): A specific, recognizable arrangement of several secondary structure elements (e.g., \beta-\alpha-\beta loop, \beta barrel, \alpha/\beta barrel).
Motifs are found as recurring structural patterns in numerous functionally diverse proteins.
Domain: A stable, independently folding part of a polypeptide chain, often associated with a specific function. Globular proteins are frequently composed of multiple distinct motifs and domains folded together.
Intrinsically Disordered Proteins (IDPs)
Contain protein segments that natively lack a stable, definable three-dimensional structure under physiological conditions.
Typically enriched in small, charged, or polar amino acids such as Lysine (Lys), Arginine (Arg), Glutamate (Glu), and Proline (Pro).
Their flexibility allows them to conform to and interact with many different partner proteins, often folding into a specific structure only upon binding to a target (e.g., the carboxyl terminus of p53 protein).
This conformational plasticity is important for diverse biological functions, including signaling, regulation, and assembly.
Quaternary Structure
Formed by the assembly of two or more individual polypeptide chains (subunits).
These subunits associate into a larger, functional protein complex (e.g., deoxyhemoglobin, which has four subunits).
Stabilized by the same types of weak interactions as tertiary structure (hydrophobic interactions, hydrogen bonds, salt bridges), and sometimes by disulfide bonds between subunits.
Protein Structural Determination Methods
X-Ray Crystallography:
Steps: Purify the protein, crystallize it, collect X-ray diffraction data, calculate electron density maps, and then fit the known amino acid residues into the electron density to determine the 3D structure.
Pros: Can determine the structure of very large proteins, well-established technique.
Cons: Can be difficult or impossible to crystallize membrane proteins or highly flexible proteins; cannot directly resolve the positions of hydrogen atoms.
Biomolecular NMR (Nuclear Magnetic Resonance) Spectroscopy:
Steps: Purify the protein, dissolve it in solution, collect NMR spectroscopic data, assign NMR signals to specific atoms, and calculate a family of structures consistent with the observed spatial constraints (e.g., NOE signals indicating close proximity of hydrogen atoms).
Pros: Does not require crystallization; can resolve hydrogen atoms; can provide dynamic information about protein flexibility.
Cons: Works best with relatively small proteins (typically less than 30 \, \text{kDa}); difficult for insoluble proteins.
Protein Stability and Folding
Proteostasis: The continuous maintenance of cellular protein activity, accomplished by the coordinated action of many different pathways including protein synthesis, folding (often with assistance from chaperones), and degradation (e.g., via the proteasome or autophagy).
Denaturation: The loss of a protein's specific three-dimensional structure, which typically leads to an accompanying loss of its biological activity.
Denaturing Agents (Causes of Denaturation):
Extreme Heat or Cold: High temperatures increase kinetic energy, causing vibrations that break weak interactions. While extreme heat is a strong denaturant, extreme cold can also denature some proteins by altering water structure and hydrophobic interactions.
pH Extremes: Changes in pH alter the ionization states of amino acid side chains, disrupting salt bridges and hydrogen bonds.
Organic Solvents: Can disrupt hydrophobic interactions and hydrogen bonds, often by displacing water molecules.
Chaotropic Agents: Substances like urea and guanidinium hydrochloride (GdnHCl) disrupt the hydrogen-bonding network of water and directly interact with proteins, breaking weak interactions within the protein structure.
Cooperative Unfolding: The transition from folded to unfolded state is often abrupt, suggesting that protein unfolding is a cooperative process, meaning the disruption of one part of the structure destabilizes others.
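The abrupt, cooperative transition described above is often approximated with a two-state model in which the unfolding free energy falls linearly with denaturant concentration. The sketch below assumes that model; the parameter values (`dg_water`, `m_value`) are hypothetical defaults chosen only to show the shape of the curve, not measurements for any protein.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def fraction_unfolded(denaturant_molar, dg_water=30.0, m_value=10.0,
                      temperature=298.15):
    """Fraction of unfolded protein in a two-state (folded/unfolded) model.

    dg_water: unfolding free energy without denaturant, kJ/mol (hypothetical)
    m_value:  decrease in unfolding free energy per molar denaturant,
              kJ/(mol*M) (hypothetical)
    """
    dg_unfold = dg_water - m_value * denaturant_molar
    k_unfold = math.exp(-dg_unfold / (R * temperature))
    return k_unfold / (1.0 + k_unfold)
```

With these assumed values the midpoint lies at 3 M denaturant, where half the molecules are unfolded. The population shifts from almost fully folded to almost fully unfolded over a narrow concentration range around that point, which is the signature of cooperativity.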
Ribonuclease (RNase) Refolding Experiment (Anfinsen's Experiment - 1972 Nobel Prize)
Experiment: Ribonuclease A is a small protein with eight cysteine residues linked by four disulfide bonds.
Addition of urea (chaotropic agent) and 2-mercaptoethanol (reducing agent) fully denatures RNase, breaking all non-covalent interactions and reducing all disulfide bonds.
When both urea and 2-mercaptoethanol are carefully removed, the denatured and reduced RNase spontaneously refolds into its native, active conformation, with the correct disulfide bonds reforming.
Conclusion: The primary amino acid sequence alone contains all the necessary information to dictate the protein's native three-dimensional conformation. This demonstrated that folding is an intrinsic property of the polypeptide chain.
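One way to appreciate why reformation of the correct disulfide bonds is notable is to count how many ways eight cysteines could pair at random. This combinatorial aside is an illustration added here, not part of the experiment's description above, and the function name is an assumption.

```python
def disulfide_pairings(n_cysteines):
    """Number of ways to pair all cysteines into disulfide bonds."""
    if n_cysteines % 2 != 0:
        raise ValueError("an even number of cysteines is needed to pair them all")
    count = 1
    # The first cysteine can choose among n-1 partners, the next unpaired
    # one among n-3, and so on: (n-1) * (n-3) * ... * 1.
    for k in range(n_cysteines - 1, 0, -2):
        count *= k
    return count
```

For the eight cysteines of ribonuclease A this gives 7 x 5 x 3 x 1 = 105 possible pairings, only one of which is the native set. Recovery of the native pairing on refolding therefore cannot be explained by chance.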
Protein Folding Pathways and Levinthal's Paradox
Speed of Folding: Proteins fold very rapidly, typically within microseconds to seconds.
Levinthal's Paradox: It is practically impossible for proteins to find their lowest-energy native fold by randomly trying every possible conformation. If a protein of 100 residues could sample a new conformation every 10^{-13} \, \text{s}, it would still take on the order of 10^{95} years to sample all conformations, far longer than the age of the universe. This implies folding is not random.
Directed Search: The search for the minimum free-energy structure is not random. Instead, it is a directed process where the polypeptide quickly moves towards the native structure because the direction towards it is thermodynamically most favorable.
Free-Energy Funnel Model:
Protein folding can be visualized as a free-energy funnel.
The depth of the funnel represents the free energy (\Delta G), with the native structure (N) at the bottom (the lowest free-energy point).
As a protein folds, the conformational space that can be explored is progressively constrained.
Folding intermediates that possess significant stability are represented as local free-energy minima (depressions) on the funnel's surface.
Different funnel shapes illustrate various folding scenarios, from multiple pathways leading to a single stable outcome to highly cooperative folding with few intermediates.
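The Levinthal estimate discussed above can be reproduced as back-of-the-envelope arithmetic. The sketch below works in log10 units to avoid huge numbers. The number of conformations per residue is an assumption that the notes do not state; the default of 10 is a commonly used illustrative value.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # approximate

def levinthal_log10_years(n_residues=100, conformations_per_residue=10,
                          seconds_per_sample=1e-13):
    """log10 of the years needed to sample every conformation once
    by exhaustive random search."""
    log10_conformations = n_residues * math.log10(conformations_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_sample)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)
```

With 10 conformations per residue the result is an exponent of roughly 79. The exponent depends strongly on the assumed number of conformations per residue, so quoted figures vary, but every plausible choice is vastly longer than the age of the universe (on the order of 10^{10} years).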
Molecular Chaperones and Chaperonins
Chaperones:
Proteins that assist in the proper folding of other proteins.
They typically act by preventing the misfolding and aggregation of unfolded or partially folded peptides, rather than actively dictating the final fold.
Often members of heat shock protein families (e.g., E. coli DnaK/DnaJ, eukaryotic Hsp70/Hsp40).
They bind to hydrophobic regions of unfolded proteins, stabilizing them and allowing them to proceed correctly through their folding pathways.
Chaperonins:
Specialized chaperone systems that form large, cage-like structures (e.g., E. coli GroEL/GroES, eukaryotic Hsp60).
They facilitate protein folding by providing a protected, isolated environment within their central cavity.
An unfolded protein enters one chamber (the cis chamber) of GroEL, which is then capped by GroES. ATP hydrolysis then triggers conformational changes that promote proper folding within this environment.
After folding (and ATP hydrolysis), GroES dissociates, releasing the folded protein. If not folded correctly, the protein can be rebound for another attempt or directed to degradation pathways.
Protein Misfolding and Human Diseases
Misfolded proteins are a basis for numerous human diseases, often by aggregating into harmful structures.
Amyloid Fibrils:
Many misfolding diseases involve the formation of amyloid fibrils, which are characterized by a high content of \beta sheet structure, where the \beta strands are arranged perpendicular to the axis of the fibril.
Mechanism: Normally soluble proteins undergo partial unfolding, exposing hydrophobic or \beta-sheet prone regions.
These exposed regions then aberrantly associate with similar regions in other polypeptide chains, forming amyloid nuclei.
Additional protein molecules slowly add to these nuclei, extending them into insoluble, ordered amyloid fibrils.
Example: In Alzheimer's disease, the native \alpha-helical amyloid-\beta (A\beta) peptide misfolds, loses its helical structure, and forms highly stable \beta sheet-rich amyloid fibrils. These fibrils aggregate into dense plaques on the exterior of nervous tissue, contributing to neurodegeneration.
Cellular Response to Misfolding: In mammalian cells, misfolded proteins trigger several responses (part of proteostasis):
Binding by heat shock proteins (e.g., Hsp70) to promote refolding.
Activation of the unfolded protein response (UPR).
Degradation by the proteasome (for individual misfolded proteins).
Degradation by autophagy (for aggregates).