Molecular Biology of the Cell: Comprehensive Study Notes on Protein Structure, Function, and Methods

Fundamentals of Protein Structure and Molecular Components

Proteins are long, unbranched chains, termed polypeptides, built from 20 different amino acids. Every protein features a repeating sequence of atoms along its core, known as the polypeptide backbone, while the specific shape and structure of the protein are dictated by its unique amino acid sequence. The folding of these chains is constrained by steric requirements and stabilized by various noncovalent interactions, including hydrogen bonds, electrostatic attractions, and van der Waals attractions.

The general chemical formula for an amino acid consists of an amino group ($H_2N$), a carboxyl group ($COOH$), and a side-chain group ($R$), all attached to a central α-carbon atom ($C_\alpha$). At a neutral pH of 7, both the amino and carboxyl groups are ionized, appearing as $H_3N^+$ and $COO^-$. Because the α-carbon is asymmetric, it allows for two mirror-image stereoisomers, known as the L and D forms. Crucially, proteins in biological systems consist exclusively of L-amino acids.

Amino acids are linked together by amide linkages called peptide bonds. The formation of a peptide bond involves a dehydration reaction where water ($H_2O$) is removed. The four atoms involved in the peptide bond (the carbonyl carbon, carbonyl oxygen, amide nitrogen, and amide hydrogen) form a rigid, planar unit, meaning there is no rotation around the C-N bond. However, the single bonds connecting these planar units to the α-carbon allow for rotation, providing the long polypeptide chains with significant flexibility. By convention, protein sequences are written starting with the amino-terminus (N-terminus) on the left and ending with the carboxyl-terminus (C-terminus) on the right.
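As a quick check on the dehydration step, the sketch below estimates the mass of a short peptide by summing the masses of the free amino acids and subtracting one water for each peptide bond formed. The mass table is a small illustrative subset of average molecular masses (an assumption added here, not part of the notes), and the function name is hypothetical.

```python
# Average molecular masses (g/mol) of a few FREE amino acids.
# Illustrative subset only; extend the table for real sequences.
FREE_AA_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015  # g/mol, lost once per peptide bond formed


def peptide_mass(sequence: str) -> float:
    """Approximate mass of a linear peptide written N-terminus to C-terminus."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    # A chain of n residues contains n - 1 peptide bonds.
    return total - WATER_MASS * (len(sequence) - 1)


if __name__ == "__main__":
    for seq in ("G", "GA", "GAS"):
        print(seq, round(peptide_mass(seq), 3))
```

A single residue loses no water, while each additional residue adds its own mass minus one water molecule.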

The Twenty Amino Acids and Their Chemical Families

The 20 common amino acids are categorized into four families based on the chemical properties of their side chains: acidic, basic, uncharged polar, and nonpolar. Each amino acid is identified by a three-letter and a one-letter abbreviation.

The polar amino acids with negative charges (acidic) are Aspartic acid (Asp, D) and Glutamic acid (Glu, E). Those with positive charges (basic) are Arginine (Arg, R), Lysine (Lys, K), and Histidine (His, H). Uncharged polar amino acids include Asparagine (Asn, N), Glutamine (Gln, Q), Serine (Ser, S), Threonine (Thr, T), and Tyrosine (Tyr, Y).

The nonpolar amino acids include Alanine (Ala, A), Glycine (Gly, G), Valine (Val, V), Leucine (Leu, L), Isoleucine (Ile, I), Proline (Pro, P), Phenylalanine (Phe, F), Methionine (Met, M), Tryptophan (Trp, W), and Cysteine (Cys, C). While most side chains are chemically stable, Cysteine can form covalent disulfide bonds.
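The four families can be captured as a small lookup table, which is a convenient way to memorize them and to summarize the composition of a sequence. The sketch below follows the grouping given in these notes; the names `FAMILIES`, `FAMILY_OF`, and `family_composition` are hypothetical.

```python
from collections import Counter

# One-letter codes grouped by side-chain family, as listed in the notes.
FAMILIES = {
    "acidic": "DE",
    "basic": "RKH",
    "uncharged polar": "NQSTY",
    "nonpolar": "AGVLIPFMWC",
}

# Reverse lookup: one-letter code -> family name.
FAMILY_OF = {aa: family for family, letters in FAMILIES.items() for aa in letters}


def family_composition(sequence: str) -> dict:
    """Count how many residues of a sequence fall into each family."""
    # An unknown letter raises KeyError, which flags a typo in the sequence.
    return dict(Counter(FAMILY_OF[aa] for aa in sequence.upper()))


if __name__ == "__main__":
    print(len(FAMILY_OF))              # number of amino acids covered
    print(family_composition("MKDE"))  # composition of a short example
```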

Protein Folding and Secondary Structure Patterns

A protein generally folds into a 3D conformation that minimizes its free energy. This process is often assisted by molecular chaperones, which ensure correct folding pathways are followed. In an aqueous environment, proteins typically form a compact conformation where a hydrophobic core region containing nonpolar side chains is shielded from water, while polar side chains remain on the outside to form hydrogen bonds with the surroundings.

Steric limitations on bond angles, specifically the phi (ϕ) and psi (ψ) angles, restrict the possible conformations of the polypeptide chain. Two common regular folding patterns result from hydrogen bonding between the N-H and C=O groups of the polypeptide backbone: the α-helix and the β-sheet. In an α-helix, the polypeptide twists around itself with a hydrogen bond formed every fourth peptide bond, resulting in a structure with a turn every 0.54 nm. Sometimes, two α-helices wrap around each other to form a coiled-coil, a structure that minimizes the exposure of hydrophobic side chains to the aqueous environment.
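The helix dimensions lend themselves to a quick calculation. The sketch below assumes the standard textbook figure of 3.6 residues per turn (not stated in these notes) together with the 0.54 nm rise per turn given above; the function name is hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # assumed standard value, not taken from the notes
PITCH_NM = 0.54          # rise along the helix axis per full turn (from the notes)


def helix_length_nm(n_residues: int) -> float:
    """Approximate axial length of an ideal alpha-helix with n residues."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    turns = n_residues / RESIDUES_PER_TURN
    return turns * PITCH_NM


if __name__ == "__main__":
    # Rise per residue, then the length of a 36-residue helix.
    print(round(helix_length_nm(1), 3), "nm per residue")
    print(round(helix_length_nm(36), 2), "nm for 36 residues")
```

Under these assumptions each residue advances the helix by about 0.15 nm along its axis.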

β-sheets are formed by hydrogen bonding between neighboring polypeptide chains. These can be arranged as parallel chains (running in the same direction) or antiparallel chains (running in opposite directions). The typical distance between the repeating units in a β-sheet is 0.7 nm. These patterns are fundamental to building the structural domains of larger proteins.

The Hierarchical Organization of Protein Structure

Protein structure is described at four distinct levels. The primary structure is the linear amino acid sequence itself. The secondary structure refers to local folding patterns such as α-helices and β-sheets. The tertiary structure is the full three-dimensional organization of a single polypeptide chain. If a molecule is composed of multiple polypeptide chains, the complete structure is referred to as the quaternary structure.

A protein domain is a modular unit of structure—a contiguous part of a polypeptide that can fold independently of the rest of the protein. Large proteins have often evolved through "domain shuffling," where pre-existing domains such as Immunoglobulin, Fibronectin, and Kringle modules are joined together. The structure of these domains is often more highly conserved through evolution than the specific amino acid sequence, as seen in the similarities between serine proteases like elastase and chymotrypsin.

Protein Assemblies and Specialized Polypeptide Chains

Proteins often bind to each other using weak noncovalent bonds to form larger structures. A protein subunit is a single polypeptide chain within a larger complex. Assemblies can be homo-oligomers (identical subunits, e.g., the Cro repressor dimer) or hetero-oligomers (different subunits, e.g., hemoglobin, which contains two α-globin and two β-globin chains). Some globular proteins assemble into long helical filaments, such as actin filaments, while others form elongated fibrous shapes like collagen or keratin. Collagen molecules often form a triple helix (300 nm × 1.5 nm) to provide structural support, while elastin molecules form cross-linked elastic fibers that can stretch and relax.

Many proteins also contain intrinsically disordered polypeptide chains. These regions lack a fixed 3D structure and perform essential functions such as binding, signaling, tethering (linking domains), or acting as a diffusion barrier. Furthermore, extracellular proteins are often stabilized by covalent cross-linkages, most notably disulfide bonds (S-S bonds) formed between cysteine side chains through oxidation.

Large biological structures such as viral capsids (e.g., Tobacco Mosaic Virus or spherical viruses) are built from small repeating subunits. This strategy is efficient because it requires minimal genetic information, allows for easy control of assembly and disassembly, and facilitates error correction by excluding malformed subunits. Some complex assemblies require assembly factors like templates or proteolytic cleavage (as seen in insulin assembly) to achieve the final form.

Principles of Protein Function and Binding

The function of a protein is determined by its ability to bind specifically to other molecules, known as ligands. This specificity depends on the precise matching of the protein's binding site surface with the ligand via multiple noncovalent bonds. Antibodies, or immunoglobulins, are exceptionally versatile examples of binding proteins; they feature constant domains and highly variable loops in their variable domains ($V_L$ and $V_H$) that allow for the recognition of a vast array of antigens.

Binding strength is measured by the equilibrium constant (K). A larger K indicates stronger binding and is a direct measure of the free-energy difference between the bound and free states. Interaction interfaces can include surface-string interactions, coiled-coils, or the matching of two rigid surfaces. Evolutionary tracing methods can identify crucial binding sites by highlighting conserved amino acids across different protein family members.
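The link between K and free energy can be made concrete with the standard thermodynamic relation ΔG° = -RT ln K. The sketch below assumes K is an association constant expressed relative to a 1 M standard state and a temperature of 310 K (about 37 °C); both are assumptions added for illustration, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def binding_free_energy_kj(k_eq: float, temperature_k: float = 310.0) -> float:
    """Standard free-energy change (kJ/mol) for binding, from Delta G = -RT ln K."""
    if k_eq <= 0:
        raise ValueError("the equilibrium constant must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(k_eq) / 1000.0


if __name__ == "__main__":
    for k in (1e3, 1e6, 1e9):
        print(f"K = {k:.0e}  ->  {binding_free_energy_kj(k):.1f} kJ/mol")
```

Because the relation is logarithmic, each tenfold increase in K makes ΔG° more negative by the same fixed amount (RT ln 10, roughly 5.9 kJ/mol at this temperature).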

Enzyme Catalysis and Kinetics

Enzymes are specialized proteins that act as powerful and highly specific catalysts. They bind to substrates and convert them into chemically modified products without being consumed in the reaction. Common enzyme classes include Hydrolases (cleavage of bonds), Nucleases (breakdown of nucleic acids), Proteases (breakdown of proteins), Synthases (anabolic condensation), Ligases (energy-dependent joining), Isomerases (bond rearrangement), Polymerases (polymerization), Kinases (addition of phosphate), Phosphatases (removal of phosphate), Oxido-Reductases (redox reactions), ATPases (ATP hydrolysis), and GTPases (GTP hydrolysis).

Enzymes accelerate reactions by decreasing the activation energy required to reach the transition state. Strategies for catalysis include orienting substrates precisely, rearranging electrons to create partial charges, and straining bonds toward the transition state. The kinetics of an enzyme-catalyzed reaction are characterized by $V_{max}$ (the maximum rate when the enzyme is saturated with substrate) and the turnover number ($V_{max}$ divided by the enzyme concentration). The efficiency of many enzymes is limited only by the frequency of collision with substrates, known as the diffusion-limited rate ($10^8$ to $10^9\,M^{-1}s^{-1}$); because many metabolites are present at only μM concentrations, the actual encounter frequency per enzyme molecule is correspondingly lower.
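Two of these quantities are simple ratios and products, so a short calculation helps fix the units. The sketch below treats the diffusion-limited value as a second-order rate constant (per molar per second) and uses made-up example concentrations; the function names and all numbers in the usage section are illustrative assumptions.

```python
def turnover_number(vmax_molar_per_s: float, enzyme_molar: float) -> float:
    """Turnover number (per second): maximum rate divided by enzyme concentration."""
    if enzyme_molar <= 0:
        raise ValueError("enzyme concentration must be positive")
    return vmax_molar_per_s / enzyme_molar


def encounter_rate_per_enzyme(k_diffusion: float, substrate_molar: float) -> float:
    """Upper bound on substrate encounters per enzyme molecule per second."""
    return k_diffusion * substrate_molar


if __name__ == "__main__":
    # Hypothetical enzyme: Vmax of 1 uM/s measured with 1 nM enzyme.
    print(turnover_number(1e-6, 1e-9), "per second")
    # Diffusion-limited constant of 1e8 per M per s with 1 uM substrate.
    print(encounter_rate_per_enzyme(1e8, 1e-6), "encounters per second")
```

The second calculation shows why low metabolite concentrations matter: even a diffusion-limited enzyme meets only on the order of a hundred substrate molecules per second when the substrate is at 1 μM.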

Regulation of Protein Activity and Signaling

Cells regulate enzyme activity through several mechanisms: gene expression levels, subcellular compartmentalization, and post-translational modifications. Negative regulation, such as feedback inhibition, occurs when a late product in a metabolic pathway inhibits an enzyme early in the pathway. Positive regulation involves a molecule stimulating an enzyme's activity.

Many proteins are allosteric, meaning they can adopt multiple conformations. The binding of a ligand at one site (the regulatory or allosteric site) causes a conformational change that affects the binding at a second site (the active site). This reciprocal effect is known as the linkage principle. Proteins are also regulated by covalent modifications, most notably protein phosphorylation. Kinases transfer a phosphate group from ATP to the hydroxyl groups of Serine, Threonine, or Tyrosine, while phosphatases catalyze the reverse. This can trigger conformational changes, create new binding sites (e.g., for SH2 domains), or mask existing ones.

The Src kinase is a notable example of a signal-integrating device; its activity is controlled by two inputs: the removal of a C-terminal phosphate and the binding of its SH3 domain to an activating protein. Another ubiquitous regulator is the GTP-binding protein (GTPase), which acts as a molecular switch. It is active when bound to GTP and becomes inactive after hydrolyzing it to GDP. This cycle is controlled by GAPs (GTPase-activating proteins) and GEFs (guanine nucleotide exchange factors).
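The GTPase cycle behaves like a two-state switch, which can be sketched as a tiny state machine. This is only a conceptual model under simplifying assumptions: a GEF is taken to swap bound GDP for GTP in one step, a GAP is taken to trigger immediate hydrolysis, and the class and method names are hypothetical.

```python
class GTPaseSwitch:
    """Toy model of a GTP-binding protein as an on/off molecular switch."""

    def __init__(self) -> None:
        self.nucleotide = "GDP"  # start in the inactive, GDP-bound state

    @property
    def active(self) -> bool:
        # The protein is active only while GTP is bound.
        return self.nucleotide == "GTP"

    def apply_gef(self) -> None:
        """A GEF promotes release of GDP, allowing GTP to bind (switch on)."""
        self.nucleotide = "GTP"

    def apply_gap(self) -> None:
        """A GAP stimulates hydrolysis of bound GTP to GDP (switch off)."""
        if self.nucleotide == "GTP":
            self.nucleotide = "GDP"


if __name__ == "__main__":
    switch = GTPaseSwitch()
    print(switch.active)  # inactive at the start
    switch.apply_gef()
    print(switch.active)  # on after nucleotide exchange
    switch.apply_gap()
    print(switch.active)  # off again after hydrolysis
```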

The Ubiquitin-Proteasome System and Proteomics

Proteins can be marked for degradation or localization by the covalent addition of other small proteins such as Ubiquitin or SUMO (Small Ubiquitin-related Modifier). Polyubiquitin chains (linked at K48 or K63) are created by an enzyme system: E1 (Ubiquitin-activating), E2 (Ubiquitin-conjugating), and E3 (Ubiquitin ligase). The SCF ubiquitin ligase is a complex with interchangeable parts that recognizes different target proteins through their degradation signals (degrons) at specific times in the cell cycle.

Proteomics is the large-scale analysis of protein sets to understand cell function. Protein interaction maps help identify the function of uncharacterized proteins by showing which other molecules they interact with. Cross-species comparisons of these maps can reveal conserved biological pathways.

Laboratory Techniques for Studying Cells and Proteins

Cellular analysis often requires isolating and growing cells in culture. Primary cultures are taken directly from tissue, while secondary cultures have been passaged. Heterogeneous cell populations can be sorted using a Fluorescence-Activated Cell Sorter (FACS). Hybridoma cell lines are created by fusing a B lymphocyte with a transformed cell to mass-produce monoclonal antibodies.

Protein purification involves cell fractionation via preparative ultracentrifugation. Components are separated by size and density using techniques like velocity sedimentation (sucrose gradient) or equilibrium sedimentation. Chromatography provides further fractionation based on charge (ion-exchange), hydrophobicity, size (gel filtration), or specific binding (affinity chromatography). Immunoprecipitation is a rapid affinity method using antibodies. Epitope tagging (e.g., His-tag, GST-tag) is a genetic engineering approach to simplify purification.

For analysis, SDS-polyacrylamide-gel electrophoresis (SDS-PAGE) separates proteins by molecular weight by unfolding them with the detergent SDS and reducing disulfide bonds with β-mercaptoethanol. Two-dimensional gel electrophoresis combines isoelectric focusing (IEF), which separates by charge in a pH gradient, and SDS-PAGE, which separates by size. Proteins can then be identified using Western blotting or Mass Spectrometry (e.g., MALDI-TOF), which measures the mass-to-charge ratio of ionized peptides.
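The mass-to-charge ratio reported by a mass spectrometer can be related back to a peptide's mass with one line of arithmetic. The sketch below assumes positive-mode ionization in which each charge comes from one added proton; the example mass and the function name are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass_da: float, charge: int) -> float:
    """m/z of a peptide carrying `charge` extra protons (positive-mode assumption)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


if __name__ == "__main__":
    # A hypothetical 1000 Da peptide observed at charge states 1+ and 2+.
    for z in (1, 2):
        print(z, round(mass_to_charge(1000.0, z), 4))
```

The same peptide therefore appears at different positions in the spectrum depending on its charge state, which is why the instrument reports a ratio rather than a mass.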

Optical Methods and Structural Biology

Protein interactions can be monitored in real-time using optical methods like fluorescence anisotropy and Fluorescence Resonance Energy Transfer (FRET), which detects when two proteins are within 1-10 nm of each other. Chemical biology uses small molecule inhibitors, such as monastrol (a kinesin inhibitor), to disrupt specific protein functions. Structural determination at the atomic level is achieved through X-ray crystallography (using diffraction patterns from crystals) or NMR spectroscopy (using interactions between nearby hydrogen atoms).
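The short working range of FRET follows from its steep distance dependence, described by the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below assumes a Förster radius R0 of 5 nm, a typical order of magnitude chosen only for illustration; the real value depends on the fluorophore pair, and the function name is hypothetical.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float = 5.0) -> float:
    """Energy-transfer efficiency between donor and acceptor at a given distance."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and the Forster radius > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    for r in (2.5, 5.0, 10.0):
        print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
```

With this assumed R0, transfer is nearly complete at 2.5 nm, exactly half-maximal at 5 nm, and almost undetectable at 10 nm.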

Bioinformatics tools like BLAST allow for sequence alignment to predict protein function based on similarity to known genes. The significance of an alignment is measured by the Score (which rewards matches and applies penalties for gaps and substitutions) and the Expectation (E) value (the number of matches with at least that score expected to occur by chance, so smaller values indicate more significant alignments).
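The idea of an alignment score can be illustrated with a deliberately simplified scheme. The sketch below scores two already-aligned sequences using flat values for matches, mismatches, and gaps; these values and the function name are hypothetical, and this is not BLAST's actual scoring, which relies on substitution matrices and separate gap-opening and gap-extension penalties.

```python
def alignment_score(seq_a: str, seq_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap        # penalty for an inserted or deleted residue
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


if __name__ == "__main__":
    print(alignment_score("ACDE-G", "ACDQFG"))
```

In the example, four matches, one substitution, and one gap combine to a small positive score.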

Microscopy and Visualization Techniques

The light microscope has a resolution limit of about 0.2 μm, determined by the wavelength of the radiation and the numerical aperture of the lens. Contrast can be enhanced in living, unstained cells using phase-contrast or differential-interference-contrast (DIC) microscopy. For tissue samples, microtomes are used to create thin sections (1-10 μm) which are then stained with dyes like hematoxylin and eosin.
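The dependence on wavelength and numerical aperture is commonly written as d = 0.61 λ / NA (the Rayleigh criterion). The sketch below applies this formula; the wavelength of 0.4 μm and the numerical aperture of 1.4 are example values chosen for illustration, and the function name is hypothetical.

```python
def resolution_limit_um(wavelength_um: float, numerical_aperture: float) -> float:
    """Smallest resolvable separation (micrometers) by the Rayleigh criterion."""
    if wavelength_um <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return 0.61 * wavelength_um / numerical_aperture


if __name__ == "__main__":
    # Violet light with a high-quality oil-immersion objective (assumed values).
    print(round(resolution_limit_um(0.4, 1.4), 3), "micrometers")
```

With these inputs the result comes out just under 0.2 μm, consistent with the limit quoted above; longer wavelengths or lower numerical apertures give a coarser limit.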

Fluorescence microscopy uses specific filters to excite fluorophores and detect their emitted light. Modern variations include confocal microscopy, which uses pinholes to exclude out-of-focus light and produce sharp optical sections, and multiphoton microscopy. Dynamics of proteins in living cells can be studied with GFP (Green Fluorescent Protein) tagging, Photoactivation, and Fluorescence Recovery After Photobleaching (FRAP).

Super-resolution techniques now overcome the diffraction limit, including Structured Illumination Microscopy (SIM), Stimulated Emission Depletion (STED), and PALM/STORM, which can achieve resolutions near 20 nm by pinpointing individual molecules. Single-molecule techniques like Total Internal Reflection Fluorescence (TIRF) and Atomic Force Microscopy (AFM) allow for the visualization and manipulation of individual proteins.

Electron Microscopy and High-Resolution Imaging

The Transmission Electron Microscope (TEM) offers the highest resolution (0.14 nm) by using electron beams instead of light. Samples must be fixed, embedded in resin, and cut into extremely thin sections. Techniques like EM tomography and single-particle reconstruction can build 3D models of complex structures like the HIV capsid. The Scanning Electron Microscope (SEM) is used to view the surface of specimens, providing striking 3D images with a resolution between 3 nm and 20 nm. Cryoelectron microscopy preserves structures in their native, frozen state without chemical fixatives.