Notes on Protein Function, Purification, and AI-Based Prediction

Genomes, transcriptomes, and the proteome

  • Context: This module sits between protein–ligand binding (hemoglobin) and the start of enzyme function; it sets up core biochemical topics essential for signal transduction, metabolism, and later translation.
  • Key idea: Proteins are central to biology because they execute the functions encoded by the genome; the genome is relatively fixed, but the proteome is highly dynamic.
  • Genomes (parts list) vs. proteomes (expressed proteins):
    • The genome is the information stored in DNA; it is largely the same in every cell of an organism, with some exceptions discussed later.
    • Model organisms and gene counts mentioned:
    • Caenorhabditis elegans (worm): ~19,000 genes.
    • Drosophila melanogaster (fruit fly): ~14,000 genes.
    • Humans: ~21,000–22,000 genes (only a few thousand more than C. elegans), but a far larger genome, roughly 3 billion base pairs versus about 100 million in C. elegans.
  • Central dogma recap (DNA → RNA → protein):
    • DNA carries information (genome).
    • RNA (transcriptome) is the working copy; variable between cell types and conditions.
    • Protein synthesis (proteome) through translation; proteins are the functional units.
    • Exceptions noted: RNA can itself be catalytic (ribozymes) or act as the information carrier, but the general flow is DNA → RNA → protein.
  • Proteome as context-dependent: the set of expressed/modified proteins varies with:
    • Cell type (e.g., kidney vs. brain).
    • Developmental stage.
    • Environmental conditions (e.g., oxygen, nutrients, day vs. night).
  • Example: fetal hemoglobin vs. adult hemoglobin
    • Fetal hemoglobin (α2γ2) is downregulated after birth and replaced by adult hemoglobin (α2β2), in which β chains take the place of the γ chains.
    • Re-expression of fetal hemoglobin can alleviate some sickle cell symptoms; illustrates dynamic proteome regulation.
  • Why study protein function?
    • Proteins are the functional executors of life; understanding their presence and function reveals how cells/tissues/organisms work.
  • Approaches to protein function
    • Reductionist approach: isolate a specific protein (e.g., hemoglobin) and study its mechanism in isolation, then place it in a cellular/organismal context.
    • Systems/proteomics approach: study how the entire proteome changes under a condition, then infer functional implications from those changes.
    • In this course, emphasis starts with reductionist biochemistry (protein purification) with later expansion into systems-level proteomics.
  • Environments for studying proteins
    • In vivo: within the native environment (cell/tissue); generally physiologically relevant but high background noise due to many proteins and interactions.
    • In vitro: in glass (test tubes); controlled environment, often purified protein, lower background, easier to measure kinetics and structure, but less physiological relevance.
    • In silico: computational models and simulations; no background noise, can simulate entire cells or single proteins; useful for prediction and screening.
    • Ambiguity between in vivo and in vitro: context matters; tissue culture can be debated as in vivo or in vitro depending on perspective.
  • Nobel-level context: AI-based protein structure prediction (e.g., AlphaFold, recognized by the 2024 Nobel Prize in Chemistry) is transforming predictions of structure and function, enabling rapid identification of candidates for experimental validation.

Why purification is essential in biochemistry

  • Purification vs. the end goal
    • Purification is a means to an end: once a protein is purified, you can conduct assays to measure function, kinetics, binding, and interactions without background noise.
    • Purification goals can include:
    • Obtaining a protein in its functional form (active) and free of contaminants.
    • Isolating a specific isoform or a protein with a particular post-translational modification.
    • Preserving interactions to study the interactome or co-purifying partners.
  • Sources of protein: native material and recombinant expression
    • Possible sources for protein purification include:
    • Endogenous/native sources (from the organism or tissue).
    • Recombinant expression in systems such as E. coli, yeast, or mammalian cell lines.
    • Choice depends on:
    • Required post-translational modifications (E. coli cannot perform many eukaryotic modifications, such as glycosylation).
    • Yield and cost considerations.
    • Whether an N-terminal or C-terminal tag might interfere with function.
  • Purification pipeline: core ideas
    • Start with cell disruption (lysis) and then separate components by centrifugation to isolate the soluble protein fraction (lysate) from membranes and other debris.
    • Purification strategies rely on exploiting physicochemical properties to separate proteins from contaminants:
    • Size (size-exclusion chromatography, also called gel filtration).
    • Charge (ion exchange chromatography).
    • Solubility (salting in/out).
    • Specific interactions (affinity purification using tags or ligands).
    • Purification is often done in sequence (e.g., size exclusion followed by ion exchange) because no single method is perfectly selective.
    • A practical note: the goal is to maximize yield of the protein of interest while minimizing contaminants and preserving activity.

Purification techniques and concepts

  • Solubility and salt effects
    • Solubility can be manipulated via salt concentration (salting in vs. salting out):
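    • Low-to-moderate salt generally increases solubility by shielding charges on the protein surface (salting in).
    • High salt can “salt out” proteins: salt ions compete with the protein for water of hydration, so the least soluble proteins precipitate first and can be separated from the rest.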
    • The Dutch saying referenced: chemistry as the art of separation; emphasizes purification as the essence of chemical analysis.
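
The salt effect described above is often summarized with two quantities: the ionic strength of the solution, I = 1/2 Σ c_i z_i², and an empirical salting-out relation usually attributed to Cohn, log10(S) = β − K_s·I. The sketch below is only an illustration: `BETA` and `K_S` are hypothetical constants (real values are specific to each protein and salt and must be measured), and the relation describes the high-salt, salting-out regime only, not salting in.

```python
def ionic_strength(ions):
    """Ionic strength I = 0.5 * sum(c_i * z_i**2) for (molar concentration, charge) pairs."""
    return 0.5 * sum(c * z ** 2 for c, z in ions)


def salting_out_solubility(ionic_str, beta, k_s):
    """Empirical salting-out relation: log10(S) = beta - k_s * I.

    beta and k_s are hypothetical, protein-specific constants.
    """
    return 10 ** (beta - k_s * ionic_str)


# 1 M ammonium sulfate dissociates into 2 M NH4+ (z = +1) and 1 M SO4^2- (z = -2).
i_1m = ionic_strength([(2.0, +1), (1.0, -2)])

# Hypothetical constants for an illustrative protein (not measured values).
BETA, K_S = 3.0, 0.8
for molar in (0.5, 1.0, 2.0):
    i = ionic_strength([(2 * molar, +1), (molar, -2)])
    s = salting_out_solubility(i, BETA, K_S)
    print(f"{molar:.1f} M (NH4)2SO4: I = {i:.1f} M, relative solubility = {s:.3g}")
```

Because sulfate carries two charges, ammonium sulfate contributes three times its molar concentration to the ionic strength, which is one reason it is a popular salting-out agent.
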
  • Size-exclusion chromatography (gel filtration)
    • Column with porous beads; proteins separate by size:
    • Large proteins bypass pores and elute first (shorter path).
    • Small proteins enter pores and elute later (longer path).
    • Practical use: collect fractions corresponding to the size of the protein of interest.
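
To decide which fractions to collect, a size-exclusion column is commonly calibrated with proteins of known size. A minimal sketch, assuming the usual partition coefficient K_av = (V_e − V_0)/(V_t − V_0) and an approximately linear relation between K_av and log10(molecular weight); the column volumes and the elution volumes of the standards below are hypothetical.

```python
import math


def k_av(v_e, v_0, v_t):
    """Partition coefficient: fraction of the bead-accessible volume a protein samples."""
    return (v_e - v_0) / (v_t - v_0)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical column: void volume V0 = 8 mL, total volume Vt = 24 mL.
V0, VT = 8.0, 24.0
# Illustrative standards: (molecular weight in Da, hypothetical elution volume in mL).
standards = [(670_000, 9.0), (158_000, 12.5), (44_000, 15.5), (17_000, 18.0), (1_350, 22.0)]

log_mw = [math.log10(mw) for mw, _ in standards]
kav = [k_av(ve, V0, VT) for _, ve in standards]
slope, intercept = fit_line(log_mw, kav)


def estimate_mw(v_e):
    """Invert the calibration line to estimate molecular weight from elution volume."""
    return 10 ** ((k_av(v_e, V0, VT) - intercept) / slope)


print(f"slope = {slope:.3f} (negative: larger proteins elute earlier)")
print(f"Protein eluting at 14.0 mL: ~{estimate_mw(14.0):,.0f} Da")
```

The estimate assumes a roughly globular protein; elongated or oligomeric proteins can elute earlier than their monomer mass suggests.
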
  • Ion-exchange chromatography
    • Exploits protein charge at a given pH; two main types:
    • Cation exchange (positive proteins bind to negatively charged beads).
    • Anion exchange (negative proteins bind to positively charged beads).
    • Mechanism: bound proteins are eluted by increasing salt concentration (competition with salt ions for binding sites).
    • Key variables:
    • Choice of resin (negative beads for cation exchange, positive beads for anion exchange).
    • pH control: pH relative to the protein’s pI determines the net charge.
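
The resin choice above reduces to a comparison between the buffer pH and the protein's pI. The helper below is a rough sketch of that decision; the function name and the `margin` of one pH unit are assumptions made for illustration, not a rule from the lecture.

```python
def choose_ion_exchanger(pi, buffer_ph, margin=1.0):
    """Suggest a resin type from the protein's pI and the buffer pH.

    margin is a hypothetical rule of thumb: require the pH to sit at least
    this far from the pI so the protein carries a clear net charge.
    """
    if buffer_ph <= pi - margin:
        return "cation exchange (protein net positive, negatively charged beads)"
    if buffer_ph >= pi + margin:
        return "anion exchange (protein net negative, positively charged beads)"
    return "pH too close to pI: net charge near zero, binding likely weak"


print(choose_ion_exchanger(pi=10.6, buffer_ph=7.5))
print(choose_ion_exchanger(pi=5.0, buffer_ph=8.0))
print(choose_ion_exchanger(pi=7.2, buffer_ph=7.4))
```
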
  • Isoelectric point (pI) and pKa concepts
    • pKa: pH at which a functional group is 50% protonated.
    • pI (isoelectric point): pH at which the molecule has net zero charge.
    • At pH below pI, protein tends to be positively charged; at pH above pI, negatively charged.
    • Example discussed: a protein with a pI of about 10.6 is positively charged at pH 9–10, carries no net charge near pH 10.6, and becomes negatively charged at higher pH.
    • Charge patches: localized clusters (e.g., lysine- and arginine-rich patches) can alter behavior on ion-exchange columns despite the overall charge.
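
The link between pKa, net charge, and pI can be made concrete with the Henderson-Hasselbalch relation. The sketch below estimates the net charge of a peptide at a given pH and finds the pI by bisection. The pKa values are approximate textbook-style numbers from one of several published scales and should be read as assumptions; the calculation also treats every ionizable group as independent, so it ignores the charge-patch effects noted above.

```python
# Approximate side-chain and terminal pKa values (one of several published
# scales; treat them as assumptions, since real values shift with local environment).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence, ph):
    """Net charge from Henderson-Hasselbalch, treating every group as independent."""
    charge = 0.0
    groups = list(sequence) + ["N_TERM", "C_TERM"]
    for g in groups:
        if g in PKA_POSITIVE:
            # Fraction of a basic group that is protonated (charge +1).
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[g]))
        elif g in PKA_NEGATIVE:
            # Fraction of an acidic group that is deprotonated (charge -1).
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[g] - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical peptides: one lysine/arginine-rich, one aspartate/glutamate-rich.
for peptide in ("KKRKAG", "DDEEAG"):
    pi = isoelectric_point(peptide)
    print(f"{peptide}: pI ~ {pi:.2f}, charge at pH 7 = {net_charge(peptide, 7.0):+.2f}")
```
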
  • Affinity purification and tags
    • Proteins can be purified using affinity tags that bind specifically to a ligand:
    • His-tag (typically six consecutive histidines) binds to nickel-NTA resin; eluted with imidazole, which competes with the histidines for the nickel.
    • GST-tag binds to immobilized glutathione; eluted with free glutathione as a competitor, or by other strategies.
    • Tags add a purification handle but can interfere with function if placed at critical regions (e.g., N-terminus).
  • Monitoring purification: presence and activity at each step
    • Two complementary readouts:
    • Presence: typically assessed by SDS-PAGE (denaturing gel) to visualize protein size and purity.
    • Activity: enzymatic assays or binding assays to confirm functionality.
    • Ideally, specific activity (activity per mg of total protein) rises at each step, even as total yield declines.
    • Yield: fraction of the starting target protein, usually tracked as total activity, that is recovered after each step; some loss is expected, but the aim is to retain the protein while removing contaminants.
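
The presence/activity bookkeeping above is usually collected in a purification table. A small sketch with entirely hypothetical numbers, assuming yield is tracked as recovered total activity (a common convention) and fold purification as specific activity relative to the crude lysate:

```python
# Hypothetical purification record: (step, total protein in mg, total activity in units).
steps = [
    ("Crude lysate", 1000.0, 20000.0),
    ("Ammonium sulfate cut", 300.0, 16000.0),
    ("Ion exchange", 40.0, 12000.0),
    ("Size exclusion", 8.0, 9000.0),
]


def purification_table(steps):
    """Compute specific activity, yield, and fold purification for each step."""
    _, protein0, activity0 = steps[0]
    base_specific = activity0 / protein0
    rows = []
    for name, protein_mg, activity_u in steps:
        specific = activity_u / protein_mg  # units per mg of total protein
        rows.append({
            "step": name,
            "specific_activity": specific,
            "yield_pct": 100.0 * activity_u / activity0,
            "fold_purification": specific / base_specific,
        })
    return rows


for row in purification_table(steps):
    print(f"{row['step']:<22s} specific activity = {row['specific_activity']:8.1f} U/mg, "
          f"yield = {row['yield_pct']:5.1f}%, fold = {row['fold_purification']:5.1f}")
```

A step that lowers specific activity or loses most of the total activity is a signal to revisit that step (for example, the protein may be inactivated or eluting in a different fraction).
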
  • Practical notes on purification workflow
    • Start with cell disruption; collect soluble protein; discard membrane and debris unless membrane proteins are the target.
    • Use a combination of techniques to achieve sufficient purity for downstream experiments.
    • Anticipate and manage trade-offs between purity, yield, and activity.
  • Antibodies and immunoprecipitation
    • Antibodies can be raised against the protein (antigen) or a specific peptide epitope; antibodies enable highly specific capture.
    • Immunoprecipitation (IP): use antibody-bound beads to pull down the target protein from a lysate; co-precipitated proteins can represent interactors (the interactome).
    • IP is used for targeted purification and for discovering interacting partners (proteomics).
    • Proteomics (bottom-up): compare proteins co-immunoprecipitated under different conditions to identify changes in interactions and potential functional pathways.
  • Two-dimensional gel electrophoresis (2D-GE)
    • First dimension: isoelectric focusing (IEF) separates proteins by their isoelectric point (pI).
    • Second dimension: SDS-PAGE separates by size.
    • Visualization: each spot represents an individual protein (or protein form); comparing gels across conditions reveals differential interactors or changes in expression.
  • Proteomics and systems-level insights
    • Immunoprecipitation followed by proteomic analysis provides a view of the interactome for a given protein under defined conditions.
    • Changes between control and treatment or healthy vs. diseased tissue can highlight candidate proteins linked to specific functions or disease processes.
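
Once the proteins recovered in each immunoprecipitation have been identified, the control-versus-treatment comparison is essentially set arithmetic. The sketch below uses made-up protein identifiers, and the background filter (proteins also seen in a bead-only or non-specific antibody control) is an added assumption rather than something specified in these notes.

```python
# Hypothetical protein identifiers recovered with the same bait under two conditions.
control_ip = {"BAIT", "PARTNER_A", "PARTNER_B", "CHAPERONE_X"}
treated_ip = {"BAIT", "PARTNER_A", "KINASE_Y", "CHAPERONE_X"}
# Assumed background list: proteins that also appear in a bead-only control.
background = {"CHAPERONE_X"}


def compare_interactors(control, treated, background=frozenset()):
    """Classify co-precipitated proteins as shared, lost, or gained on treatment."""
    control, treated = control - background, treated - background
    return {
        "shared": sorted(control & treated),
        "lost_on_treatment": sorted(control - treated),
        "gained_on_treatment": sorted(treated - control),
    }


for label, proteins in compare_interactors(control_ip, treated_ip, background).items():
    print(f"{label}: {proteins}")
```

Real datasets are quantitative rather than present/absent, so this only captures the logic of the comparison, not the statistics needed to call a change significant.
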

AI and computational prediction of protein function

  • Context and impact
    • Artificial intelligence and machine learning are transforming protein structure and function prediction.
    • AlphaFold (and other models) predict structure from sequence, enabling rapid hypothesis generation about active sites, ligand-binding residues, and regulatory regions.
    • AlphaFold’s impact: rapid generation of candidate structures, enabling focused experimental validation; cited as revolutionizing the field due to the scale and speed of prediction.
  • How AlphaFold works (high-level)
    • Input: amino acid sequence (can be whole protein, a domain, or a segment).
    • Step 1: sequence similarity search across diverse species; collect related sequences.
    • Step 2: multiple sequence alignment to identify conserved residues and co-evolving contacts.
    • Step 3: identify coevolving residues that likely interact in 3D space; construct a distance map of contacting atoms/residues.
    • Step 4: computationally assemble a 3D structure that satisfies the distance constraints while avoiding steric clashes.
    • Step 5: apply AI-based refinement to improve the model and estimate how reliable the prediction is at each position (AlphaFold reports a per-residue confidence score, pLDDT, on a 0–100 scale).
    • Output: a predicted (hypothetical) 3D structure with an associated confidence metric.
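
The coevolution idea in Steps 2 and 3 can be illustrated with a classical statistic: mutual information between two columns of a multiple sequence alignment. This is only a toy demonstration of the signal, using a made-up alignment; AlphaFold learns such patterns with a neural network rather than computing this statistic explicitly.

```python
import math
from collections import Counter

# Toy alignment (hypothetical sequences). Columns 1 and 3 covary (D pairs with K,
# E pairs with R), column 0 is fully conserved, column 2 varies independently.
msa = [
    "MDAK",
    "MDGK",
    "MEAR",
    "MEGR",
    "MDAK",
    "MEGR",
]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


for i, j in [(1, 3), (1, 2), (0, 1)]:
    print(f"columns {i} and {j}: MI = {mutual_information(msa, i, j):.3f} bits")
```

A high score for a column pair suggests the two positions change together across species, which is the kind of evidence used to propose that the residues are in contact; a fully conserved column carries no covariation signal at all, even though conservation itself is informative.
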
  • Strengths and limitations
    • Strengths: strong ability to predict well-structured regions and overall folds; provides a valuable candidate structure when experimental structures are unavailable.
    • Limitations: disordered regions are harder to predict accurately; predictions are probabilistic and require experimental validation.
    • The accuracy tends to be higher for proteins with well-defined tertiary structures (e.g., rigid cores) than for highly flexible or intrinsically disordered regions.
  • Practical use of AlphaFold predictions
    • Use predicted structures to identify potential active sites, ligand-binding pockets, and regulatory sites as starting points for experimental work.
    • Generate candidate residues for mutational analysis or targeted screening of ligands.
    • Still requires experimental validation to confirm function and binding.
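
Before picking residues for mutagenesis, it helps to check how confident the model is at each position. AlphaFold-format PDB files store the per-residue confidence score (pLDDT) in the B-factor column, so it can be read with a few lines of standard-library Python. The sketch below builds a tiny made-up input instead of reading a real file; the helper names are hypothetical, and the cutoff of 70 follows the commonly used convention that scores below it indicate low confidence.

```python
def pdb_atom_line(serial, name, resname, chain, resseq, x, y, z, bfactor):
    """Build a fixed-column PDB ATOM record (used here only to make toy input)."""
    occupancy = 1.0
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{bfactor:6.2f}")


def per_residue_plddt(pdb_text):
    """Read pLDDT from the B-factor field (columns 61-66) of each CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores, cutoff=70.0):
    """Residue numbers whose pLDDT falls below the chosen cutoff."""
    return sorted(res for res, score in scores.items() if score < cutoff)


# Made-up five-residue model; coordinates and scores are illustrative only.
toy_plddt = [95.2, 91.0, 68.4, 45.1, 88.7]
toy_pdb = "\n".join(
    pdb_atom_line(i, " CA", "ALA", "A", i, 1.5 * i, 0.0, 0.0, score)
    for i, score in enumerate(toy_plddt, start=1)
)

scores = per_residue_plddt(toy_pdb)
print("pLDDT per residue:", scores)
print("Low-confidence residues:", low_confidence_residues(scores))
```

Residues flagged this way often fall in flexible or disordered regions, so a predicted pocket or contact that depends on them deserves extra caution before it is used to design experiments.
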

Quick recap: core concepts and equations

  • Central dogma recap (DNA → RNA → protein) and proteome variability across conditions and cell types.
  • Four main physical/chemical properties used for purification:
    • Size (size-exclusion chromatography/gel filtration).
    • Charge (ion exchange; pH-dependent).
    • Solubility (salting in/out).
    • Specific interactions (affinity purification).
  • Key definitions
    • pKa: pH at which a functional group is 50% protonated.
    • pI (isoelectric point): the pH at which the protein has net zero charge.
  • Important quantitative concept
    • Enzyme kinetics can be described by Michaelis–Menten behavior:
      v = \frac{V_{max} [S]}{K_m + [S]}
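    • Here v is the initial reaction rate, [S] the substrate concentration, Vmax the maximal rate at saturating substrate, and Km the substrate concentration at which v = Vmax/2.
    • This underpins the rationale for purifying proteins: catalytic parameters (Km, Vmax) can be measured without cellular noise.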
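
A short numerical check of the Michaelis–Menten relation, using hypothetical parameter values, shows the two features that matter most in practice: the rate is half-maximal when [S] equals Km, and it approaches but never reaches Vmax as substrate increases.

```python
def michaelis_menten(s, v_max, k_m):
    """Initial velocity v = Vmax * [S] / (Km + [S])."""
    return v_max * s / (k_m + s)


# Hypothetical parameters for illustration: Vmax in umol/min, Km in mM.
V_MAX, K_M = 100.0, 2.0
for s in (0.2, 2.0, 20.0, 200.0):
    v = michaelis_menten(s, V_MAX, K_M)
    print(f"[S] = {s:6.1f} mM -> v = {v:6.2f} umol/min ({100 * v / V_MAX:.0f}% of Vmax)")
```
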
  • Practical experimental notes
    • Purification is not the goal itself; it is a means to enable measurement of presence, purity, activity, and interactions.
    • Always verify presence and activity at each purification step to avoid wasted time and resources.
    • When purifying with tags, ensure the tag does not disrupt function; choose expression system accordingly.

Connections to broader themes

  • Bridging reductionist and systems perspectives:
    • Reductionist purification yields detailed mechanistic insight into a single protein.
    • Systems proteomics reveals how a protein fits into broader networks and pathways.
  • Real-world relevance:
    • Proteome variability under different physiological conditions underpins development, disease, and response to therapy (e.g., fetal vs. adult hemoglobin, sickle cell disease).
    • AI-driven structure prediction accelerates discovery and hypothesis generation but must be complemented by empirical validation.
  • Ethical and practical implications:
    • Computational predictions should be validated experimentally to avoid over-interpretation.
    • Use of model systems and expression hosts must consider post-translational modifications and physiological relevance.

Summary of key takeaways

  • The proteome is dynamic and context-dependent; understanding protein function requires knowing which proteins are present and active under given conditions.
  • Purification is a critical, multistep tool; the strategy is chosen based on the protein’s properties, required yield, and downstream assays.
  • Purification relies on exploiting size, charge, solubility, and binding affinities, with ongoing monitoring of presence and activity.
  • Antibodies and proteomics extend purification into interactome analysis, enabling discovery of protein networks.
  • AI-powered predictions (e.g., AlphaFold) provide powerful structure-function hypotheses that drive experimental prioritization and discovery.