Pathway Analysis and Drug Discovery Lecture Notes

Introduction

  • This lecture – traditionally delivered near July 4th in the Biotechnology program – focuses on Pathway Analysis & Drug Discovery.
  • Over-arching aim: show how understanding complex molecular pathways can be leveraged to identify, validate and optimize new therapeutic drugs.

What Is a “Pathway”?

  • No single, universally accepted definition.
  • Wikipedia (biochemistry perspective): a metabolic pathway is “a series of chemical reactions occurring within a cell”, in which a principal molecule is successively modified by enzymes.
  • Practical research usage extends the term to:
    • Protein–protein or protein–ligand interaction networks.
    • Gene-regulatory or signal-transduction networks.
  • Functional core: a pathway describes one specific biological function, linking molecules, reactions and regulations to a phenotypic outcome.
  • Iconic illustration: posters of the cell-cycle/cell-proliferation map showing hundreds of interacting proteins and gene products; serves as a reminder of biological complexity.

Pathways vs. Gene Sets

  • Gene Set (GS)
    • Any collection of genes grouped by a shared criterion; no interactions or ordering are implied.
    • E.g., all genes in a GO term, all genes measured in an assay, or all genes co-expressed in a cluster.
  • Pathway (PW)
    • Sub-type of gene set with well-defined mechanistic interactions and order.
  • Relationship: every pathway is a gene set (its member genes), but not every gene set constitutes a pathway.
  • In literature & software, terminology sometimes blurred – students must confirm context.
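
The distinction can be sketched in code. This is a minimal illustration, not a standard data model: the gene names, the relation labels and the helper pathway_members are all hypothetical. A gene set is modelled as an unordered set, a pathway as a list of directed interactions.

```python
# Hypothetical toy data: gene names and relations are placeholders only.
gene_set = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

# A pathway adds ordered, mechanistic interactions between its members.
pathway_edges = [
    ("GENE_A", "activates", "GENE_B"),
    ("GENE_B", "phosphorylates", "GENE_C"),
]


def pathway_members(edges):
    """Collapse a pathway to its gene set by discarding the interactions."""
    members = set()
    for source, _relation, target in edges:
        members.add(source)
        members.add(target)
    return members


members = pathway_members(pathway_edges)
print(sorted(members))      # the pathway viewed as a plain gene set
print(members <= gene_set)  # True when every pathway gene is in the larger set
```

Going from pathway to gene set loses information (who acts on whom, and in what order), which is why the reverse conversion is not possible.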

Gene Ontology (GO) – The Standard Vocabulary

  • GO is an international, curated bioinformatics effort that standardises gene/product attributes across species & databases.
  • Three ontologies (hierarchical trees):
    • Biological Process (BP) – e.g., angiogenesis, glycolysis, cell cycle.
    • Molecular Function (MF) – e.g., ATP binding, kinase activity.
    • Cellular Component (CC) – e.g., nucleus, ribosome, plasma membrane.
  • GO’s web interface lets researchers input a gene and explore all GO terms it maps to; links out to KEGG, IPA and other pathway resources.
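
As a rough sketch of what "explore all GO terms a gene maps to" means, the snippet below groups a gene's annotations by ontology. The gene names and the gene-to-term assignments are invented for illustration (only the term names come from the examples above), and terms_for_gene is a hypothetical helper, not part of any GO tool.

```python
from collections import defaultdict

# Hypothetical annotation records: (gene, ontology, term).
# Gene names and assignments are placeholders, not real GO annotations.
annotations = [
    ("GENE_A", "BP", "angiogenesis"),
    ("GENE_A", "MF", "kinase activity"),
    ("GENE_A", "CC", "plasma membrane"),
    ("GENE_B", "BP", "glycolysis"),
    ("GENE_B", "MF", "ATP binding"),
]


def terms_for_gene(gene, records):
    """Return the gene's terms grouped by ontology (BP, MF, CC)."""
    grouped = defaultdict(list)
    for record_gene, ontology, term in records:
        if record_gene == gene:
            grouped[ontology].append(term)
    return dict(grouped)


print(terms_for_gene("GENE_A", annotations))
```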

The Grand Challenge in Drug Discovery

  • “Systemic generation of novel biological & therapeutic insights.”
  • Sequential workflow:
    1. Gene discovery – find genes implicated in a disease.
    2. Pathway/process mapping – place those genes in cellular context via GO, KEGG, IPA, etc.
    3. Mechanistic hypothesis – propose how gene perturbation drives phenotype.
    4. Experimental validation – in vitro & in vivo assays.
    5. Target–lead identification – map druggable nodes and screen for chemical leads.
    6. Pharmacology & toxicology – study xenobiotic interactions, side-effects, pathologies.

Step 1 – Building the Initial Gene List

  • Comparative studies
    • Microarray, RNA-Seq, qRT-PCR, proteomics: compare normal vs. disease tissue.
  • Clustering/Classifications
    • Identify co-expressed gene clusters; changes often occur in concert.
  • Homology analysis
    • Use BLAST to locate orthologs already implicated in other species or disorders.
  • “Any source you can think of” – literature mining, CRISPR screens, GWAS hits, etc.
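
A comparative study can be reduced, in its simplest form, to a fold-change filter. The sketch below assumes one mean expression value per gene per condition. The gene names, values, threshold and pseudocount are all hypothetical, and a real analysis would use replicates and a proper statistical test rather than a bare cutoff.

```python
import math

# Hypothetical mean expression values (arbitrary units) per gene.
normal = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 80.0, "GENE_D": 10.0}
disease = {"GENE_A": 400.0, "GENE_B": 55.0, "GENE_C": 20.0, "GENE_D": 10.0}


def candidate_genes(normal, disease, min_abs_log2fc=1.0, pseudocount=1.0):
    """Keep genes whose |log2 fold change| (disease vs. normal) meets the cutoff.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    hits = {}
    for gene in normal.keys() & disease.keys():
        ratio = (disease[gene] + pseudocount) / (normal[gene] + pseudocount)
        log2fc = math.log2(ratio)
        if abs(log2fc) >= min_abs_log2fc:
            hits[gene] = log2fc
    return hits


hits = candidate_genes(normal, disease)
for gene in sorted(hits):
    print(gene, round(hits[gene], 2))
```

Positive values mark genes up-regulated in disease, negative values down-regulated ones; both directions go into the initial gene list.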

Step 2 – Enrichment & Commonality Analysis

  • Questions asked of the gene list:
    • Do genes share molecular functions, biological processes, or cellular compartments?
    • Are they annotated to a common pathway?
    • Do they share TF-binding sites, miRNA targets, protein domains?
    • Are they co-mentioned in disease databases?
  • GO/Pathway Enrichment Analysis
    • Quantifies whether overlaps are greater than expected by chance.
    • Classic statistic: Fisher’s Exact Test.
      p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}
      where a, b, c, d fill the 2×2 contingency table and n = a + b + c + d is the total number of genes.
    • Modern tools perform thousands of such tests automatically and adjust p for multiple hypotheses.
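
The enrichment test can be sketched with the standard library alone. Several assumptions are made here that the lecture does not state: the table layout (a = list genes in the pathway, b = list genes outside it, c = non-list genes in the pathway, d = the rest), the choice of a one-sided test for over-representation, and Benjamini-Hochberg as the multiple-testing adjustment. The factorial formula gives the probability of a single table; the one-sided p-value sums it over all tables with the same margins and at least as many list genes in the pathway. The function names and the example counts are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2x2 table with fixed margins.

    Equal to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test: P(overlap >= a) with margins fixed."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
    return p


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 40-gene list, 8 in the pathway;
# 20000 genes in total, 200 of them annotated to the pathway.
p_one_pathway = enrichment_p_value(8, 32, 192, 19768)
print(p_one_pathway)

# Hypothetical raw p-values from three pathways, adjusted together.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In practice a library routine such as SciPy's fisher_exact would normally replace the hand-written sum; the explicit version is shown only to connect the code to the formula.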

Software & Databases for Pathway Analysis

  • Two major business models:
    1. Free / Community-Driven
      • Rely on volunteer curation; broad accessibility.
      • Examples: GO website, Reactome, STRING, Cytoscape plug-ins.
    2. Commercial / Subscription
      • Paid staff perform manual curation, QA & continuous updates; generally higher confidence.
      • Examples: Ingenuity Pathway Analysis (IPA), MetaCore, KEGG (now paid).

Notable Databases Highlighted in Lecture

  • PathwayGuide.org
    • Rapidly expanding: grew from 222 pathways ~6 y ago to 702 (checked this morning).
    • Contents: PPI networks, metabolic/signalling maps, TF interactions, protein–compound links, gene-interaction networks.
  • KEGG (Kyoto Encyclopedia of Genes & Genomes)
    • Now licensed via Pathway Solutions; offers academic & commercial plans.
    • Produces colourful pathway maps. Example given: Tryptophan Metabolism – every arrow a potential drug-target node.

Illustrative Pathway Maps Discussed

  • Tryptophan metabolism – highlights multiple enzymes/cofactors; drug leads could inhibit or enhance any step.
  • Angiogenesis map – critical in tumour vascularisation; anti-angiogenic drugs (e.g., VEGF inhibitors) exploit nodes here.
  • Uterine smooth-muscle contraction – important for labour pharmacology; each signalling molecule is a conceivable obstetric drug target.

Why Organise Data into Pathways?

  • Reveals biological meaning behind joint expression changes.
  • Groups genes/proteins into manageable themes rather than isolated hits.
  • Pinpoints crucial intervention points where a drug can modify outcome.
  • Recognises that biology is redundant & robust – often several paralogous genes perform similar roles → multiple therapeutic “bites at the apple”.
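
Grouping isolated hits into themes can be illustrated as follows. The pathway membership table, the gene names and the helper group_hits_by_pathway are hypothetical; real membership would come from a resource such as KEGG or Reactome.

```python
from collections import defaultdict

# Hypothetical pathway membership; the gene names are placeholders.
pathway_membership = {
    "Tryptophan metabolism": {"GENE_A", "GENE_B"},
    "Angiogenesis": {"GENE_C", "GENE_D", "GENE_E"},
}
hit_list = ["GENE_A", "GENE_C", "GENE_D", "GENE_Z"]


def group_hits_by_pathway(hits, membership):
    """Map each pathway to the hits it contains; unmatched hits go to 'unassigned'."""
    themes = defaultdict(list)
    for gene in hits:
        matched = False
        for pathway, genes in membership.items():
            if gene in genes:
                themes[pathway].append(gene)
                matched = True
        if not matched:
            themes["unassigned"].append(gene)
    return dict(themes)


for theme, genes in group_hits_by_pathway(hit_list, pathway_membership).items():
    print(theme, genes)
```

Four separate hits become two themes plus one unexplained gene, which is the "manageable themes" view described above.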

Practical/Philosophical Implications

  • Drug discovery is no longer “one gene → one drug” but “network → modulator”.
  • Requires integration of wet-lab assays, in silico modelling, and statistical genomics.
  • Ethical dimension: better pathway understanding may reduce late-stage drug failures, saving cost, time, and patient risk.

Key Takeaways

  • A pathway captures a functional molecular narrative; a gene set is any purposeful list of genes.
  • Robust drug discovery pipelines begin with accurate gene lists, expand to enriched pathways, and culminate in validated targets/leads.
  • Bioinformatics databases (GO, PathwayGuide, KEGG, IPA) are indispensable – know their strengths, weaknesses & licensing terms.
  • Statistical enrichment (e.g., Fisher’s Exact) underpins confidence that observed overlaps are not random.
  • Every interaction arrow on a pathway map can be envisioned as a potential therapeutic lever – the art is picking which lever to pull.