Pathway Analysis and Drug Discovery Lecture Notes

This lecture – traditionally delivered near July 4th in the Biotechnology program – focuses on Pathway Analysis & Drug Discovery.
Overarching aim: show how an understanding of complex molecular pathways can be leveraged to identify, validate and optimize new therapeutic drugs.

There is no single, universally accepted definition of a “pathway”.
Wikipedia (biochemistry perspective): a metabolic pathway is “a series of chemical reactions occurring within a cell”, in which a principal molecule is successively modified by enzymes.
Practical research usage extends the term to:
- Protein–protein or protein–ligand interaction networks.
- Gene-regulatory or signal-transduction networks.
Functional core: a pathway describes one specific biological function, linking molecules, reactions and regulations to a phenotypic outcome.
Iconic illustration: posters of the cell-cycle/cell-proliferation map showing hundreds of interacting proteins and gene products; serves as a reminder of biological complexity.

Gene Set (GS)
- Collection of genes grouped for a reason.
- E.g., all genes in a GO term, all genes measured in an assay, or all genes co-expressed in a cluster.
Pathway (PW)
- Sub-type of gene set with well-defined mechanistic interactions and order.
Relationship: every pathway is a gene set (its member genes form a set), but not every gene set constitutes a pathway.
In literature & software the two terms are sometimes used interchangeably – students must confirm from context which is meant.
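
The distinction can be made concrete with a small data-structure sketch. It is only an illustration: the `Pathway` class, the interaction labels and the simplified VEGF example are assumptions for this sketch, not the format of any real pathway database.

```python
from dataclasses import dataclass, field

# A gene set is just an unordered collection of gene symbols.
# (Hypothetical cluster, chosen for illustration.)
co_expressed_cluster = frozenset({"VEGFA", "KDR", "PLCG1", "ACTB"})


@dataclass
class Pathway:
    """A pathway: genes plus directed, typed interactions between them."""
    name: str
    # Each edge is (source gene, interaction type, target gene).
    edges: list = field(default_factory=list)

    def gene_set(self) -> frozenset:
        """Flatten to a plain gene set; order and mechanism are lost."""
        return frozenset(g for src, _, dst in self.edges for g in (src, dst))


# Heavily simplified toy pathway.
toy = Pathway(
    "toy VEGF signalling",
    [("VEGFA", "binds", "KDR"), ("KDR", "phosphorylates", "PLCG1")],
)

print(sorted(toy.gene_set()))
print(toy.gene_set() <= co_expressed_cluster)
```

Going from pathway to gene set is always possible; going back is not, because the set no longer records which gene acts on which.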

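The Gene Ontology (GO) is an international, curated bioinformatics effort that standardises gene and gene-product attributes across species & databases.
Three ontologies (hierarchical; strictly, directed acyclic graphs in which a term may have more than one parent):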
- Biological Process (BP) – e.g., angiogenesis, glycolysis, cell cycle.
- Molecular Function (MF) – e.g., ATP binding, kinase activity.
- Cellular Component (CC) – e.g., nucleus, ribosome, plasma membrane.
GO’s web interface lets researchers input a gene and explore all GO terms it maps to; it also links out to KEGG, IPA (Ingenuity Pathway Analysis) and other pathway resources.
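
Because GO terms are arranged hierarchically, a gene annotated to a specific term is implicitly annotated to every ancestor of that term. The sketch below shows that lookup on a toy fragment. The term names are real GO names, but the parent links are simplified and the gene annotations are hypothetical; a real analysis would load the ontology and annotation files distributed by the GO project.

```python
# Toy fragment of the Biological Process ontology: child term -> parent terms.
# Parent links are simplified for illustration.
PARENTS = {
    "angiogenesis": ["blood vessel morphogenesis"],
    "blood vessel morphogenesis": ["anatomical structure morphogenesis"],
    "anatomical structure morphogenesis": ["biological_process"],
    "glycolytic process": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "biological_process": [],
}

# Hypothetical direct annotations: gene -> most specific terms.
ANNOTATIONS = {"VEGFA": {"angiogenesis"}, "PKM": {"glycolytic process"}}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def all_terms(gene):
    """Direct annotations of `gene` plus all of their ancestors."""
    direct = ANNOTATIONS.get(gene, set())
    result = set(direct)
    for term in direct:
        result |= ancestors(term)
    return result


print(sorted(all_terms("VEGFA")))
```

Propagating annotations upward in this way is what allows an enrichment test to be run on a broad term as well as on a very specific one.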

Comparative studies
- Microarray, RNA-Seq, qRT-PCR, proteomics: compare normal vs. disease tissue.
Clustering/Classifications
- Identify co-expressed gene clusters; changes often occur in concert.
Homology analysis
- Use BLAST to locate orthologs already implicated in other species or disorders.
“Any source you can think of” – literature mining, CRISPR screens, GWAS hits, etc.
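
The clustering route to a gene list can be sketched as follows: genes whose expression profiles are strongly correlated across samples are grouped together. This is a deliberately minimal, greedy grouping on hypothetical values. The gene names, expression values and the 0.9 threshold are assumptions for the example, and real studies typically use hierarchical clustering, k-means or similar methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical log-expression values across five samples.
EXPR = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [4.8, 4.1, 2.9, 2.2, 0.9],
}


def coexpression_clusters(expr, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster containing a
    member it correlates with at or above `threshold`."""
    clusters = []
    for gene, profile in expr.items():
        for cluster in clusters:
            if any(pearson(profile, expr[m]) >= threshold for m in cluster):
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


print(coexpression_clusters(EXPR))
```

Each resulting cluster is a gene set in the sense defined earlier and can be passed on to enrichment analysis.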

Questions asked of the gene list:
- Do genes share molecular functions, biological processes, or cellular compartments?
- Are they annotated to a common pathway?
- Do they share TF-binding sites, miRNA targets, protein domains?
- Are they co-mentioned in disease databases?
GO/Pathway Enrichment Analysis
- Quantifies whether overlaps are greater than expected by chance.
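- Classic statistic: Fisher’s Exact Test. With the table margins fixed, the probability of one particular 2×2 table is
  $p_{\text{table}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$
  where $a,b,c,d$ are the four cell counts of the 2×2 contingency table (gene in the list or not × gene in the pathway or not) and $n = a+b+c+d$ is the total number of genes. The $p$-value is the sum of $p_{\text{table}}$ over the observed table and all tables at least as extreme.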
- Modern tools perform thousands of such tests automatically and adjust $p$ for multiple hypotheses.
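
The enrichment calculation above can be sketched in standard-library Python. This is a minimal illustration, not the implementation used by any particular tool. The function names, the cell layout (a = list genes in the pathway, b = list genes outside it, c = pathway genes not in the list, d = all remaining genes) and the pathway counts are assumptions for the example. The multiple-testing step uses the Benjamini–Hochberg procedure, which is one common choice; the notes do not say which correction is meant.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2x2 table with fixed margins (hypergeometric).

    Algebraically equal to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def enrichment_p(a, b, c, d):
    """One-sided Fisher p-value: chance of an overlap of at least `a`."""
    # Moving k genes into the overlap cell leaves all margins unchanged.
    return sum(
        table_probability(a + k, b - k, c - k, d + k)
        for k in range(min(b, c) + 1)
    )


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: a 100-gene list against a 20,000-gene background.
# pathway name -> (overlap with the list, pathway size)
LIST_SIZE, BACKGROUND = 100, 20000
PATHWAYS = {"pathway_1": (8, 200), "pathway_2": (2, 150), "pathway_3": (1, 300)}

raw = {}
for name, (overlap, size) in PATHWAYS.items():
    a, b = overlap, LIST_SIZE - overlap
    c = size - overlap
    d = BACKGROUND - a - b - c
    raw[name] = enrichment_p(a, b, c, d)

adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in PATHWAYS:
    print(f"{name}: p = {raw[name]:.3g}, BH-adjusted = {adjusted[name]:.3g}")
```

The one-sided form is used because enrichment analysis asks only whether the overlap is larger than expected; a two-sided test would also count unusually small overlaps as evidence.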

PathwayGuide.org
- Rapidly expanding: grew from 222 pathways ~6 y ago to 702 (checked this morning).
- Contents: PPI networks, metabolic/signalling maps, TF interactions, protein–compound links, gene-interaction networks.
KEGG (Kyoto Encyclopedia of Genes & Genomes)
- Now licensed via Pathway Solutions; offers academic & commercial plans.
- Produces colourful pathway maps. Example given: Tryptophan Metabolism – every arrow a potential drug-target node.

Tryptophan metabolism – highlights multiple enzymes/cofactors; drug leads could inhibit or enhance any step.
Angiogenesis map – critical in tumour vascularisation; anti-angiogenic drugs (e.g., VEGF inhibitors) exploit nodes here.
Uterine smooth-muscle contraction – important for labour pharmacology; each signalling molecule is a conceivable obstetric drug target.

Pathway analysis reveals the biological meaning behind joint expression changes.
Groups genes/proteins into manageable themes rather than isolated hits.
Pinpoints crucial intervention points where a drug can modify outcome.
Recognises that biology is redundant & robust – often several paralogous genes perform similar roles → multiple therapeutic “bites at the apple”.

Drug discovery is no longer “one gene → one drug” but “network → modulator”.
Requires integration of wet-lab assays, in silico modelling, and statistical genomics.
Ethical dimension: better pathway understanding may reduce late-stage drug failures, saving cost, time, and patient risk.

A pathway captures a functional molecular narrative; a gene set is any purposeful list of genes.
Robust drug discovery pipelines begin with accurate gene lists, expand to enriched pathways, and culminate in validated targets/leads.
Bioinformatics databases (GO, PathwayGuide, KEGG, IPA) are indispensable – know their strengths, weaknesses & licensing terms.
Statistical enrichment testing (e.g., Fisher’s Exact Test with multiple-testing correction) underpins confidence that observed overlaps are not random.
Every interaction arrow on a pathway map can be envisioned as a potential therapeutic lever – the art is picking which lever to pull.