CH8.2 – Transcriptomes and Proteomes

Transcriptome: Definition & Importance

  • A transcriptome = complete set of RNA transcripts present in a cell/organism under specified conditions.
    • Emphasises that only a subset of genomic DNA is ever expressed.
    • Dynamic – varies between cell types, developmental stages, environmental cues.
    • Knowing which RNAs are present gives insight into gene-expression patterns, physiological state, and potential responses to stimuli (e.g., cancer vs. healthy tissue).

Methods to Analyse Transcriptomes

Microarrays (previously introduced in Chapter 7)

  • Workflow recap:
    • Collect total RNA from Sample 1 (labelled green) and Sample 2 (labelled red).
    • Use reverse transcriptase to create complementary DNA (cDNA) from each RNA pool while incorporating fluorescent dye.
    • Hybridise labelled cDNAs to a solid-surface DNA microarray containing immobilised probes for each gene of interest (thousands of “spots”).
  • Interpretation of colour intensities:
    • Red spot = higher transcript abundance in Sample 2.
    • Green spot = higher transcript abundance in Sample 1.
    • Yellow spot (overlap) = comparable expression in both.
    • Gradations in brightness ∝ transcript copy number.
  • Main advantages: simultaneous, semi-quantitative survey of thousands of loci.
  • Caveats: relies on prior knowledge of gene sequences; has limited dynamic range.
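
The colour interpretation above can be sketched as a log2 ratio of the two channel intensities. This is only an illustration: the intensity values, the pseudocount, and the two-fold threshold are assumptions chosen for the example, not values from the lecture.

```python
import math

def classify_spot(red, green, threshold=1.0, pseudocount=1.0):
    """Return (log2 ratio, colour call) for one microarray spot.

    red   = intensity of the red channel (Sample 2)
    green = intensity of the green channel (Sample 1)
    threshold is in log2 units (1.0 = two-fold change); both threshold and
    pseudocount are illustrative assumptions.
    """
    log_ratio = math.log2((red + pseudocount) / (green + pseudocount))
    if log_ratio >= threshold:
        call = "red"      # higher transcript abundance in Sample 2
    elif log_ratio <= -threshold:
        call = "green"    # higher transcript abundance in Sample 1
    else:
        call = "yellow"   # comparable expression in both samples
    return log_ratio, call

# Hypothetical (red, green) intensities for three spots.
spots = {"geneA": (4000, 500), "geneB": (300, 2400), "geneC": (1500, 1400)}
calls = {gene: classify_spot(r, g) for gene, (r, g) in spots.items()}

for gene, (log_ratio, call) in calls.items():
    print(f"{gene}: log2(red/green) = {log_ratio:+.2f} -> {call}")
```

Working on a log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why ratios are usually reported this way.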

RNA-Seq (high-throughput RNA sequencing)

  • Made feasible by modern, faster & cheaper next-gen sequencers.
  • Detailed workflow:
    1. Isolate total RNA from sample(s).
    2. Deplete ribosomal RNA (rRNA) because rRNA is overwhelmingly abundant and rarely of functional interest.
    3. Reverse transcription → cDNA.
    4. Fragmentation: shear cDNA into short pieces.
    5. Adaptor/linker ligation (purple & yellow): provides known sequences for primer binding during sequencing; critical for cluster/amplicon generation.
    6. Sequencing: produce short reads (typically 50–300 bp, depending on platform).
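    7. Counting/quantification: map reads to a reference genome (or assemble them de novo when no reference exists), then count reads per gene; read count per gene ∝ transcript abundance once gene length and sequencing depth are accounted for.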
  • Output: digital counts (# of reads per gene), enabling absolute or relative expression measures, differential-expression analysis, alternative splicing detection, novel transcript discovery.
  • Community resources: expression datasets deposited in public repositories (e.g., NCBI GEO, SRA) → enables meta-analysis & clinical decision support (personalised oncology).
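
The counting step can be sketched with two common normalisations: counts per million (CPM) and transcripts per million (TPM). Neither is named in the lecture, so treat them as one reasonable way to turn raw read counts into comparable expression values. The gene names, counts, and lengths below are made up.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def transcripts_per_million(counts, lengths_bp):
    """Correct for gene length first, then scale to one million."""
    # Reads per kilobase: longer genes yield more fragments per transcript.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values())
    return {gene: v / scale * 1e6 for gene, v in rpk.items()}

# Hypothetical raw counts and gene lengths for one sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 4000}

cpm = counts_per_million(counts)
tpm = transcripts_per_million(counts, lengths)

for gene in counts:
    print(f"{gene}: CPM = {cpm[gene]:.0f}, TPM = {tpm[gene]:.0f}")
```

In this toy data geneB has three times the reads of geneA but is also three times as long, so the two end up with the same TPM. That is the effect length normalisation is meant to capture.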

Proteome: Definition & Importance

  • A proteome = complete set of proteins present in a cell/organism under defined conditions.
  • Rationale for studying proteins in addition to RNA:
    • mRNA abundance ≠ protein abundance due to translational regulation, differential stability, post-translational modification, localisation.
    • Proteins are the direct functional molecules driving phenotype.

Methods to Analyse Proteomes

Two-Dimensional Gel Electrophoresis (2D-GE)

  • First dimension (isoelectric focusing):
    • Load protein mixture on a strip with immobilised pH gradient (IPG).
    • Apply electric field → each protein migrates until it reaches the pH at which its net charge = 0, i.e., its isoelectric point (pI).
    • Results in separation based on pI (charge properties) rather than size.
  • Second dimension (SDS-PAGE):
    • Place IPG strip across the top of an SDS-polyacrylamide slab gel.
    • SDS denatures proteins and coats them with a roughly uniform negative charge per unit mass; separation is now based on molecular weight.
    • Smaller proteins migrate further (bottom of gel = low MW; top = high MW).
  • Output: complex pattern of discrete spots, each ideally representing a single protein species (including isoforms & PTM variants).
  • Spots can be excised for identification.
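
To make the first dimension concrete, here is a rough sketch that estimates a peptide's pI as the pH at which its computed net charge crosses zero. The pKa values are approximate textbook-style numbers (published scales differ), the two peptides are invented, and real proteins deviate from this simple model because of folding and modifications.

```python
# Approximate pKa values; these are assumptions and vary between sources.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch model)."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_term" else seq.count(group)
        charge += n / (1 + 10 ** (ph - pka))   # fraction still protonated (+1)
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_term" else seq.count(group)
        charge -= n / (1 + 10 ** (pka - ph))   # fraction deprotonated (-1)
    return charge

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH where net charge is zero by bisection.

    Net charge falls monotonically as pH rises, so bisection is sufficient.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical peptides: one lysine-rich (basic), one aspartate-rich (acidic).
basic_pi = isoelectric_point("KKKKAG")
acidic_pi = isoelectric_point("DDDDAG")
print(f"basic peptide pI  = {basic_pi:.2f}")
print(f"acidic peptide pI = {acidic_pi:.2f}")
```

On an IPG strip the acidic peptide would focus near the low-pH end and the basic peptide near the high-pH end, regardless of their sizes.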

Mass Spectrometry (MS)

  • Pipeline after 2D-GE spot excision or direct proteolytic digest of complex mixture:
    1. Protease digestion (commonly trypsin) → predictable peptide fragments.
    2. Ionisation (e.g., MALDI, ESI) → peptides carry charge.
    3. Accelerate peptide ions through an electric field into a mass analyser (often magnetic or time-of-flight).
    4. Separation by mass-to-charge ratio (m/z): in a time-of-flight analyser, ions with small m/z reach the detector first and ions with large m/z later, producing a spectrum of intensity vs. m/z.
    5. Match experimental spectrum to theoretical spectra in databases → protein identification, PTM mapping.
  • Significance: highly sensitive, can quantify relative or absolute protein levels, detect modifications, and identify unexpected or novel proteins.
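
Steps 1 and 4 of the pipeline can be sketched as an in-silico tryptic digest followed by a theoretical m/z calculation. The cleavage rule used here (cut after K or R unless the next residue is P) is the commonly quoted simplification, and the input sequence is invented. Real search engines also handle missed cleavages, modifications, and fragment ions, none of which are modelled.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (free N- and C-termini)
PROTON = 1.00728    # mass of each charge-carrying proton

def trypsin_digest(seq):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(peptide, z=1):
    """Theoretical m/z of the peptide carrying z protons."""
    return (peptide_mass(peptide) + z * PROTON) / z

# Hypothetical protein fragment; note that K followed by P is not cleaved.
peptides = trypsin_digest("AGKGASKPLRVEK")
for p in peptides:
    print(f"{p:8s} [M+H]+ = {mz(p, 1):.4f}   [M+2H]2+ = {mz(p, 2):.4f}")
```

Matching a list of such theoretical values against the observed peaks is the basic idea behind step 5.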

Inferring Protein Function

Sequence Motifs

  • Motif: short, conserved amino-acid stretch associated with specific biochemical function (e.g., ATP-binding, metal coordination, catalytic triad).
  • Example from Bailey et al. 2015: alignment of human, mouse, frog proteins possessing an ATP-binding motif; conserved residues visually cluster.
  • If unknown protein contains the motif, infer putative ATP-binding role.
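
A motif scan of this kind can be sketched with a regular expression. The lecture does not say which ATP-binding motif was shown, so the example assumes the Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST]; both protein sequences are invented for illustration.

```python
import re

# Walker A / P-loop consensus [AG]-x(4)-G-K-[ST], written as a regex.
# Using this particular motif is an assumption; the notes do not name one.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

def find_motif(seq, pattern=WALKER_A):
    """Return (0-based start index, matched text) or None if absent."""
    match = pattern.search(seq)
    return (match.start(), match.group()) if match else None

# Hypothetical sequences: one containing the motif, one without it.
unknown_protein = "MSLTGPPGSGKSTLAR"
control_protein = "MKVLAAGIVGLLLAQ"

hit = find_motif(unknown_protein)
miss = find_motif(control_protein)
print("unknown protein:", hit)
print("control protein:", miss)
```

A hit only suggests a putative ATP-binding role. Short patterns match by chance fairly often, so the prediction still needs supporting evidence.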

Structural Prediction Tools

  • Function sometimes hidden in sequence but revealed by 3-D fold similarity.
  • In-silico modelling (e.g., FIREDOCK v2; link provided in lecture) can identify structural homologues with annotated functions.
  • Mentioned personal use case: functional prediction for a Helicobacter pylori protein previously listed as “unknown”.

Phylogenetic Profiling

  • Concept: genes that participate in the same pathway/complex are often co-present or co-absent across organisms.
  • Workflow illustrated with 4 species, 7 proteins:
    • Proteins 3 & 6 always present or absent together → hypothesised functional linkage (e.g., both required for flagellar motility).
    • Species lacking both likely lack the corresponding phenotype.
  • Provides functional clues even without direct biochemical data; scalable to entire genomes using comparative genomics.
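
The profiling idea can be sketched as a comparison of presence/absence vectors. The matrix below is made up to mirror the 4-species, 7-protein illustration, with proteins 3 and 6 sharing a profile; it is not the lecture's actual figure.

```python
from itertools import combinations

# Presence (1) / absence (0) of each protein in species 1-4.
# Hypothetical data arranged so that only P3 and P6 co-occur.
profiles = {
    "P1": (1, 1, 1, 1),
    "P2": (1, 0, 0, 1),
    "P3": (1, 0, 1, 0),
    "P4": (0, 1, 1, 1),
    "P5": (1, 1, 0, 0),
    "P6": (1, 0, 1, 0),
    "P7": (0, 0, 1, 1),
}

def linked_pairs(profiles):
    """Return protein pairs whose presence/absence profiles are identical.

    Proteins present in every species (or in none) are skipped because their
    profiles carry no information about co-occurrence.
    """
    informative = {name: p for name, p in profiles.items() if 0 < sum(p) < len(p)}
    return [(a, b) for a, b in combinations(sorted(informative), 2)
            if informative[a] == informative[b]]

pairs = linked_pairs(profiles)
print("candidate functional links:", pairs)
```

With whole genomes, exact matching is usually relaxed to a similarity score between profiles, since gene loss and annotation gaps make perfect agreement rare.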

Connections, Ethical & Practical Implications

  • Builds on Chapter 7 (microarrays) and earlier discussions on reverse transcriptase, SDS-PAGE, comparative genomics.
  • Clinical relevance: Transcriptomic signatures guide cancer subtype classification & therapy choice; proteomic biomarkers aid diagnostics.
  • Ethical/data-sharing: Public databases accelerate discovery but require careful patient anonymisation and consent.
  • Technology-driven progress: Falling sequencing costs democratise RNA-Seq; high-resolution MS advances enable proteome-wide quantification.

Key Terms & Definitions (Quick Reference)

  • cDNA: DNA copy synthesised from RNA template.
  • Reverse Transcriptase: RNA-dependent DNA polymerase.
  • Microarray: slide with immobilised DNA probes for parallel hybridisation assays.
  • Adaptor/Linker: short, known oligo sequence ligated to DNA fragments to facilitate amplification/sequencing.
  • Isoelectric Point (pI): pH at which a molecule carries no net charge.
  • SDS-PAGE: electrophoretic technique resolving proteins by size using sodium dodecyl sulfate.
  • m/z: mass-to-charge ratio in mass spectrometry.
  • Motif: conserved sequence element linked to specific function.
  • Phylogenetic Profiling: comparative method leveraging gene presence/absence patterns across species to infer functional association.

Summary Bullet Highlights

  • Transcriptome = RNA complement; Proteome = protein complement.
  • RNA levels do not always predict protein levels; hence both layers are essential.
  • Main transcriptome tools: Microarray (hybridisation-based) & RNA-Seq (sequencing-based).
  • Main proteome tools: 2D-GE (charge + size separation) & MS (peptide mass analysis).
  • Bioinformatic approaches (motifs, structural modelling, phylogenetic profiling) extend experimental data to functional predictions.
  • Interdisciplinary integration (genomics, transcriptomics, proteomics, bioinformatics) offers a holistic view of cellular biology, disease mechanisms, and therapeutic opportunities.