CH8.2 – Transcriptomes and Proteomes

Transcriptome: Definition & Importance

A transcriptome = complete set of RNA transcripts present in a cell/organism under specified conditions.
- Emphasises that only a subset of the genome is transcribed in any given cell at any given time.
- Dynamic – varies between cell types, developmental stages, environmental cues.
- Knowing which RNAs are present gives insight into gene-expression patterns, physiological state, and potential responses to stimuli (e.g., cancer vs. healthy tissue).

Methods to Analyse Transcriptomes

Microarrays (previously introduced in Chapter 7)

Workflow recap:
- Collect total RNA from Sample 1 (to be labelled green) and Sample 2 (to be labelled red).
- Use reverse transcriptase to synthesise complementary DNA (cDNA) from each RNA pool, incorporating the corresponding fluorescent dye.
- Hybridise labelled cDNAs to a solid-surface DNA microarray containing immobilised probes for each gene of interest (thousands of “spots”).
Interpretation of colour intensities:
- Red spot = higher transcript abundance in Sample 2.
- Green spot = higher transcript abundance in Sample 1.
- Yellow spot (red and green signals overlap) = comparable expression in both.
- Spot brightness roughly reflects transcript abundance (semi-quantitative, not strictly proportional to copy number).
Main advantages: simultaneous, semi-quantitative survey of thousands of loci.
Caveats: relies on prior knowledge of gene sequences; has limited dynamic range.
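The colour rules above amount to thresholding the red/green intensity ratio of each spot. A minimal sketch, assuming background-corrected intensities and a hypothetical cut-off of a two-fold change (|log2 ratio| ≥ 1); real analyses also normalise between channels, which is omitted here:

```python
import math


def classify_spot(red: float, green: float, threshold: float = 1.0) -> str:
    """Classify a two-colour microarray spot.

    red   = signal from Sample 2, green = signal from Sample 1.
    threshold is a hypothetical cut-off on |log2(red / green)|; 1.0 means two-fold.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive after background correction")
    log_ratio = math.log2(red / green)
    if log_ratio >= threshold:
        return "red"      # higher transcript abundance in Sample 2
    if log_ratio <= -threshold:
        return "green"    # higher transcript abundance in Sample 1
    return "yellow"       # comparable expression in both samples


# Hypothetical spot intensities: (red, green)
spots = {"geneA": (4000.0, 1000.0), "geneB": (500.0, 2000.0), "geneC": (1200.0, 1000.0)}
for gene, (r, g) in spots.items():
    print(gene, round(math.log2(r / g), 2), classify_spot(r, g))
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold increase is +2 and a four-fold decrease is −2.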

RNA-Seq (high-throughput RNA sequencing)

Made feasible by modern, faster & cheaper next-gen sequencers.
Detailed workflow:
1. Isolate total RNA from sample(s).
2. Deplete ribosomal RNA (rRNA), because rRNA makes up the large majority of total RNA and is rarely of functional interest (poly(A) selection of mRNA is a common alternative).
3. Reverse transcription → cDNA.
4. Fragmentation: break the cDNA into short pieces (some protocols fragment the RNA before reverse transcription instead).
5. Adaptor/linker ligation (purple & yellow in the lecture figure): adds known sequences for primer binding during sequencing; critical for cluster/amplicon generation.
6. Sequencing: produce short reads (typically 50–300 bp, depending on platform).
7. Counting/quantification: map reads back to a reference genome (or assemble them de novo); the read count per gene, once normalised for gene length and sequencing depth, reflects transcript abundance.
Output: digital counts (# of reads per gene), enabling absolute or relative expression measures, differential-expression analysis, alternative splicing detection, novel transcript discovery.
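The counting step and the need for normalisation can be sketched as follows. This assumes reads have already been mapped (only their start positions on one reference sequence are used), genes do not overlap, and the gene coordinates and reads are hypothetical. TPM (transcripts per million) is one common normalisation, shown here as an illustration rather than the lecture's specific method:

```python
def count_reads(genes: dict[str, tuple[int, int]], read_starts: list[int], read_len: int) -> dict[str, int]:
    """Count mapped reads overlapping each gene (half-open intervals [start, end))."""
    counts = {name: 0 for name in genes}
    for start in read_starts:
        end = start + read_len
        for name, (g_start, g_end) in genes.items():
            if start < g_end and end > g_start:  # read overlaps the gene
                counts[name] += 1
    return counts


def tpm(counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: correct for gene length first, then for library size."""
    rpk = {g: counts[g] / (lengths[g] / 1_000) for g in counts}  # reads per kilobase
    total = sum(rpk.values())
    return {g: (v / total * 1e6 if total else 0.0) for g, v in rpk.items()}


# Hypothetical genes on one reference sequence
genes = {"geneA": (0, 1_000), "geneB": (2_000, 4_000)}
lengths = {g: end - start for g, (start, end) in genes.items()}
counts = count_reads(genes, read_starts=[10, 200, 500, 850, 2_100, 3_000], read_len=100)
print(counts)                # raw counts: geneA has twice as many reads as geneB
print(tpm(counts, lengths))  # geneB is twice as long, so the TPM ratio becomes 4:1
```

The example shows why raw read depth alone is misleading: a longer transcript yields more fragments, and hence more reads, at the same molar abundance.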
Community resources: expression datasets deposited in public repositories (e.g., NCBI GEO, SRA) → enables meta-analysis & clinical decision support (personalised oncology).

Proteome: Definition & Importance

A proteome = complete set of proteins present in a cell/organism under defined conditions.
Rationale for studying proteins in addition to RNA:
- mRNA abundance ≠ protein abundance due to translational regulation, differential stability, post-translational modification, localisation.
- Proteins are the direct functional molecules driving phenotype.

Methods to Analyse Proteomes

Two-Dimensional Gel Electrophoresis (2D-GE)

First dimension (isoelectric focusing):
- Load protein mixture on a strip with immobilised pH gradient (IPG).
- Apply electric field → each protein migrates until it reaches the pH at which its net charge is 0, i.e. its isoelectric point (pI).
- Results in separation based on charge (pI).
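Where a protein stops during isoelectric focusing can be estimated from its sequence: sum the Henderson–Hasselbalch charge of each ionisable group and search for the pH where the total is zero. A minimal sketch; the pKa values below are an assumed approximate set (published sets differ, so predicted pI values vary by a few tenths), and modifications are ignored:

```python
# Approximate pKa values (assumption; different references give different sets)
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    pos_counts = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    neg_counts = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    positive = sum(n / (1 + 10 ** (ph - PKA_POSITIVE[g])) for g, n in pos_counts.items())
    negative = sum(n / (1 + 10 ** (PKA_NEGATIVE[g] - ph)) for g, n in neg_counts.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH where net charge = 0 (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


for peptide in ["GG", "DDDEEE", "KKKRRR"]:  # hypothetical test peptides
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic (D/E-rich) sequences come out with a low pI and basic (K/R-rich) sequences with a high pI, which is where they would focus on the IPG strip.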
Second dimension (SDS-PAGE):
- Place IPG strip across the top of an SDS-polyacrylamide slab gel.
- SDS coats proteins with uniform negative charge; separation now based on molecular weight.
- Smaller proteins migrate further (bottom of gel = low MW; top = high MW).
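Because migration in SDS-PAGE depends on size, the molecular weight of an unknown spot is commonly estimated from a standard curve: log10(MW) is approximately linear in relative mobility Rf (distance migrated divided by distance of the dye front). A minimal least-squares sketch; the ladder values are hypothetical:

```python
import math


def fit_standard_curve(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept from (Rf, MW) markers."""
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def estimate_mw(markers: list[tuple[float, float]], rf: float) -> float:
    """Interpolate the molecular weight of a band from its Rf."""
    slope, intercept = fit_standard_curve(markers)
    return 10 ** (slope * rf + intercept)


# Hypothetical marker ladder: (Rf, MW in kDa)
ladder = [(0.10, 250), (0.30, 100), (0.50, 50), (0.70, 25), (0.90, 10)]
print(round(estimate_mw(ladder, 0.5), 1))  # band halfway down the gel
```

The fitted slope is negative, which is the quantitative form of "smaller proteins migrate further".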
Output: complex pattern of discrete spots, each ideally representing a single protein species (including isoforms & PTM variants).
Spots can be excised for identification.

Mass Spectrometry (MS)

Pipeline after 2D-GE spot excision or direct proteolytic digest of complex mixture:
1. Protease digestion (commonly trypsin) → predictable peptide fragments.
2. Ionisation (e.g., MALDI, ESI) → peptides carry charge.
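3. Accelerate the charged peptides through an electric field into a mass analyser (often time-of-flight, TOF; magnetic-sector analysers are an older design).
4. Separation by mass-to-charge ratio (m/z): in a TOF analyser, ions with small m/z reach the detector first and those with large m/z later, producing a spectrum.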
5. Match experimental spectrum to theoretical spectra in databases → protein identification, PTM mapping.
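Steps 1, 4 and 5 can be sketched as a toy peptide mass fingerprint: digest each database protein in silico, compute theoretical m/z values, and count how many observed peaks each protein explains. Assumptions: a simplified trypsin rule (cleave after K/R unless followed by P, no missed cleavages), singly charged ions as typically seen with MALDI, a hypothetical two-protein database, and simulated peaks. Real search engines score matches statistically:

```python
# Monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide for the free termini
PROTON = 1.007276


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass: float, charge: int = 1) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def pmf_score(observed_mz: list[float], sequence: str, tol: float = 0.02) -> int:
    """Number of observed peaks matching a theoretical tryptic peptide within tol."""
    theoretical = [mz(peptide_mass(p)) for p in tryptic_digest(sequence)]
    return sum(any(abs(obs - t) <= tol for t in theoretical) for obs in observed_mz)


# Hypothetical database and simulated spectrum from protein_X
database = {"protein_X": "AGKLLERPGVKDEFR", "protein_Y": "MSTRYPGAKWHLEK"}
observed = [mz(peptide_mass(p)) for p in tryptic_digest(database["protein_X"])]
scores = {name: pmf_score(observed, seq) for name, seq in database.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Trypsin is favoured because its cleavage rule is predictable, so each protein yields a characteristic set of peptide masses.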
Significance: highly sensitive, can quantify relative or absolute protein levels, detect modifications, and identify unexpected or novel proteins.

Inferring Protein Function

Sequence Motifs

Motif: short, conserved amino-acid stretch associated with specific biochemical function (e.g., ATP-binding, metal coordination, catalytic triad).
Example from Bailey et al. 2015: alignment of human, mouse and frog proteins possessing an ATP-binding motif; the conserved residues line up in the same alignment columns.
If an uncharacterised protein contains the motif, a putative ATP-binding role can be inferred.
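Scanning a sequence for a motif can be sketched with regular expressions. As an assumed illustration (not necessarily the motif in the Bailey et al. figure), this uses the Walker A / P-loop ATP/GTP-binding consensus written in PROSITE-style syntax as [AG]-x(4)-G-K-[ST]; the converter handles only a subset of that syntax (no "<" / ">" anchors), and the query sequence is hypothetical:

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (x, x(n), x(n,m), [..], {..}) to a regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except those listed
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(seq: str, regex: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for non-overlapping hits."""
    return [(m.start() + 1, m.group()) for m in re.finditer(regex, seq)]


walker_a = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
query = "MSKIVLLGASGVGKSTLAKQL"  # hypothetical uncharacterised protein fragment
print(walker_a, find_motif(query, walker_a))
```

A hit only suggests a putative function; short patterns can also match by chance, which is why conservation across species (as in the alignment above) strengthens the inference.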

Structural Prediction Tools

Function is sometimes not apparent from sequence alone but is revealed by similarity of the 3-D fold.
In-silico modelling (e.g., FIREDOCK v2; link provided in lecture) can identify structural homologues with annotated functions.
Lecturer's personal use case: functional prediction for a Helicobacter pylori protein previously annotated as “unknown”.
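Fold similarity is typically quantified after superimposing two structures. A minimal sketch of one standard measure, the RMSD after optimal rigid-body superposition (Kabsch algorithm), assuming the two coordinate sets (e.g. Cα atoms) are already paired residue-for-residue; dedicated structure-comparison tools also find the residue alignment and use other scores, which is not shown here. Coordinates are randomly generated for illustration:

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired N x 3 coordinate sets after optimal rotation and translation."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)  # centre both structures
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    H = P.T @ Q                                           # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T               # optimal rotation
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))                 # hypothetical 10-residue backbone
theta = np.radians(40)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])     # same "fold", rotated and shifted
print(kabsch_rmsd(P, Q))                     # essentially zero: identical fold
```

A low RMSD between an unknown model and a characterised structure is the kind of evidence that supports transferring an annotation.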

Phylogenetic Profiling

Concept: genes that participate in the same pathway/complex are often co-present or co-absent across organisms.
Workflow illustrated with 4 species, 7 proteins:
- Proteins 3 & 6 always present or absent together → hypothesised functional linkage (e.g., both required for flagellar motility).
- Species lacking both likely lack the corresponding phenotype.
Provides functional clues even without direct biochemical data; scalable to entire genomes using comparative genomics.
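The presence/absence logic can be sketched by encoding each protein as a binary profile across species and grouping identical profiles. The matrix below is hypothetical (chosen so that proteins 3 and 6 co-occur, as in the lecture example); profiles present in all or no species are skipped because they carry no linkage signal, and a Hamming-distance threshold allows near matches:

```python
from collections import defaultdict

# Hypothetical presence (1) / absence (0) of proteins P1-P7 in species A-D
profiles = {
    "P1": (1, 1, 1, 1),
    "P2": (1, 0, 1, 1),
    "P3": (1, 0, 1, 0),
    "P4": (0, 1, 1, 1),
    "P5": (1, 1, 0, 1),
    "P6": (1, 0, 1, 0),
    "P7": (0, 1, 0, 0),
}


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of species in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_groups(profiles: dict[str, tuple[int, ...]]) -> list[list[str]]:
    """Groups of proteins sharing an identical, informative profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for profile, members in groups.items()
            if len(members) > 1 and 0 < sum(profile) < len(profile)]


def partners(profiles: dict[str, tuple[int, ...]], query: str, max_distance: int = 0) -> list[str]:
    """Proteins whose profile differs from the query's in at most max_distance species."""
    q = profiles[query]
    return sorted(p for p, prof in profiles.items() if p != query and hamming(q, prof) <= max_distance)


print(linked_groups(profiles))
print(partners(profiles, "P3", max_distance=1))
```

With only four species many unrelated proteins will match by chance, which is why the method becomes informative when applied across many genomes.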

Connections, Ethical & Practical Implications

Builds on Chapter 7 (microarrays) and earlier discussions on reverse transcriptase, SDS-PAGE, comparative genomics.
Clinical relevance: Transcriptomic signatures guide cancer subtype classification & therapy choice; proteomic biomarkers aid diagnostics.
Ethical/data-sharing: Public databases accelerate discovery but require careful patient anonymisation and consent.
Technology-driven progress: Falling sequencing costs democratise RNA-Seq; high-resolution MS advances enable proteome-wide quantification.

Key Terms & Definitions (Quick Reference)

cDNA: DNA copy synthesised from RNA template.
Reverse Transcriptase: RNA-dependent DNA polymerase.
Microarray: slide with immobilised DNA probes for parallel hybridisation assays.
Adaptor/Linker: short, known oligo sequence ligated to DNA fragments to facilitate amplification/sequencing.
Isoelectric Point (pI): pH at which a molecule carries no net charge.
SDS-PAGE: electrophoretic technique resolving proteins by size using sodium dodecyl sulfate.
m/z: mass-to-charge ratio in mass spectrometry.
Motif: conserved sequence element linked to specific function.
Phylogenetic Profiling: comparative method leveraging gene presence/absence patterns across species to infer functional association.

Summary Bullet Highlights

Transcriptome = RNA complement; Proteome = protein complement.
RNA levels do not always predict protein levels; hence both layers are essential.
Main transcriptome tools: Microarray (hybridisation-based) & RNA-Seq (sequencing-based).
Main proteome tools: 2D-GE (charge + size separation) & MS (peptide mass analysis).
Bioinformatic approaches (motifs, structural modelling, phylogenetic profiling) extend experimental data to functional predictions.
Interdisciplinary integration (genomics, transcriptomics, proteomics, bioinformatics) offers a holistic view of cellular biology, disease mechanisms, and therapeutic opportunities.
