CH8.2 – Transcriptomes and Proteomes
Transcriptome: Definition & Importance
- A transcriptome = complete set of RNA transcripts present in a cell/organism under specified conditions.
- Emphasises that only a subset of the genome is transcribed in any given cell at any given time.
- Dynamic – varies between cell types, developmental stages, environmental cues.
- Knowing which RNAs are present gives insight into gene-expression patterns, physiological state, and potential responses to stimuli (e.g., cancer vs. healthy tissue).
Methods to Analyse Transcriptomes
Microarrays (previously introduced in Chapter 7)
- Workflow recap:
- Collect total RNA from Sample 1 (to be labelled green) and Sample 2 (to be labelled red).
- Use reverse transcriptase to create complementary DNA (cDNA) from each RNA pool while incorporating fluorescent dye.
- Hybridise labelled cDNAs to a solid-surface DNA microarray containing immobilised probes for each gene of interest (thousands of “spots”).
- Interpretation of colour intensities:
- Red spot = higher transcript abundance in Sample 2.
- Green spot = higher transcript abundance in Sample 1.
- Yellow spot (overlap) = comparable expression in both.
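- Intermediate hues (e.g., orange, yellow-green) reflect intermediate red:green ratios. Overall spot brightness only roughly tracks transcript abundance, because signal saturates at high abundance.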
- Main advantages: simultaneous, semi-quantitative survey of thousands of loci.
- Caveats: relies on prior knowledge of gene sequences; has limited dynamic range.
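Reading a two-colour array comes down to the red:green intensity ratio per spot, usually expressed as log2(red/green). The sketch below uses hypothetical background-subtracted intensities. The two-fold cut-off for calling a spot "red" or "green" is an illustrative assumption, not a fixed standard.

```python
import math

# Hypothetical background-subtracted spot intensities (arbitrary units).
# Sample 1 = green channel, Sample 2 = red channel, as in the workflow above.
spots = {
    "geneA": {"green": 1200.0, "red": 300.0},
    "geneB": {"green": 400.0, "red": 1600.0},
    "geneC": {"green": 800.0, "red": 750.0},
}

def log2_ratio(red: float, green: float) -> float:
    """log2(red/green): > 0 means higher in Sample 2, < 0 means higher in Sample 1."""
    return math.log2(red / green)

def call_colour(ratio: float, threshold: float = 1.0) -> str:
    """Map a log2 ratio to the dominant spot colour.
    threshold=1.0 (a two-fold change) is an assumed, illustrative cut-off."""
    if ratio >= threshold:
        return "red"
    if ratio <= -threshold:
        return "green"
    return "yellow"

for gene, s in spots.items():
    r = log2_ratio(s["red"], s["green"])
    print(f"{gene}: log2(R/G) = {r:+.2f} -> {call_colour(r)}")
```

Working on the log scale makes a four-fold increase and a four-fold decrease symmetric (+2 and -2). On the raw ratio scale they would be 4 and 0.25.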
RNA-Seq (high-throughput RNA sequencing)
- Made feasible by modern, faster & cheaper next-gen sequencers.
- Detailed workflow:
- Isolate total RNA from sample(s).
- Deplete ribosomal RNA (rRNA), which makes up the bulk of total RNA and would otherwise consume most sequencing reads (an alternative is to enrich mRNA by poly(A) selection).
- Reverse transcription → cDNA.
- Fragmentation: shear cDNA into short pieces.
- Adaptor/linker ligation (shown in purple & yellow in the lecture figure): provides known sequences for primer binding during sequencing; critical for cluster/amplicon generation.
- Sequencing: produce short reads (typically 50–300 bp, depending on platform).
- Counting/quantification: map reads back to a reference genome (or assemble transcripts de novo). The number of reads per gene, once normalised for gene length and sequencing depth, is proportional to transcript abundance.
- Output: digital counts (# of reads per gene), enabling relative expression measures (absolute quantification requires added spike-in standards), differential-expression analysis, alternative-splicing detection, and novel-transcript discovery.
- Community resources: expression datasets deposited in public repositories (e.g., NCBI GEO, SRA) → enables meta-analysis & clinical decision support (personalised oncology).
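The normalisation step in the counting stage can be made concrete with transcripts per million (TPM). TPM is one common choice; RPKM/FPKM and the size-factor methods used by differential-expression tools are others. The minimal sketch below works on hypothetical counts and lengths. It first divides by transcript length and then rescales so each sample sums to one million.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: correct for transcript length, then for sequencing depth."""
    # Reads per kilobase: longer transcripts yield more fragments per copy,
    # so dividing by length makes genes comparable with each other.
    rpk = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    # Rescale so the sample sums to one million, making samples with
    # different total read depth comparable.
    per_million = sum(rpk.values()) / 1_000_000
    return {g: v / per_million for g, v in rpk.items()}

# Hypothetical read counts and transcript lengths for a three-gene toy sample.
counts = {"geneA": 500, "geneB": 500, "geneC": 100}
lengths_bp = {"geneA": 1_000, "geneB": 5_000, "geneC": 500}
print(tpm(counts, lengths_bp))
```

geneA and geneB receive identical raw counts. Because geneB is five times longer, its TPM is five times lower, which is why raw read depth alone can mislead.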
Proteome: Definition & Importance
- A proteome = complete set of proteins present in a cell/organism under defined conditions.
- Rationale for studying proteins in addition to RNA:
- mRNA abundance does not reliably predict protein abundance, because of translational regulation, differing mRNA and protein half-lives, post-translational modification, and localisation.
- Proteins are the direct functional molecules driving phenotype.
Methods to Analyse Proteomes
Two-Dimensional Gel Electrophoresis (2D-GE)
- First dimension (isoelectric focusing):
- Load protein mixture on a strip with immobilised pH gradient (IPG).
- Apply an electric field → each protein migrates until it reaches the pH equal to its isoelectric point (pI), where its net charge is zero.
- Result: proteins are separated by pI (i.e., by charge).
- Second dimension (SDS-PAGE):
- Place IPG strip across the top of an SDS-polyacrylamide slab gel.
- SDS denatures proteins and coats them with negative charge roughly in proportion to their length, giving a near-uniform charge-to-mass ratio; separation now depends on molecular weight.
- Smaller proteins migrate further (bottom of gel = low MW; top = high MW).
- Output: complex pattern of discrete spots, each ideally representing a single protein species (including isoforms & PTM variants).
- Spots can be excised for identification.
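Each spot's position on a 2D gel corresponds to two numbers that can be predicted from sequence: its pI (first dimension) and its molecular weight (second dimension). This sketch computes both. It estimates net charge with the Henderson-Hasselbalch relation and finds the pI by bisection. The pKa values and average residue masses are approximate assumptions, since published sets differ slightly. The sketch also ignores disulfides and PTMs, which shift real spots.

```python
# Illustrative pKa values (assumed; tools differ by a few tenths of a pH unit).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

# Approximate average residue masses in daltons (amino acid minus water).
AVG_RESIDUE = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02

def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg

def isoelectric_point(seq: str, tol: float = 1e-3) -> float:
    """First dimension: pH at which net charge = 0, found by bisection.
    Net charge falls monotonically as pH rises, so bisection converges."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2

def molecular_weight(seq: str) -> float:
    """Second dimension: approximate average mass (sum of residues + one water)."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER

seq = "MADEKHLRSGYPCDE"  # hypothetical sequence
print(f"pI ~ {isoelectric_point(seq):.2f}, MW ~ {molecular_weight(seq):.0f} Da")
```

A lysine-rich protein would focus near the basic end of the IPG strip, and an aspartate/glutamate-rich one near the acidic end. Its mass then sets how far down the SDS gel it runs.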
Mass Spectrometry (MS)
- Pipeline after 2D-GE spot excision or direct proteolytic digest of complex mixture:
- Protease digestion (commonly trypsin) → predictable peptide fragments.
- Ionisation (e.g., MALDI, ESI) → peptides carry charge.
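- Accelerate peptide ions through an electric field into a mass analyser (e.g., time-of-flight, quadrupole, Orbitrap; older instruments used magnetic sectors).
- Separation by mass-to-charge ratio (m/z): in a time-of-flight analyser, ions with small m/z reach the detector first and those with large m/z later, producing a spectrum.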
- Match experimental spectrum to theoretical spectra in databases → protein identification, PTM mapping.
- Significance: highly sensitive, can quantify relative or absolute protein levels, detect modifications, and identify unexpected or novel proteins.
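The digest-then-match idea (peptide mass fingerprinting) can be sketched in a few lines. The code applies the usual trypsin rule (cut after K or R, but not before P) and computes singly protonated [M+H]+ m/z values from monoisotopic residue masses. It then scores each database entry by how many observed peaks it explains. The two-protein "database" and the spectrum are made up for illustration. Real search engines also handle missed cleavages, modifications, and statistical scoring.

```python
import re

# Monoisotopic residue masses in daltons (amino acid minus water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # charge carrier for a singly protonated ion

def tryptic_digest(seq: str) -> list[str]:
    """Cut after K or R unless the next residue is P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def mh_plus(peptide: str) -> float:
    """m/z of the singly protonated peptide ion [M+H]+ (z = 1)."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def pmf_score(observed_mz: list[float], protein: str, tol: float = 0.02) -> int:
    """Count observed peaks matching any theoretical tryptic peptide within +/- tol."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed_mz)

# Hypothetical two-entry database of made-up sequences.
database = {
    "protX": "MAGKLLEPRWDSKAPRYTHK",
    "protY": "GGSDEFLKMNPQRSTVWYAK",
}
# Simulated spectrum: protX's peptide peaks, sorted by ascending m/z
# (the order in which a time-of-flight detector would record them).
observed = sorted(mh_plus(p) for p in tryptic_digest(database["protX"]))
scores = {name: pmf_score(observed, seq) for name, seq in database.items()}
print(scores, "best match:", max(scores, key=scores.get))
```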
Inferring Protein Function
Sequence Motifs
- Motif: short, conserved amino-acid stretch associated with specific biochemical function (e.g., ATP-binding, metal coordination, catalytic triad).
- Example from Bailey et al. 2015: alignment of human, mouse, and frog proteins possessing an ATP-binding motif; the conserved residues line up in the same alignment columns.
- If a protein of unknown function contains the motif, a putative ATP-binding role can be inferred.
- When sequence similarity is too weak to suggest a function, similarity in 3-D fold can still reveal it.
- In-silico modelling (e.g., Phyre2; link provided in lecture) can identify structural homologues with annotated functions.
- Lecturer's own use case: predicting the function of a Helicobacter pylori protein previously annotated as “unknown”.
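A motif search is essentially pattern matching over a protein sequence. As one well-known ATP/GTP-binding example, the P-loop (Walker A) motif is written in PROSITE notation as [AG]-x(4)-G-K-[ST]. The lecture's Bailey et al. figure may show a different motif. The sketch converts that pattern to a regular expression and scans two made-up sequences. A hit only suggests a function: short patterns also match by chance, so it is a starting hypothesis rather than proof.

```python
import re

# P-loop pattern [AG]-x(4)-G-K-[ST] as a regex; the lookahead lets
# overlapping hits be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

def find_motif(seq: str, pattern: re.Pattern = P_LOOP) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]

# Hypothetical query sequences (made up for illustration).
query = "MSKIAVLGPSGSGKSTLL"
unrelated = "MSTLHEDWQRNPFVCE"
for name, s in [("query", query), ("unrelated", unrelated)]:
    hits = find_motif(s)
    print(name, hits or "no P-loop motif found")
```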
Phylogenetic Profiling
- Concept: genes that participate in the same pathway/complex are often co-present or co-absent across organisms.
- Workflow illustrated with 4 species, 7 proteins:
- Proteins 3 & 6 always present or absent together → hypothesised functional linkage (e.g., both required for flagellar motility).
- Species lacking both likely lack the corresponding phenotype.
- Provides functional clues even without direct biochemical data; scalable to entire genomes using comparative genomics.
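The profiling logic can be sketched as comparing binary presence/absence vectors. The 7 × 4 matrix below is hypothetical, not the lecture's actual figure. It was chosen so that proteins 3 and 6 share a profile, mirroring the example. Identical profiles (Hamming distance 0) are flagged as candidate functional partners, and the species lacking both are listed. Real analyses use many more genomes and usually tolerate small mismatches. They also discard uninformative profiles, such as a protein present everywhere.

```python
# Hypothetical presence (1) / absence (0) profiles:
# keys = proteins 1-7, tuple positions = species 1-4.
profiles = {
    1: (1, 1, 1, 1),
    2: (1, 0, 1, 1),
    3: (1, 0, 1, 0),
    4: (0, 1, 1, 1),
    5: (1, 1, 0, 1),
    6: (1, 0, 1, 0),
    7: (0, 1, 0, 1),
}

def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of species in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def linked_pairs(profiles: dict[int, tuple[int, ...]], max_distance: int = 0) -> list[tuple[int, int]]:
    """Protein pairs whose profiles differ in at most max_distance species."""
    ids = sorted(profiles)
    return [(i, j) for k, i in enumerate(ids) for j in ids[k + 1:]
            if hamming(profiles[i], profiles[j]) <= max_distance]

def species_lacking_all(profiles: dict[int, tuple[int, ...]], proteins: tuple[int, ...]) -> list[int]:
    """1-based species numbers in which every listed protein is absent."""
    n_species = len(next(iter(profiles.values())))
    return [s + 1 for s in range(n_species)
            if all(profiles[p][s] == 0 for p in proteins)]

pairs = linked_pairs(profiles)
print("co-occurring pairs:", pairs)
for pair in pairs:
    print(pair, "both absent in species", species_lacking_all(profiles, pair))
```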
Connections, Ethical & Practical Implications
- Builds on Chapter 7 (microarrays) and earlier discussions on reverse transcriptase, SDS-PAGE, comparative genomics.
- Clinical relevance: Transcriptomic signatures guide cancer subtype classification & therapy choice; proteomic biomarkers aid diagnostics.
- Ethical/data-sharing: Public databases accelerate discovery but require careful patient anonymisation and consent.
- Technology-driven progress: Falling sequencing costs democratise RNA-Seq; high-resolution MS advances enable proteome-wide quantification.
Key Terms & Definitions (Quick Reference)
- cDNA: DNA copy synthesised from RNA template.
- Reverse Transcriptase: RNA-dependent DNA polymerase.
- Microarray: slide with immobilised DNA probes for parallel hybridisation assays.
- Adaptor/Linker: short oligonucleotide of known sequence ligated to DNA fragments to facilitate amplification/sequencing.
- Isoelectric Point (pI): pH at which a molecule carries no net charge.
- SDS-PAGE: electrophoretic technique resolving proteins by size using sodium dodecyl sulfate.
- m/z: mass-to-charge ratio in mass spectrometry.
- Motif: conserved sequence element linked to specific function.
- Phylogenetic Profiling: comparative method leveraging gene presence/absence patterns across species to infer functional association.
Summary Bullet Highlights
- Transcriptome = RNA complement; Proteome = protein complement.
- RNA levels do not always predict protein levels; hence both layers are essential.
- Main transcriptome tools: Microarray (hybridisation-based) & RNA-Seq (sequencing-based).
- Main proteome tools: 2D-GE (charge + size separation) & MS (peptide mass analysis).
- Bioinformatic approaches (motifs, structural modelling, phylogenetic profiling) extend experimental data to functional predictions.
- Interdisciplinary integration (genomics, transcriptomics, proteomics, bioinformatics) offers a holistic view of cellular biology, disease mechanisms, and therapeutic opportunities.