In the nucleus, a gene is first transcribed as pre-mRNA, which still contains introns (non-coding sequences).
Before the mRNA can be exported and translated, several processing steps must occur:
5' capping
3' poly-A tail addition
RNA splicing — removal of introns to yield the mature, spliced mRNA
Critically, there is a time lag between transcription and splicing.
The unspliced (pre-mRNA) transcript exists in the cell for a measurable window before becoming the mature spliced form.
Simple model (spliced mRNA only):
dS/dt = α−γS
Extended model (capturing unspliced → spliced dynamics):
dU/dt = α − βU
dS/dt = βU − γS
U = unspliced transcript concentration
S = spliced transcript concentration
α = transcription rate
β = RNA processing (splicing) rate
γ = degradation rate (gene-specific)
Key simplifying assumptions:
β (processing rate) is constant across all genes
γ (degradation rate) is gene-specific
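To see the time lag between unspliced and spliced transcripts, the extended model can be integrated numerically. The sketch below is only an illustration: the function name simulate, the forward-Euler scheme, and the parameter values are assumptions chosen for the example, not values from the lecture.

```python
import numpy as np

def simulate(alpha, beta, gamma, t_end=20.0, dt=0.01, u0=0.0, s0=0.0):
    """Forward-Euler integration of
         dU/dt = alpha - beta * U
         dS/dt = beta * U - gamma * S
    Returns arrays of U and S at every time step."""
    n = int(round(t_end / dt))
    u = np.empty(n + 1)
    s = np.empty(n + 1)
    u[0], s[0] = u0, s0
    for i in range(n):
        du = alpha - beta * u[i]
        ds = beta * u[i] - gamma * s[i]
        u[i + 1] = u[i] + dt * du
        s[i + 1] = s[i] + dt * ds
    return u, s

# Illustrative (made-up) rates: a gene switched on at t = 0.
u, s = simulate(alpha=2.0, beta=1.0, gamma=0.5)

# At steady state both derivatives vanish, so U tends to alpha/beta
# and S tends to alpha/gamma.
print(u[-1], s[-1])
```

While the gene is switching on, U rises first and S follows, so U sits above its steady-state proportion of S until the system settles.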
S > U (more spliced than unspliced): gene is being downregulated (ΔS < 0)
U > S (more unspliced than spliced): gene is being upregulated (ΔS > 0)
U ≈ S: gene is at steady state
These plots of U vs S are called phase portraits.
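With U on the vertical axis and S on the horizontal axis, the line y = x serves as a boundary: cells above it are upregulating that gene; cells below it are downregulating.
Strictly, setting dS/dt = 0 in the extended model gives the steady-state line U = (γ/β)S, which coincides with y = x only when γ = β.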
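The phase-portrait rule can be written as a sign test on the distance from the boundary line. This is a minimal sketch: the function name classify, the tolerance tol, and the default slope of 1.0 (the y = x boundary from these notes) are assumptions for illustration.

```python
import numpy as np

def classify(u, s, gamma_over_beta=1.0, tol=0.05):
    """Label each cell for one gene by its position relative to the
    line U = gamma_over_beta * S in the phase portrait."""
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    resid = u - gamma_over_beta * s          # > 0: above the line
    labels = np.full(u.shape, "steady", dtype=object)
    labels[resid > tol] = "up"               # more unspliced than expected
    labels[resid < -tol] = "down"            # less unspliced than expected
    return labels

# Three hypothetical cells: above, on, and below the line.
print(classify([2.0, 1.0, 0.5], [1.0, 1.0, 2.0]))
```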
A key practical question: standard scRNA-seq uses poly-dT capture beads (targeting the poly-A tail of processed mRNA), so why are unspliced transcripts captured at all?
Internal poly-A stretches in introns: introns may contain short poly-A sequences that are captured by the beads.
Lysis captures nuclear contents: when cells are lysed, nuclear pre-mRNA is released along with cytoplasmic mRNA.
Empirically, 17–23% of reads in common platforms (inDrop, 10x Chromium, Smart-seq) are intronic, which makes estimating unspliced abundance feasible.
RNA velocity uses the U/S ratio to predict the future expression state of each cell.
For multiple genes simultaneously:
Compute the residual (deviation from steady-state ratio) for each gene
From those residuals, compute ΔS, the expected change in spliced abundance
This gives a predicted future expression profile for the cell
Project the current and predicted states onto a low-dimensional embedding (e.g., PCA, UMAP/t-SNE)
Draw an arrow from the current position to the predicted future position
Do this for every cell to get a velocity field showing the flow of cell states.
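The steps above can be sketched end to end with NumPy. This is a simplified illustration, not the published method: it assumes β = 1, fits the steady-state ratio per gene by least squares through the origin over all cells, uses an arbitrary time step dt, and projects with a plain PCA. The names fit_gamma and velocity_arrows are hypothetical.

```python
import numpy as np

def fit_gamma(U, S):
    """Per-gene steady-state ratio: slope of U on S through the origin.
    U and S are (cells x genes) arrays."""
    return (U * S).sum(axis=0) / np.maximum((S * S).sum(axis=0), 1e-12)

def velocity_arrows(U, S, dt=1.0, n_components=2):
    """Return current PCA coordinates and an arrow per cell pointing
    toward its predicted future state."""
    gamma = fit_gamma(U, S)
    resid = U - gamma * S                        # deviation from steady state
    S_future = np.maximum(S + dt * resid, 0.0)   # predicted profile (delta S = dt * resid)
    mean = S.mean(axis=0)
    # PCA via SVD of the centred spliced matrix
    _, _, Vt = np.linalg.svd(S - mean, full_matrices=False)
    W = Vt[:n_components].T
    now = (S - mean) @ W                         # current position
    future = (S_future - mean) @ W               # predicted position
    return now, future - now

# Synthetic data only, to show the call shape.
rng = np.random.default_rng(0)
S = rng.random((50, 5)) + 0.1
U = S * np.array([0.2, 0.5, 1.0, 1.5, 2.0])      # every cell exactly at steady state
now, arrows = velocity_arrows(U, S)
```

Because every synthetic cell here sits exactly on its steady-state line, the residuals vanish and all arrows have zero length; cells off the line would get non-zero arrows.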
Validation: Circadian Rhythm Data
The approach was validated on bulk RNA-seq data of circadian rhythm genes sampled at 3-hour intervals.
At time t, the unspliced abundance of a gene predicts the spliced abundance at time t+1.
This held consistently across multiple timepoints.
Phase portraits of circadian genes showed the expected cyclical dynamics.
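The lead of unspliced over spliced abundance can be reproduced in a toy simulation. Everything below is synthetic and assumed for illustration: a sinusoidal transcription rate with a 24-hour period, made-up rates β and γ, and sampling every 3 hours. It does not use or reproduce the actual circadian dataset.

```python
import numpy as np

def simulate_circadian(beta=1.0, gamma=0.26, period=24.0, days=10, dt=0.01):
    """Euler integration of the extended model with an oscillating
    transcription rate alpha(t). Time is in hours."""
    n = int(round(days * 24 / dt))
    t = np.arange(n + 1) * dt
    alpha = 1.0 + 0.8 * np.sin(2 * np.pi * t / period)
    u = np.zeros(n + 1)
    s = np.zeros(n + 1)
    for i in range(n):
        u[i + 1] = u[i] + dt * (alpha[i] - beta * u[i])
        s[i + 1] = s[i] + dt * (beta * u[i] - gamma * s[i])
    return t, u, s

t, u, s = simulate_circadian()

step = int(round(3.0 / 0.01))        # one sample every 3 hours
start = int(round(5 * 24 / 0.01))    # discard the first 5 days (transient)
u3, s3 = u[start::step], s[start::step]

# Correlation of U with S at the same timepoint vs. the next timepoint.
same_time = np.corrcoef(u3, s3)[0, 1]
next_time = np.corrcoef(u3[:-1], s3[1:])[0, 1]
print(same_time, next_time)
```

With these assumed rates, S lags U by roughly one sampling interval, so U at time t tracks S at t+1 more closely than S at the same timepoint.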
Applied to myoblast differentiation (known differentiation trajectory: progenitor → intermediate → myocyte):
Marker genes for progenitors showed negative unspliced residuals in later cell states → expression is declining
Marker genes for mature cells showed positive unspliced residuals in earlier cell states → expression is about to rise
Velocity arrows were computed and overlaid on PCA (PC1 vs PC2) and then on t-SNE embeddings, and generally pointed in the expected direction of differentiation.
Velocities computed in PCA space vs. t-SNE/UMAP space can give discordant results.
Apparent bifurcations in t-SNE may not be visible in PCA.
It can be hard to interpret arrows in non-linear embeddings confidently.
To reduce this ambiguity:
Compute velocity using the genes that contribute most to PC1 and PC2.
Project in PCA space.
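One way to sketch the gene selection and projection is shown below. The names top_loading_genes and pca_velocity, the ranking by absolute loading, and the cutoff n_genes are assumptions for illustration; V is taken to be a (cells x genes) matrix of per-gene velocities computed beforehand.

```python
import numpy as np

def top_loading_genes(S, n_genes=50, n_components=2):
    """Indices of the genes with the largest absolute loading on any
    of the first n_components principal components."""
    Sc = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(Sc, full_matrices=False)
    score = np.abs(Vt[:n_components]).max(axis=0)
    return np.argsort(score)[::-1][:n_genes]

def pca_velocity(S, V, n_genes=50, n_components=2, dt=1.0):
    """Restrict to top-loading genes, refit PCA on them, and project
    the velocities into that linear space."""
    genes = top_loading_genes(S, n_genes, n_components)
    S_sub, V_sub = S[:, genes], V[:, genes]
    mean = S_sub.mean(axis=0)
    _, _, Vt = np.linalg.svd(S_sub - mean, full_matrices=False)
    W = Vt[:n_components].T
    coords = (S_sub - mean) @ W
    arrows = dt * (V_sub @ W)   # PCA is linear, so displacements project directly
    return genes, coords, arrows
```

Because the projection is linear, each arrow is a weighted sum of per-gene velocities, which is easier to interpret than an arrow in a non-linear embedding.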
This allows a cleaner statement:
"these cells are activating the transcriptional program associated with [cell type]" without overinterpreting specific cell-to-cell transitions
RNA velocity applied to hippocampal neuron data showed sequential bifurcations into many neuron subtypes.
Arrows formed coherent flow fields at scale.
By starting from random positions in the graph and following the velocity arrows backwards, trajectory roots can be inferred without prior knowledge.
This addresses a key limitation of pseudotime methods (e.g., Monocle), which required manual selection of root cells.
Similarly, endpoints of trajectories can be identified computationally by following the arrows forwards.
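A minimal sketch of this idea treats the velocity field as a random walk over cells. The construction here is an assumption for illustration, not the published algorithm: transition weights come from the cosine similarity between a cell's velocity and the direction to each of its k nearest neighbours, each cell has a self-transition so that walkers can settle, and roots are found by rerunning the walk with the arrows negated.

```python
import numpy as np

def transition_matrix(X, V, k=3, sigma=0.1):
    """Row-stochastic matrix: a walker in cell i prefers neighbours
    lying in the direction of V[i]. X and V are (cells x dims)."""
    n = X.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        d = X - X[i]
        dist = np.linalg.norm(d, axis=1)
        nbrs = np.argsort(dist)[1:k + 1]       # skip the cell itself
        cos = d[nbrs] @ V[i] / (dist[nbrs] * np.linalg.norm(V[i]) + 1e-12)
        w = np.exp(cos / sigma)
        total = w.sum() + 1.0                  # 1.0 = self-transition weight exp(0)
        P[i, nbrs] = w / total
        P[i, i] = 1.0 / total
    return P

def diffuse(P, n_steps=200):
    """Start walkers uniformly over cells and let them follow P."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_steps):
        p = p @ P
    return p

# Toy example: 10 cells on a line, all arrows pointing right.
X = np.column_stack([np.arange(10.0), np.zeros(10)])
V = np.tile([1.0, 0.0], (10, 1))
endpoint = int(np.argmax(diffuse(transition_matrix(X, V, k=2))))
root = int(np.argmax(diffuse(transition_matrix(X, -V, k=2))))
```

In this toy line of cells, forward walkers pile up at the rightmost cell and reversed walkers at the leftmost one, so the endpoint and root are recovered without choosing either by hand.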
The embedding sensitivity caveat is also worth remembering: the lecturer clearly views this as the main practical pitfall. The same velocity data can look like a clean linear trajectory in PCA and a bifurcation in UMAP, and these aren't trivially reconcilable.