Long non-coding RNAs (lncRNAs) = RNA transcripts conventionally defined by length > 200 nt.
Threshold is arbitrary; some functional lncRNAs (e.g., BC1, snaR) fall below 200 nt.
Alternative biological definition (Amaral et al.): RNAs functioning as primary or spliced transcripts that are independent of any known small-ncRNA class.
Genome transcription landscape:
Protein-coding portion: small; non-coding (“dark matter”) transcription is vast.
NONCODE v3.0 (2012): 73,370 lncRNA entries from 1,239 species.
< 200 lncRNAs functionally annotated in lncRNAdb (2011).
Presence across taxa: animals, plants, yeast, prokaryotes, viruses.
General properties
Often low expression, tissue-specific, nuclear/chromatin localization.
Frequently 5'-capped, 3'-polyadenylated, multi-exonic ⇒ mRNA-like.
Sequence conservation: generally poor, yet selected subclasses (e.g., lincRNAs) show domain-level conservation.
Documented roles in
Transcription regulation, splicing, translation, protein localization.
Chromatin architecture & epigenetic regulation.
Cellular processes: imprinting, cell-cycle, apoptosis, stem-cell pluripotency & reprogramming, heat-shock response.
Disease links: cancer progression, neurodegeneration, metabolic disorders, etc.
lncRNAdb survey: ~42% (of 182 curated entries) participate in transcriptional regulation.
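A quick check of scale, computed from the two numbers above (this is not a count reported by the survey itself):

$$0.42 \times 182 = 76.44 \approx 76 \text{ curated lncRNAs involved in transcriptional regulation}$$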
The review proposes four orthogonal features:
Genomic location & context.
Effect exerted on DNA (cis vs trans).
Mechanism of functioning.
Targeting mechanism/archetype.
Intergenic lncRNAs (lincRNAs): transcribed from regions between annotated protein-coding genes (Fig 1A).
Biological features
Transcriptionally activated like mRNAs; possess “K4–K36” chromatin signature (H3K4me3 at 5', H3K36me3 in body).
≈70% of K4–K36-positive lincRNAs show RNA evidence, vs ≈72% for protein-coding genes.
≈70% of human K4–K36 lincRNA domains are conserved in mouse (protein-coding ≈80%).
More conserved than introns & antisense RNAs; more tissue-specific than mRNAs; more stable than intronic lncRNAs.
Functional spectrum: embryonic stem-cell pluripotency, cell proliferation, cancer progression, etc.
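The K4–K36 signature can be pictured as a simple interval-overlap test. The sketch below assumes that peaks are half-open [start, end) intervals. The function names, the ±1 kb promoter window and the 50% body-coverage cutoff are hypothetical choices made for illustration. This is not the published domain-calling procedure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def covered_bp(region: Interval, peaks: List[Interval]) -> int:
    """Bases of `region` covered by `peaks` (peaks assumed non-overlapping)."""
    start, end = region
    return sum(max(0, min(end, p_end) - max(start, p_start)) for p_start, p_end in peaks)


def is_k4_k36_domain(
    body: Interval,
    strand: str,
    k4me3_peaks: List[Interval],
    k36me3_peaks: List[Interval],
    promoter_window: int = 1_000,     # hypothetical: +/- bp around the TSS
    min_body_fraction: float = 0.5,   # hypothetical: share of body marked by H3K36me3
) -> bool:
    """Simplified call: H3K4me3 near the 5' end AND H3K36me3 across the transcribed body."""
    tss = body[0] if strand == "+" else body[1]  # the 5' end depends on strand
    promoter = (tss - promoter_window, tss + promoter_window)
    has_k4 = covered_bp(promoter, k4me3_peaks) > 0
    body_fraction = covered_bp(body, k36me3_peaks) / (body[1] - body[0])
    return has_k4 and body_fraction >= min_body_fraction


# Toy coordinates: K4me3 peak at the 5' end, K36me3 over 80% of the body.
print(is_k4_k36_domain((10_000, 20_000), "+", [(9_500, 10_500)], [(11_000, 19_000)]))  # True
```

Requiring both marks mirrors the "K4 at 5', K36 in body" description above. Real pipelines also handle replicate peaks, merging and background models, which this sketch ignores.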
Intronic lncRNAs: originate entirely from introns of protein-coding genes (Fig 1B).
Poorly characterised; only a minor subset functionally explored.
Sense lncRNAs: transcribed from the sense strand of protein-coding loci; may overlap exons or span the entire gene (Fig 1C).
Often mRNA-like (polyA, 5' cap, multi-exonic).
Unusual cases encode peptides and act as RNAs:
SRA (Steroid Receptor RNA Activator) – encodes protein + RNA scaffold.
ENOD40 – encodes small peptides + guides RNP localisation in legumes.
Antisense lncRNAs: transcribed from the antisense strand of protein-coding loci; three sub-situations (Fig 1D):
Exon–exon overlap with sense gene.
Lies within an intron of the sense gene (no exon overlap).
Covers the entire sense gene, which sits inside an intron of the antisense transcript.
Validation: strand-specific assays, qRT-PCR, full-length cDNA sequencing.
Prevalence
~32% of human lncRNAs are antisense to coding genes.
~87% of coding transcripts have antisense partners in mouse.
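The location categories (Fig 1A–D) reduce to interval and strand comparisons. The sketch below assumes that exons are sorted, half-open intervals. Two further assumptions are made. Same-strand transcripts contained in an intron are labelled "intronic". Opposite-strand ones are treated as the antisense "within intron" sub-case. The `Transcript` class and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


@dataclass
class Transcript:
    chrom: str
    strand: str            # "+" or "-"
    exons: List[Interval]  # sorted, non-overlapping

    @property
    def span(self) -> Interval:
        return (self.exons[0][0], self.exons[-1][1])

    @property
    def introns(self) -> List[Interval]:
        # gaps between consecutive exons
        return [(a[1], b[0]) for a, b in zip(self.exons, self.exons[1:])]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify_vs_gene(lnc: Transcript, gene: Transcript) -> str:
    """Category of `lnc` relative to one protein-coding `gene` whose span it overlaps."""
    inside_gene_intron = any(contains(i, lnc.span) for i in gene.introns)
    if lnc.strand == gene.strand:
        return "intronic" if inside_gene_intron else "sense"
    if any(overlaps(e, f) for e in lnc.exons for f in gene.exons):
        return "antisense: exon overlap"
    if inside_gene_intron:
        return "antisense: within sense intron"
    if any(contains(i, gene.span) for i in lnc.introns):
        return "antisense: sense gene within lncRNA intron"
    return "antisense: other"


def classify(lnc: Transcript, genes: List[Transcript]) -> List[str]:
    """Classify `lnc` against every overlapping gene; no overlap at all means intergenic."""
    hits = [g for g in genes if g.chrom == lnc.chrom and overlaps(g.span, lnc.span)]
    if not hits:
        return ["intergenic"]
    return [classify_vs_gene(lnc, g) for g in hits]


gene = Transcript("chr1", "+", [(100, 200), (400, 500), (800, 900)])  # toy coordinates
print(classify(Transcript("chr1", "-", [(10, 50), (950, 1000)]), [gene]))
# ['antisense: sense gene within lncRNA intron']
```

The function returns one label per overlapping gene because a single lncRNA can be, for example, antisense to one gene and intronic to another.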
lncRNAs localised to nucleus/chromatin frequently modulate DNA targets.
Cis-acting: influence genes near their own locus.
Mechanisms
Transcriptional interference: RNA–DNA triplex formation or promoter occlusion blocks assembly of the pre-initiation complex (PIC).
DHFR upstream transcripts (0.8–7.3 kb): bind the DHFR promoter and dissociate TFIIB.
SRG1 (0.4–1.9 kb, yeast): its transcription runs across the SER3 promoter and prevents TF binding.
Chromatin modification recruitment
Xist (19 kb): recruits PRC2 ⇒ H3K27me3 ⇒ X-chromosome silencing.
MEG3 (~1.6 kb), COLDAIR (~1.1 kb, plants): recruit PRC2.
GAL10-ncRNA (~4 kb): recruits Rpd3S HDAC ⇒ histone de-acetylation.
HOTTIP (~3.8 kb): recruits MLL complex ⇒ active chromatin over HOXA.
Trans-acting: act at distant loci; may not require sequence complementarity.
Examples
HOTAIR (~2.2 kb): transcribed from HOXC (chr12), silences HOXD (chr2) & other loci; binds PRC2 (via Suz12) and LSD1.
7SK snRNA (~330 nt): scaffold of the 7SK snRNP, which sequesters P-TEFb and so represses transcription elongation.
B2 SINE RNA: binds RNA Pol II and represses transcription during heat shock.
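Whether a lncRNA–target pair looks cis or trans can be approximated from genomic positions. This is a sketch under stated assumptions. `Locus`, `gap_bp`, `mode_of_action` and the 50 kb `cis_window` are hypothetical, because the text only says "near their own locus". Proximity alone does not prove cis action; that requires experimental evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int  # 0-based, half-open
    end: int


def gap_bp(a: Locus, b: Locus) -> int:
    """Distance in bp between two loci on the same chromosome (0 if they overlap)."""
    if a.chrom != b.chrom:
        raise ValueError("loci lie on different chromosomes")
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def mode_of_action(lnc: Locus, target: Locus, cis_window: int = 50_000) -> str:
    """Label a pair 'cis' if the target lies within `cis_window` bp of the lncRNA locus."""
    if lnc.chrom == target.chrom and gap_bp(lnc, target) <= cis_window:
        return "cis"
    return "trans"


# Only the chromosomes (HOTAIR on chr12, HOXD on chr2) come from the text; positions are made up.
hotair = Locus("chr12", 1_000_000, 1_002_200)
hoxd = Locus("chr2", 5_000_000, 5_100_000)
print(mode_of_action(hotair, hoxd))  # trans
```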
Transcriptional regulation sub-mechanisms
Transcriptional interference (e.g., DHFR, SRG1, 7SK, B2).
Chromatin remodelling (Xist, MEG3, HOTAIR, HOTTIP, COLDAIR).
Enhancer-associated RNAs (eRNAs) that activate nearby genes:
ncRNA-a1, Evf-2, Alpha-250/Alpha-280.
Splicing regulation: binding/modulating splice factors or masking splice sites.
MIAT (9–10 kb): UACUAAC repeats bind SF1 ⇒ inhibit spliceosome assembly.
Malat1 (~7 kb): modulates the phosphorylation state & nuclear-speckle distribution of SR proteins ⇒ alternative splicing.
LUST (1.4–2.4 kb): antisense to RBM5; proposed to mask splice sites.
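Locating MIAT-style UACUAAC repeats is a simple motif scan. In the sketch below, the function name and the toy sequence are hypothetical, and the toy sequence is not MIAT. The motif resembles the intron branch-point sequence that SF1 recognises. That resemblance plausibly explains why the repeats can engage SF1.

```python
import re
from typing import List


def motif_positions(seq: str, motif: str = "UACUAAC") -> List[int]:
    """0-based start positions of `motif` in `seq`, overlapping hits included.

    Accepts DNA or RNA letters (T is read as U).
    """
    s = seq.upper().replace("T", "U")
    m = motif.upper().replace("T", "U")
    # a zero-width lookahead lets successive matches overlap
    return [hit.start() for hit in re.finditer(f"(?={re.escape(m)})", s)]


toy = "GGUACUAACAAUACUAACGG"  # hypothetical sequence with two repeats
print(motif_positions(toy))  # [2, 11]
```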
Translation regulation: association with translation factors/ribosome.
BC1 & BC200: bind eIF4A, PABP ⇒ block 48S complex assembly.
snaR: ribosome-associated; translational role inferred from this association.
Gadd7: associates with actively translating polysomes.
Splicing–translation coupling: Zeb2NAT prevents splicing of a 5'-UTR intron containing an IRES; the retained intron enables Zeb2 translation.
Antisense-mediated mRNA decay
21A (~300 nt) & 1/2-sbsRNA1 (~0.7 kb) promote target mRNA degradation (1/2-sbsRNAs act via Staufen1-mediated decay).
Competing endogenous RNAs (ceRNAs)
linc-MD1: sponges miR-133 & miR-135 ⇒ allows MAML1 & MEF2C expression during myogenesis.
IPS1 (plant), HULC (liver cancer) act as target mimics.
Pseudogene ceRNAs: KRASP1, PTENP1.
BACE1AS (~2 kb): its duplex with BACE1 mRNA masks the miR-485-5p site ⇒ stabilises BACE1 mRNA.
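Candidate sponges and masked sites are often screened by searching for miRNA seed matches. This sketch uses the common "7mer-m8" seed heuristic, which is an assumption here and not a method described in the text. Both sequences are hypothetical toys; they are not miR-133/135 or linc-MD1.

```python
from typing import List

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(_PAIR[base] for base in reversed(rna))


def seed_sites(mirna: str, target: str) -> List[int]:
    """0-based positions in `target` perfectly complementary to miRNA nucleotides 2-8.

    Ignores site context, pairing outside the seed and G:U wobble.
    """
    mir = mirna.upper().replace("T", "U")
    tgt = target.upper().replace("T", "U")
    site = reverse_complement(mir[1:8])  # nucleotides 2-8 (1-based)
    return [i for i in range(len(tgt) - len(site) + 1) if tgt[i:i + len(site)] == site]


toy_mirna = "UCAGUACGAUCGAUGCAAGCUA"    # hypothetical miRNA
toy_sponge = "AAACGUACUGAAAACGUACUGAA"  # hypothetical lncRNA fragment
print(seed_sites(toy_mirna, toy_sponge))  # [3, 14]
```

Counting sites per transcript gives only a crude proxy for sponge capacity. Dedicated target-prediction tools weigh many additional features.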
Protein localisation: meiRNA directs Mei2 to a nuclear dot (fission yeast); ENOD40 guides RNP granules.
Telomere maintenance: TERC is RNA template within telomerase.
RNA interference modulation: rncs-1 reduces Dicer activity.
Cellular architecture: MENε/β (NEAT1) scaffolds paraspeckles; Xlsirts & VegT RNAs help organise the vegetal-cortex cytoskeleton in Xenopus oocytes.
Wang & Chang (2011) categories:
Signal – expression marks cellular events (e.g., Xist, COLDAIR).
Decoy – sequesters proteins or RNAs (e.g., DHFR upstream RNA, PANDA, ceRNAs).
Guide – directs effector complexes to targets (e.g., HOTAIR, Xist).
Scaffold – structural platform for multi-protein assembly (e.g., HOTAIR, 7SL).
Notes
One lncRNA can combine archetypes (HOTAIR = signal + guide + scaffold).
Interaction modalities: RNA–RNA, RNA–DNA hybrids, RNA secondary/tertiary structure, protein linkers.
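Archetypes are not mutually exclusive, so a set-valued mapping fits better than a single label per lncRNA. In this sketch, the `Archetype` enum and the helper functions are hypothetical. The assignments only restate the examples listed above and are not a complete annotation.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Archetype(Enum):
    SIGNAL = "signal"
    DECOY = "decoy"
    GUIDE = "guide"
    SCAFFOLD = "scaffold"


# Restated from the examples above; each lncRNA may carry several archetypes.
ARCHETYPES: Dict[str, FrozenSet[Archetype]] = {
    "Xist": frozenset({Archetype.SIGNAL, Archetype.GUIDE}),
    "COLDAIR": frozenset({Archetype.SIGNAL}),
    "DHFR upstream RNA": frozenset({Archetype.DECOY}),
    "PANDA": frozenset({Archetype.DECOY}),
    "HOTAIR": frozenset({Archetype.SIGNAL, Archetype.GUIDE, Archetype.SCAFFOLD}),
    "7SL": frozenset({Archetype.SCAFFOLD}),
}


def lncrnas_with(archetype: Archetype) -> List[str]:
    """Names of lncRNAs annotated with `archetype`, sorted."""
    return sorted(name for name, kinds in ARCHETYPES.items() if archetype in kinds)


def multi_archetype(min_count: int = 2) -> List[str]:
    """lncRNAs combining at least `min_count` archetypes."""
    return sorted(name for name, kinds in ARCHETYPES.items() if len(kinds) >= min_count)


print(lncrnas_with(Archetype.GUIDE))  # ['HOTAIR', 'Xist']
print(multi_archetype())              # ['HOTAIR', 'Xist']
```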
Intergenic: a subset binds PRC2 with high affinity; potential species-specific chromatin programmes.
Intergenic: K4–K36 domains provide a rich dataset for evolutionary comparisons.
Antisense: dominant among PRC2-bound RNAs; pervasive regulatory layer.
Sense: occasional coding potential (e.g., SRA, ENOD40) challenges the classic coding vs non-coding dichotomy.
Intronic: under-studied; may have distinct poly(A) status & subcellular localisation.
NONCODE v3.0: 73,370 lncRNAs / 1,239 species.
GENCODE v7: detailed manual curation; antisense ≈32% of human lncRNAs.
Experimental validations: CAGE, strand-specific RNA-seq, qRT-PCR, 5'/3' RACE.
Analysis of NONCODE v3.0 suggests a trimodal length distribution ⇒ proposal:
Small-lncRNA: 200–950 nt (human ≈58%).
Medium-lncRNA: 950–4,800 nt (mouse ≈78%).
Large-lncRNA: >4,800 nt (enriched in human vs mouse).
Requires validation against higher-confidence but smaller annotation sets (e.g., GENCODE).
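The proposed length classes amount to a binning function. In this sketch the function names are hypothetical, and the usage lengths are made up. The text does not say which class the boundary values (950 and 4,800 nt) belong to. The convention used below is an assumption: small is [200, 950), medium is [950, 4,800] and large is > 4,800 nt.

```python
from collections import Counter
from typing import Dict, Iterable


def length_class(length_nt: int) -> str:
    """Proposed lncRNA length class; boundary handling is an assumption."""
    if length_nt < 200:
        return "below cutoff"
    if length_nt < 950:
        return "small"
    if length_nt <= 4_800:
        return "medium"
    return "large"


def class_fractions(lengths: Iterable[int]) -> Dict[str, float]:
    """Fraction of transcripts in each class (classes ordered by first appearance)."""
    counts = Counter(length_class(n) for n in lengths)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


print(class_fractions([300, 1_000, 1_000, 5_000]))
# {'small': 0.25, 'medium': 0.5, 'large': 0.25}
```

Applied to the transcript lengths from an annotation set, `class_fractions` would give per-species percentages like those quoted above. Comparing NONCODE against GENCODE would show whether the trimodal pattern survives stricter annotation.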
Many lncRNAs harbor miRNA or snoRNA genes:
H19 exon contains miR-675; LOC554202 hosts miR-31.
MEG3/8/9, Rian, antiPeg11 encode clusters of miRNAs & snoRNAs; potential coordinated targeting.
Post-processing: some lncRNAs preferentially processed into snoRNAs.
Regulatory networks become multilayered: miRNA ↔ lncRNA ↔ mRNA.
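A toy directed graph makes the multilayer miRNA ↔ lncRNA ↔ mRNA idea concrete. This is a sketch only. The edges restate examples from this section, and the one-to-one pairing miR-133 → MAML1 / miR-135 → MEF2C is an assumption. No miR-675 or miR-31 targets are included because none are named here. The dictionaries and the function are hypothetical.

```python
from typing import Dict, Set

# lncRNA -> miRNAs it sequesters (ceRNA / sponge edges)
SPONGED: Dict[str, Set[str]] = {"linc-MD1": {"miR-133", "miR-135"}}
# lncRNA -> miRNAs encoded in / processed from it (host-gene edges)
HOSTED: Dict[str, Set[str]] = {"H19": {"miR-675"}, "LOC554202": {"miR-31"}}
# miRNA -> repressed mRNAs (pairing assumed for illustration)
MIRNA_TARGETS: Dict[str, Set[str]] = {"miR-133": {"MAML1"}, "miR-135": {"MEF2C"}}


def predicted_effects(lncrna: str) -> Dict[str, str]:
    """Expected direction of change for mRNAs reached through miRNAs.

    Sponging a miRNA relieves repression of its targets ('up'); a hosted miRNA,
    once processed, represses its targets ('down'). Conflicts become 'mixed'.
    """
    effects: Dict[str, str] = {}

    def record(mrna: str, direction: str) -> None:
        prev = effects.get(mrna)
        effects[mrna] = direction if prev in (None, direction) else "mixed"

    for mir in SPONGED.get(lncrna, set()):
        for mrna in MIRNA_TARGETS.get(mir, set()):
            record(mrna, "up")
    for mir in HOSTED.get(lncrna, set()):
        for mrna in MIRNA_TARGETS.get(mir, set()):
            record(mrna, "down")
    return effects


print(predicted_effects("linc-MD1"))  # MAML1 and MEF2C both 'up' (key order may vary)
print(predicted_effects("H19"))       # {} because no miR-675 targets are listed here
```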
Continuous refinement of classification as knowledge grows.
Focused study on chromatin-modifying lncRNA groups beyond PRC2 (e.g., SETD, HDAC variants).
Expand functional annotation of intronic & sense lncRNAs.
Evolutionary linkage between lncRNAs and small ncRNAs (shared loci, precursor relationships).
Integrative analyses (e.g., combined miRNA/mRNA/lncRNA in diseases such as NSCLC) to decode complex regulatory networks.
Transcriptional interference (cis): DHFR-up, SRG1, 7SK, B2.
Chromatin modification (cis/trans): Xist, COLDAIR, MEG3, HOTTIP, HOTAIR, GAL10-ncRNA.
Enhancer RNAs: ncRNA-a1, Evf-2, Alpha-250.
Splicing regulators: MIAT, Malat1, LUST, Zeb2NAT.
Translation regulators: BC1, BC200, snaR, Gadd7.
ceRNAs / miRNA sponges: linc-MD1, IPS1, HULC, PTENP1, KRASP1.
Protein localization: meiRNA, ENOD40.
Telomere template: TERC.
Dicer modulation: rncs-1.
Protein partners: PRC2, PRC1; Rpd3S HDAC; MLL; LSD1; P-TEFb; RNA Pol II; TFIIB; SF1; SR proteins; eIF4A; PABP; Dicer.
lncRNA research reshapes gene definition (e.g., coding vs non-coding boundaries).
Potential diagnostic/prognostic biomarkers (PCAT-1 in prostate cancer, HULC in liver cancer).
Therapeutic avenues: antisense targeting, modulation of ceRNA networks, chromatin-binding interference.