M1L7 R-loops in health and disease

R-loops - three-stranded structures formed when nascent RNA hybridises with its template DNA strand during transcription, displacing the non-template strand as ssDNA

  • Formed in all living organisms

  • Preferentially formed on GC-rich sequences (G:C pairs form 3 hydrogen bonds, giving a thermodynamically more stable structure)

  • Occupy ~10% of the human genome

  • Important regulatory functions

    • Transcription - R-loops slow down RNA pol progression by reducing the processivity of the enzyme

    • DNA replication - the RNA primers of Okazaki fragments form short RNA/DNA hybrids (mini R-loops)

    • Epigenetics - regulate the formation of histone modifications and DNA methylation

    • Genome stability - the displaced ssDNA is unstable and prone to single-strand breaks (SSBs), which can lead to double-strand breaks (DSBs)

    • DNA damage response

    • Immune response - R-loops can be processed, generating products that are exported to the cytoplasm, where they bind pattern recognition receptors (PRRs) and can stimulate an immune response

    • CRISPR targeting - gRNA hybridising to target sequence in DNA, forming a trans R-loop which must compete with cis R-loops

      • Cis R loops: RNA transcribed from the same locus where it hybridises

      • Trans: RNA transcribed elsewhere and invades a different DNA locus

      • gRNA (trans) must compete with the nascent transcript (cis) for access to the DNA; if the cis R-loop is stable it can block/interfere with Cas9 activity

    • Human disease - cancer and neurodegeneration

  • Molecular mechanisms to detect R-loops

    • S9.6 Ab - recognises RNA/DNA hybrids with high affinity in a non-sequence-specific manner

      • Some background because S9.6 also binds dsRNA to some extent, so RNase H treatment is needed as a control (it degrades the RNA in RNA/DNA hybrids, so a specific signal should be depleted)

    • DNA/RNA immunoprecipitation sequencing (DRIP-seq)/qPCR **

      • Lyse non-crosslinked cells, extract and lyse nuclei, sonicate to fragment the nucleic acids (including the RNA/DNA hybrids)

        • Why non-crosslinked - RNA/DNA hybrids are thermodynamically more stable than DNA/DNA duplexes: DNA/DNA is B-form whereas RNA/DNA adopts an A-form-like conformation (duplex is wider and bases stack more tightly), so the hybrids persist without cross-linking; it also captures endogenous interactions rather than cross-linking all RNA to DNA

      • IP with S9.6, wash and purify RNA/DNA hybrids

        • Can make a sequencing library or amplify by qPCR with locus-specific primers; strong amplification means there was a lot of binding, ie. many hybrids in that region

      • Sequence/qPCR

        • For all transcribed genes: major R-loop peaks at the transcription start site (TSS) and transcription termination site (TTS) (latter signal is slightly smaller), and low (but not absent) signal throughout the gene

        • Peak at the promoter shifted slightly into the body of the gene - may be because you need to synthesise a bit of RNA first to start hybridising

        • Promoter R-loops important as polymerase checkpoint

        • R-loops at TTS push RNA pol backwards which slows down its progression and aids in termination, and then the hybrid needs to be opened to free the RNA

          • SETX helicase unwinds the hybrid, allowing the exonuclease XRN2 to degrade the remaining RNA fragment that is still attached to RNA pol II downstream of the poly(A) site (torpedo model)

          • SETX is mutated in ALS4/AOA2 (juvenile motor neuron disease and ataxia with oculomotor apraxia type 2)

    • Mass spectrometry to characterise R-loop proteome

      • Chromatin proteins, mRNA processing proteins, rRNA processing proteins

      • Novel R loop binding factors - RNA binding proteins (R-loop turnover), nucleases (R-loop cleavage), helicases (R-loop resolution), DNA binding proteins (R-loop associated instability)
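The DRIP-qPCR readout described above (more amplification means more hybrids, and RNase H should deplete a specific signal) is usually converted into a number before loci are compared. A minimal sketch, assuming the common percent-of-input calculation used for IP-qPCR and roughly 100% primer efficiency. The notes do not specify a quantification method, and the function names, the 1% input fraction and all Ct values here are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent-of-input enrichment for one qPCR amplicon.

    input_fraction is the share of the lysate kept as input (0.01 for 1%).
    Assumes one Ct corresponds to a two-fold difference in template.
    """
    # Scale the input Ct to what 100% of the starting material would give.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def rnase_h_sensitive_fraction(untreated: float, rnase_h_treated: float) -> float:
    """Fraction of the DRIP signal lost after RNase H treatment (0 to 1)."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return max(0.0, 1.0 - rnase_h_treated / untreated)


# Hypothetical Ct values for a single amplicon near a TSS.
untreated = percent_input(ct_ip=26.0, ct_input=24.0, input_fraction=0.01)
treated = percent_input(ct_ip=29.0, ct_input=24.0, input_fraction=0.01)

print(f"untreated: {untreated:.4f}% of input")
print(f"+RNase H:  {treated:.4f}% of input")
print(f"RNase H-sensitive fraction: {rnase_h_sensitive_fraction(untreated, treated):.3f}")
```

With these made-up values the RNase H-treated sample comes up 3 cycles later, so it retains one eighth of the signal; most of the signal would therefore be hybrid-dependent rather than S9.6 background.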

R-loop proteome

  • Topoisomerases (eg. Top1) - affect DNA supercoiling/compaction and R-loop formation

  • Nucleases (eg. XPG/XPF endonucleases) - involved in transcription-coupled NER and in processing R-loops when they accumulate to extreme levels (eg. due to mutations in R-loop regulators like APOBECs)

    • Cleavage generates DSBs which must be repaired; however, this is preferable to (and more controllable than) R-loop accumulation, which can cause more damage

    • DNA fragments and DNA/RNA hybrids can be exported to the cytoplasm where they may be recognised by PRRs (eg. cGAS which acts as a DNA sensor) to generate inflammation

    • RNase H2 - degrades the RNA strand of DNA/RNA hybrids; mutations in it cause neurodegenerative and inflammatory disease because RNase H2 normally resolves R-loops, preventing DNA damage and inflammation

  • Helicases (eg. SETX, DHX9, AQR) - unwind RNA/DNA hybrids (preferred pathway over degrading the hybrids); unclear whether each class of helicase is specific for certain genes, cell cycle stages, cell types, pathological conditions… or something else

  • m6A RNA modification machinery (eg. METTL3, YTHDF1/2, hnRNPA2B1)

    • METTL3 travels with RNA pol II and acts co-transcriptionally to modify RNA (adds the m6A modification, which then becomes part of the hybrid)

    • m6A may increase the stability of the hybrid - some types of modifications in R-loops could be implicated in disease pathology by increasing stability and generating too many R-loops

    • hnRNPA2B1 and YTHDF2 are readers of this modification

    • hnRNPA2B1 - recognises the R-loop in G0/G1/S phase, function unclear

    • YTHDF2 - R loop removal in G2/M phase

  • Deaminase (eg. APOBEC3B, AID)

    • APOBEC regulates R loops and promotes cancer mutagenesis

      • APOBEC3B is part of the APOBEC family of cytidine deaminases, which convert C to U in ssDNA

      • APOBEC3B acts as an antiviral defence factor by editing viral and retroelement DNA and introducing inactivating mutations

      • ssDNA generated in R loop formation is vulnerable to off-target APOBEC3B activity which can deaminate C into U, creating U:G mismatches

      • DNA breaks can also arise, eg. when the resulting uracils are excised during repair

      • APOBEC3B is often overexpressed in tumours and causes clustered mutations (kataegis)

    • AID (homologue of APOBEC) has a beneficial role: it drives class switch recombination, creating antibody diversity in B cells
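The APOBEC3B points above can be illustrated with a toy model: only the displaced strand inside the R-loop is single-stranded, so only its cytosines are exposed to deamination. This is a rough sketch, not a model of real enzyme kinetics. The sequence, the R-loop coordinates and the function name are hypothetical, and the optional restriction to cytosines preceded by T reflects the reported 5'-TC preference of APOBEC3B, which is an addition not stated in the notes.

```python
def deaminate_exposed_cytosines(non_template: str, loop_start: int, loop_end: int,
                                motif_only: bool = True) -> tuple[str, list[int]]:
    """Convert C to U on the displaced strand inside [loop_start, loop_end).

    non_template is the displaced (ssDNA) strand written 5'->3'.
    With motif_only=True, only cytosines preceded by T (a TC context) are edited.
    Returns the edited sequence and the 0-based positions that changed.
    """
    original = non_template.upper()
    if not 0 <= loop_start <= loop_end <= len(original):
        raise ValueError("R-loop interval must lie within the sequence")

    seq = list(original)
    edited = []
    for i in range(loop_start, loop_end):
        if original[i] != "C":
            continue
        in_tc_context = i > 0 and original[i - 1] == "T"
        if motif_only and not in_tc_context:
            continue
        seq[i] = "U"  # deamination: cytosine becomes uracil
        edited.append(i)
    return "".join(seq), edited


# Hypothetical 16-nt displaced strand with an R-loop covering positions 4-11.
toy_strand = "ATCGGTCATCCGTTCA"
edited_seq, positions = deaminate_exposed_cytosines(toy_strand, 4, 12)
print(edited_seq, positions)
```

Each U produced this way would sit opposite a G once the strands re-anneal, giving the U:G mismatches mentioned above. Because edits are confined to the R-loop interval, they fall close together, which is one simple way to picture clustered mutations.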

R-loops in disease

  • When pathological R-loops > physiological R-loops

    • Repeat expansion diseases - R-loops accumulate particularly in repetitive sequences, causing coding expansions (eg. Huntington’s disease) or non-coding expansions (eg. Friedreich’s ataxia)

    • Neurological disorders (ALS4/AOA2) - SETX mutations

    • Aicardi-Goutières Syndrome (AGS) - RNase H2 mutations

    • Cancer - dysregulation of transcription and promotion of mutagenesis

  • R loops in cancer

    • Oncogene activation (eg RAS or EGFR) increases global transcription and R loop accumulation which causes replication stress and genomic instability

    • EGFR and HRASV12 are overactivated in cancer, which upregulates components of the transcription machinery (eg. TATA-binding protein and general TFs) and increases the transcription initiation rate genome-wide

    • This causes RNA pol II to be overactive and more genes are transcribed simultaneously

    • Excessive transcription generates more opportunities for R-loop formation; the R-loops can stall replication forks during transcription-replication collisions and cause replication stress

    • This causes fork collapse and DNA DSBs which may cause chromosomal rearrangements, DNA copy number changes and mutation accumulation

    • Cells under extreme stress may undergo apoptosis or senescence