M2L1 Epigenetics and genomic landscapes

Functional elements in linear sequences

  • Core promoter

    • Transcription start site (TSS) - where transcription is initiated

    • TATA box - AT-rich core promoter element (binding site for TFs)

      • The TFIID complex contains TATA-binding protein (TBP), which binds the TATA box and recruits the other general transcription factors and RNA pol II

    • B recognition element (BRE) - flanks the TATA box and is recognised by TFIIB, which bridges TBP and RNA pol II

    • Downstream promoter element (DPE) - recognised by components of TFIID, particularly TBP-associated factors (TAFs), which enhance transcription

  • Upstream regulation

    • RNA pol II binds to the core promoter

    • Transcription factors may bind upstream, very close to the core promoter, or at distal sites (relying on DNA looping to contact RNA pol II)

    • More transcription factors interacting with RNA pol II generally results in greater transcription

    • Insulators - boundary elements that block interactions between enhancers and promoters when positioned between them

      • Bound by CTCF (CCCTC-binding factor), forming chromatin domains and ensuring that an enhancer only activates its correct target gene and not neighbouring ones

  • Long-distance regulation

    • Super/stretch enhancers - large clusters of enhancers that can lie far upstream of their target genes but are looped into contact with these distant regions

    • Enhancer RNAs (eRNAs) - transcribed from enhancers and involved in gene regulation

  • Splicing

    • Pre-mRNA is spliced to remove introns and join exons together, producing mature mRNA

    • 5' (donor) and 3' (acceptor) splice sites are recognised by the spliceosome

    • Splice sites can be mutated in cancer, disrupting splicing patterns and leading to intron retention - retained introns may contain premature stop codons and cause abortive translation
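A toy sketch of the intron-retention point above, assuming made-up sequences (`EXONS` and `INTRONS` are hypothetical, not a real gene) and that translation starts at the first AUG. Retaining intron 0 introduces an in-frame UAA, so the encoded protein is cut short.

```python
# Toy model of intron retention: sequences are made up for illustration only.
STOP_CODONS = {"UAA", "UAG", "UGA"}

EXONS = ["AUGGCU", "GCAGCC", "GCUUAA"]
# Hypothetical introns (start with GU, end with AG); intron 0 carries an in-frame UAA
INTRONS = ["GUAUAAAG", "GUCCAG"]


def splice(exons, introns, retain=None):
    """Join exons in order; if `retain` is an intron index, keep that intron in the transcript."""
    pieces = []
    for i, exon in enumerate(exons):
        pieces.append(exon)
        if i == retain and i < len(introns):
            pieces.append(introns[i])
    return "".join(pieces)


def orf_length(mrna):
    """Number of amino acids encoded from the first AUG to the first in-frame stop codon.

    Returns None if there is no AUG or no in-frame stop.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return (pos - start) // 3
    return None


if __name__ == "__main__":
    print(orf_length(splice(EXONS, INTRONS)))            # correctly spliced: 5
    print(orf_length(splice(EXONS, INTRONS, retain=0)))  # intron 0 retained: 3
```

Real splicing defects are far messier (cryptic sites, partial retention), but the frame arithmetic is the same.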

  • Exon skipping

    • DNA methylation at CpG islands may silence some exons, causing them to be skipped when exons are spliced together

    • Regulatory elements may cause single exon skipping

  • Alternative promoters

    • CpG islands can sit around core promoters, enhancer elements, or internal in the gene (internal core promoter)

    • RASSF1 dual (alternative) promoters in solid cancers (Hippo tumour suppressor pathway)

      • The 1α CpG island is in the promoter region of RASSF1A and the 2γ CpG island is in the promoter of RASSF1C

      • When 1α is methylated, transcription starts from 2γ, producing the C isoform; when 2γ is methylated, transcription starts from the 1α promoter, skipping 2γ to produce the A isoform

      • Two different transcripts can be produced - RASSF1A or RASSF1C

        • Both share an identical C terminus, including the Ras association (RA) domain and SARAH domain (protein-protein interaction domain)

        • A isoform has a unique N terminal

      • A and C can both bind to/activate Src using their SARAH domain

      • The A isoform can also bind C-terminal Src kinase (CSK), which inhibits Src, whereas the C isoform does not bind CSK and only activates Src (Hippo signalling)

      • RASSF1 alternative promoters impact prognosis

      • High methylation of the RASSF1A (1α) promoter with low methylation of the RASSF1C (2γ) promoter = worse prognosis
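The promoter-choice logic above can be sketched as a simple boolean model. This is a simplification under stated assumptions: methylation is treated as all-or-nothing, an unmethylated island is assumed to be usable, and the cases where both or neither island is methylated are extrapolations not covered in these notes. All function names are hypothetical.

```python
from __future__ import annotations


def rassf1_isoforms(island_1a_methylated: bool, island_2g_methylated: bool) -> set[str]:
    """Isoforms expected to be transcribed, given which CpG island promoter is methylated.

    Assumption: a methylated island silences its promoter; an unmethylated one is usable.
    """
    isoforms = set()
    if not island_1a_methylated:
        isoforms.add("RASSF1A")  # transcribed from the 1α promoter
    if not island_2g_methylated:
        isoforms.add("RASSF1C")  # transcribed from the 2γ promoter
    return isoforms


def can_recruit_csk(isoforms: set[str]) -> bool:
    """Only the A isoform binds CSK (which inhibits Src); C only activates Src."""
    return "RASSF1A" in isoforms


def worse_prognosis_pattern(island_1a_methylated: bool, island_2g_methylated: bool) -> bool:
    """High 1α (RASSF1A) methylation with low 2γ (RASSF1C) methylation."""
    return island_1a_methylated and not island_2g_methylated


if __name__ == "__main__":
    tumour_like = rassf1_isoforms(island_1a_methylated=True, island_2g_methylated=False)
    print(tumour_like, can_recruit_csk(tumour_like))  # {'RASSF1C'} False
```

In the worse-prognosis pattern only RASSF1C is made, so nothing in the model can recruit CSK to restrain Src.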

  • Translational control

    • AUG is the start codon - stem loop structures within transcripts can influence where the ribosome starts translating if there is an AUG in the middle of the transcript

      • Eg. the AUG at the beginning may initiate translation if the AUG in the middle is ‘hidden’ in a stem loop structure, but if this structure is disrupted, translation may start in the middle, using this AUG as an internal translation start site

    • Degradation by miRNA or siRNA - guides degradation of target RNAs

      • Allows transcript levels to match what is needed at the time, eg. a rapid, short-term response to an acute environmental trigger may need a burst of transcripts that are then quickly cleared, whereas long-term processes like development may need transcripts to be produced slowly and remain stable over longer periods

    • Regulation by lncRNAs - can block the reading (translation) of transcripts
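The stem loop example above can be sketched by listing which AUG codons are accessible as start sites. This is a toy under stated assumptions: the transcript `MRNA` is made up (its GGGCCC arms are complementary), the hidden interval is simply declared rather than predicted by any folding algorithm, and `accessible_start_sites` is a hypothetical helper.

```python
from __future__ import annotations

# Toy transcript: AUG at 2 (5' end) and AUG at 14 (inside a GGGCCC...GGGCCC hairpin).
MRNA = "CCAUGAAAGGGCCCAUGUUUGGGCCCUAA"
HAIRPIN = (8, 26)  # half-open interval assumed to be base-paired; no folding is computed


def accessible_start_sites(mrna: str, hidden: list[tuple[int, int]]) -> list[int]:
    """Positions of AUG codons that do not overlap any hidden (structured) interval."""
    sites = []
    pos = mrna.find("AUG")
    while pos != -1:
        overlaps = any(pos < end and pos + 3 > start for start, end in hidden)
        if not overlaps:
            sites.append(pos)
        pos = mrna.find("AUG", pos + 1)
    return sites


if __name__ == "__main__":
    print(accessible_start_sites(MRNA, [HAIRPIN]))  # stem loop intact: [2]
    print(accessible_start_sites(MRNA, []))         # stem loop disrupted: [2, 14]
```

Disrupting the hairpin (an empty `hidden` list) exposes the internal AUG as a second candidate start site.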

Chromosome organisation

  • Each chromosome occupies its own space in the nucleus (chromosome territory)

  • At the periphery of the nucleus there is more repressive machinery that silences genes (B compartment), whereas in the centre there is more open chromatin and active genes (A compartment)

  • Topologically associated domains (TADs) - regions of a chromosome that interact with each other more frequently in 3D space than with regions outside them

  • Chromatin loops - the DNA strand loops to bring together distant genomic regions (eg. enhancer and promoter); these are typically smaller, more specific interactions occurring within (or sometimes between) TADs

  • In between territories there are:

    • Nuclear speckles, PML bodies, Cajal bodies, and paraspeckles - different types of nuclear bodies

      • Nuclear speckles - storing and supplying RNA splicing factors

      • Cajal bodies - assembling and modifying snRNPs and snoRNPs for RNA splicing and rRNA modification

      • PML body - contains PML protein and regulatory proteins for DNA repair, apoptosis, antiviral response, senescence and tumour suppression

      • Paraspeckles - retaining RNA and RNA-binding proteins during stress adaptation and differentiation

    • Splicing factories

    • Transcription factories

    • Polycomb domain

  • Transcription factories are physically confined by nuclear actin, which gives them stability and allows them to go through the different stages of transcription

    • Nuclear actin and myosin are associated with phase separation factors - phase separation is driven by the hydrophobicity/hydrophilicity of proteins

    • The interfaces of phase-separated boundaries are sites of transcription

  • Transcription factories

    • Ratchet model 

      • Circularisation of genes - transcription factors just need to be activated in one place and necessary genes can be ‘dragged in’ via chromatin looping

      • Multiple genes can be activated at once

      • DNA is extruded through cohesin rings, bringing CTCF sites together; the sites can be oriented in the same or in opposite directions

      • CTCF sites in the convergent orientation (facing each other) stop the DNA from looping out further, anchoring the loop

      • Mediator complex - acts as a scaffold for TF complexes, stabilising them

        • Mutations alone do not directly cause cancer: eg. we all acquire mutations from environmental factors like UV, but not all of us develop cancer; it is likely that mutations in mediators and regulators of gene expression cause the other mutations to become visible

        • Concentrates transcription machinery in one region, allowing transcription re-initiation and rapid cycling of the process
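The convergent CTCF rule above can be sketched as a deterministic toy of loop extrusion. Assumptions (not from the notes): DNA is a line of integer positions, each CTCF motif is `">"` (forward) or `"<"` (reverse), the two cohesin arms move independently, and a correctly oriented CTCF site always blocks the arm (in reality blocking is not 100% efficient). `extrude_loop` is a hypothetical helper.

```python
from __future__ import annotations


def extrude_loop(length: int, ctcf: dict[int, str], load: int) -> tuple[int, int]:
    """Return (left_anchor, right_anchor) of the loop formed by one cohesin ring.

    ctcf maps position -> motif orientation: ">" (forward) or "<" (reverse).
    The left arm stops at a forward site, the right arm at a reverse site,
    so loops end up anchored by convergent (> ... <) CTCF pairs.
    """
    left = load
    while left > 0 and ctcf.get(left) != ">":
        left -= 1  # reel in DNA on the left until blocked
    right = load
    while right < length - 1 and ctcf.get(right) != "<":
        right += 1  # reel in DNA on the right until blocked
    return left, right


if __name__ == "__main__":
    convergent = {10: ">", 30: ">", 60: "<", 80: "<"}
    print(extrude_loop(100, convergent, load=45))  # (30, 60)
    mixed = {10: ">", 30: "<", 60: ">", 80: "<"}
    print(extrude_loop(100, mixed, load=45))       # (10, 80): wrong-facing sites are passed
```

Flipping the orientation of the sites at 30 and 60 moves the anchors outward, which is one way altered CTCF sites could rewire which regions end up in the same loop.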

  • Topologically associated domains (TADs)

    • Using chromosome conformation capture (3C) methods - interacting regions of chromatin are crosslinked with formaldehyde, the DNA is digested, the crosslinked fragments are ligated, and the DNA is analysed after the crosslinks are removed

      • Sequencing of interacting regions reveals the presence of TADs, compartments and chromosome territories

    • TADs are large chromosomal regions in which DNA sequences interact more frequently with each other than sequences outside the domain

    • Act as functional neighbourhoods - enhancers within a TAD tend to regulate promoters within the same TAD

    • TADs insulate gene regulatory activity, preventing enhancers from activating genes in neighbouring TADs
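Because sequences within a TAD contact each other more than sequences across its edge, one common way to locate boundaries in all-vs-all (Hi-C style) contact data is an insulation score: for each gap between bins, average the contacts that cross it within a small window, and look for minima. The sketch below assumes a made-up two-domain matrix; real pipelines also normalise the data and use a more careful minimum-calling step, and the function names are hypothetical.

```python
from __future__ import annotations


def toy_two_tad_matrix(size_a: int, size_b: int, within: float = 10.0, between: float = 1.0) -> list[list[float]]:
    """Made-up symmetric contact matrix: two domains, high contacts inside, low across."""
    domain = [0] * size_a + [1] * size_b
    n = len(domain)
    return [[within if domain[i] == domain[j] else between for j in range(n)] for i in range(n)]


def gap_insulation(matrix: list[list[float]], window: int) -> list[float | None]:
    """Insulation score for the gap after each bin i (between bins i and i+1).

    Score = mean contact frequency between the `window` bins left of the gap and the
    `window` bins right of it; None where the window runs off the matrix edge.
    """
    n = len(matrix)
    scores: list[float | None] = []
    for i in range(n - 1):
        if i - window + 1 < 0 or i + window > n - 1:
            scores.append(None)
            continue
        left = range(i - window + 1, i + 1)
        right = range(i + 1, i + window + 1)
        total = sum(matrix[a][b] for a in left for b in right)
        scores.append(total / (window * window))
    return scores


def strongest_boundary(scores: list[float | None]) -> int:
    """Gap index with the lowest insulation score, i.e. the fewest contacts crossing it."""
    valid = [(s, i) for i, s in enumerate(scores) if s is not None]
    return min(valid)[1]


if __name__ == "__main__":
    scores = gap_insulation(toy_two_tad_matrix(4, 4), window=2)
    print(scores)                      # [None, 10.0, 5.5, 1.0, 5.5, 10.0, None]
    print(strongest_boundary(scores))  # 3, i.e. between bins 3 and 4
```

The dip at the domain edge is the boundary that, in the cell, keeps an enhancer in one TAD from reaching promoters in the next.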

  • Lamin associated domains (LADs)

    • Lamins are the structural components lining the inside of the nuclear envelope, forming a rigid meshwork which protects the genome

    • LADs are the chromatin regions closest to (in contact with) the nuclear envelope

    • Chromatin modifiers (histone deacetylases, methyltransferases) are lined up at the envelope, bound to proteins in its inner leaflet, which places them close to the histones in LADs

    • Presence of repressive machinery at the inner leaflet causes genes within LADs to be silenced

    • LADs position chromatin at the nuclear periphery, organising the genome into the active central compartment and inactive peripheral zones

    • During development or differentiation LADs can detach or form, allowing changes in which genes are active or silenced

    • Nucleosome remodeling and deacetylase complex (NuRD) and polycomb repressive complex 2 (PRC2) are the epigenetic core of LAD silencing

      • This is implicated in nuclear reprogramming upon fertilisation, genomic imprinting, X chromosome inactivation, development, wound repair, and dedifferentiation in cancer

    • 5hmC tends to be at the periphery of CpG islands, whereas 5mC is in the middle

      • 5hmC is associated with transcriptionally active or poised promoters, functioning as a transition mark that either maintains a dynamic, active chromatin state or marks sites being demethylated by TET enzymes

      • 5mC is common in repressed promoters, such as those bound by Polycomb complexes or in LADs; it recruits MBD proteins, which in turn recruit NuRD and HDACs to enforce repression

      • 5hmC tends to decrease as cancer progresses, accompanied by hypermethylation (eg. of CpG island promoters) and transcriptional silencing

      • TETs depend on α-ketoglutarate (a TCA cycle intermediate), so decreasing 5hmC may reflect deregulated cellular metabolism/a shift to aerobic glycolysis

  • Nucleolus-associated domains (NADs)

    • The nucleolus contains rDNA for ribosome synthesis

    • Chromosomes with rDNA components ‘stick their head’ into the nucleolus

    • There are hundreds of copies of ribosomal genes to make sure we can make enough

    • >95% of a cell's energy goes into making ribosomes

    • The nucleolus also has lamin A/C, forming a continuous link between the envelope and the nucleolus, which would allow ribosomes to get out into the cytoplasm more easily

    • rDNA units are lost over time as they are heavily transcribed

      • Instead of repairing damage in this region it is removed, since there are hundreds of copies of these genes

      • It is safer to lose a copy than to try to repair it, because these genes are highly repetitive

      • Associated with biological aging

        • Different from chronological ageing associated with telomeres: telomeres are non-functional, whereas rDNA is functional, and alcohol consumption, smoking, stress etc. require more repair proteins to be produced, which uses up more rDNA units

Causes of cancer

  • Genetic - eg. mutations in p53 (Li-Fraumeni), BRCA1/2 (breast and ovarian cancer), NF1/2 (neurofibromatosis)

  • Environmental - eg. smoking and lung cancer, asbestos and mesothelioma

  • Sporadic - ‘dark genome’?

  • Mutations in genes - inactivating (nonsense/premature stop, deletion, frameshift) in tumour suppressors, activating (changing protein function) in oncogenes

  • Mutations that affect gene regulation - gene activity, translation, genomic topology

  • Chromosome translocation - more likely between chromosomes in neighbouring territories or sharing the same transcription factory

  • Form follows function

    • Mutation/translocation, altered methylation, altered CTCF

    • → Loss of neighbourhoods → Altered gene expression