M1L8 Epigenetic regulation and DNA integrity
Epigenetics - the structural adaptation of chromosomal regions so as to register, signal or perpetuate altered activity states
Transcription factors are key for gene expression and can be used to reprogram cells (e.g. using the Yamanaka factors OCT4, SOX2, c-MYC and KLF4 to reprogram fibroblasts into induced pluripotent stem cells)
This process is highly inefficient due to organisation of DNA into chromatin (heterochromatin and euchromatin)
Chromatin is organised into nucleosomes (octamer consisting of two each of H2A, H2B, H3 and H4 with DNA wrapped around)
Nucleosomes (beads on a string) are further folded many times into solenoid structure, fibres, chromonema, chromatids and finally mitotic chromosomes
However, newer evidence suggests there are no discrete orders of chromatin structure; chromatin is folded into a variety of compacted states in no particular order

Cytosine tends to get modified, forming 5mC (2-4% of genome), which can be further modified to 5hmC (0.1-0.8%)
Minor modifications - 5fC (0.002%), 5caC (0.0003%)

N-terminal tails of the histones in nucleosomes can be chemically modified - methylation, acetylation, phosphorylation, ubiquitylation…
Writers (e.g. methyltransferases), readers (e.g. methyl-binding domain proteins), erasers (e.g. demethylases)

Histone variants may also be produced and substitute canonical forms of histones in the nucleosome
The C-terminal tail of H2AX is longer than that of canonical H2A and can receive modifications
N terminus is highly conserved
γ-H2AX (H2AX phosphorylated at Ser139 by ATM) signals DSBs and recruits other proteins for repair
Transcription

Pioneer TFs can invade inaccessible sites at promoters and enhancers (the repressed state is the default and is modulated by repressors, though these are few relative to activators/co-activators)
Activators and co-activators open up chromatin structure
Methylation and acetylation marks modulate chromatin remodelling to regulate the permissibility of the gene promoter for transcription
Enhancers and promoters interact via mediator complex
Enhancers can be brought into the vicinity of the promoter to form the transcription pre-initiation complex with RNA pol II and TFs
RNA pol II can be paused and released resulting in a burst of transcription
RNA pol II can start transcribing and then it pauses
Pause mechanism is associated with NELF and DSIF (protein complexes that work together to cause promoter-proximal pausing)
Additional signals are needed for RNA pol II to be released: P-TEFb phosphorylates NELF, DSIF and RNA pol II, NELF dissociates, and productive elongation begins
Increased concentration of activators increases frequency of bursts of transcription to increase transcription rate
DNA modifying enzymes
Cytosine can be methylated by DNMT1, DNMT3A or DNMT3B, which use SAM as the methyl donor (converting it to SAH) to produce 5mC
5mC is converted to 5hmC by TET1, TET2 or TET3 (2-OG + O2 -> succinate + CO2, with ascorbate as cofactor), which can further oxidise it to 5fC and then 5caC

Patterns of 5mC
The C in CpG is methylated, and the mark is found symmetrically on both strands of DNA
Bimodal distribution of methyl CpG in the genome

One peak at heavily methylated CpGs (found in intergenic DNA and repetitive elements which are transcriptionally silent) and another at unmethylated CpGs (found in CpG islands and active promoters which are transcriptionally active)
CpG islands have very little modification (mostly unmethylated regulatory hubs) and are located at TSSs and active enhancers, whereas most of the rest of the genome is methylated
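The symmetry of CpG methylation follows from CpG being its own reverse complement. A minimal sketch, using a made-up example sequence and hypothetical helper names, showing that each top-strand CpG has a bottom-strand cytosine directly opposite its G:

```python
# Sketch only: the sequence and function names are illustrative assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the bottom strand read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def cpg_positions(seq):
    """0-based index of the C in every CpG dinucleotide of seq."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def symmetric_cytosines(seq):
    """Pairs (top_C, bottom_C) in top-strand coordinates.
    The bottom-strand C sits opposite the G of the top-strand CpG (index i + 1)."""
    return [(i, i + 1) for i in cpg_positions(seq)]

top = "ATCGGACGTT"  # hypothetical sequence
bottom = reverse_complement(top)

# Map bottom-strand CpG cytosines back into top-strand coordinates.
bottom_cs = sorted(len(top) - 1 - j for j in cpg_positions(bottom))
print(cpg_positions(top), bottom_cs, symmetric_cytosines(top))
```

Because both cytosines of a CpG pair can carry the mark, one strand can later serve as the template for restoring methylation on the other.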
Inheritance of DNA methylation
Due to semi-conservative replication, the newly synthesised strand is not methylated, leaving the duplex hemimethylated
DNMT1 restores methylation
PCNA can interact with DNMT1 to restore methylation patterns


DNMT1 has a CXXC domain that binds unmethylated CpG with high affinity; this positions the catalytic TRD domain away from the cytosine, preventing it from depositing methylation
BAH2 loop restrains TRD in retracted position
Autoinhibitory linker occludes catalytic site
Protects unmodified CpGs from DNMT1
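The inheritance logic above can be sketched as a toy model. This is a simplification under stated assumptions: each strand is represented only by the set of CpG site indices it has methylated, and the function names are hypothetical. Replication produces hemimethylated daughters; a DNMT1-like step copies the mark onto the new strand only where the parental strand carries it, and leaves fully unmethylated CpGs alone.

```python
# Toy model (assumption): a duplex is (top_sites, bottom_sites), each a
# frozenset of CpG site indices that are methylated on that strand.

def replicate(duplex):
    """Semi-conservative replication: each daughter keeps one parental
    strand and gains a new, unmethylated strand (hemimethylation)."""
    top, bottom = duplex
    return (top, frozenset()), (frozenset(), bottom)

def maintain(duplex):
    """DNMT1-like maintenance: methylate the new strand at hemimethylated
    CpGs only; CpGs unmethylated on both strands stay unmethylated."""
    top, bottom = duplex
    marked = top | bottom
    return (marked, marked)

parent = (frozenset({0, 2}), frozenset({0, 2}))  # sites 0 and 2 methylated, site 1 not
daughter_a, daughter_b = replicate(parent)
print(daughter_a, maintain(daughter_a))
```

In this model the pattern is restored exactly in both daughters, and site 1 never gains methylation because there is no parental mark to copy.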
Demethylation by TET enzymes
5mC is oxidised to 5hmC and further to 5fC and to 5caC
Passive (replication-dependent) loss of methylation (dominant)
Oxidised cytosine is lost due to semi conservative replication as hemioxidised CpG is a poor substrate for DNMT1
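Passive loss can be illustrated by counting strands over successive divisions. The sketch below assumes the idealised case in which no maintenance occurs at the oxidised site at all, so the mark is only ever carried by the two original strands; the function name is hypothetical.

```python
def fraction_marked_strands(divisions):
    """Fraction of strands still carrying the original mark after the given
    number of divisions, assuming zero maintenance (idealised)."""
    # Each strand is True (carries the original mark) or False (newly made).
    duplexes = [(True, True)]
    for _ in range(divisions):
        next_generation = []
        for top, bottom in duplexes:
            next_generation.append((top, False))     # old top + new bottom
            next_generation.append((False, bottom))  # new top + old bottom
        duplexes = next_generation
    strands = [strand for duplex in duplexes for strand in duplex]
    return sum(strands) / len(strands)

for n in range(4):
    print(n, fraction_marked_strands(n))
```

Under this assumption the marked fraction halves with every division (1, 1/2, 1/4, 1/8, ...).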

BER mediated demethylation
TDG removes 5caC, forming an abasic site that is processed by APE1 and PARP1, initiating short-patch or long-patch repair

Readers of CpGs
5mC has high affinity to proteins with methyl CpG binding domains and zinc finger domains
Unmethylated CpGs are recognised by proteins with a CXXC domain, including DNMT1, other chromatin modifiers, and TET enzymes which hydroxylate 5mC
5mC in promoters inhibits transcription and silences gene expression
CpG islands at TSS are usually maintained in unmethylated state
CpG island methylator phenotype (CIMP) in cancer
Testis-expressed genes are not expressed in other cell types but are sometimes expressed in tumour cells due to DNA methylation remodelling
Imprinted genes - ~200 genes that remember whether they have been inherited from mother or father due to methylation mark left by parental germline during gametogenesis
Imprinted alleles are monoallelically expressed, one is methylated/silenced and the other is unmethylated/active
Genes on inactive X chromosome - in females one of the two X chromosomes is silenced
In embryonic development one X chromosome is randomly selected to be inactivated
The XIST gene of that chromosome is activated and produces a lncRNA that coats the chromosome in cis and recruits silencing machinery
DNA methylation of promoter CpG islands acts as an epigenetic lock to maintain silencing through cell divisions
Retrotransposons (interspersed repetitive elements) are silenced by methylation
Genetic disorders
Germline - mutation in DNMT3B in immunodeficiency, centromeric instability and facial anomalies syndrome 1 (ICF1)
Somatic - mutation in DNMT3A in AML
Germline mutation in MECP2 (methylation reader) - Rett syndrome
Common alterations in cancer
Hypomethylation of centromeric sequences
Genomic instability due to hypomethylation
CpG island hypermethylation (CIMP)
Loss of 5hmC
DNA damage

A damage product can be repaired to the native state or replicated into a mutation, which is hard to repair later because the original context can no longer be discerned
Mutational signatures - documented mutations are being mathematically organised into signatures
Mutational signatures of mutagens
Rogue activity of DNMTs or TETs
In nematodes AlkB2/3 and AlkB5 are always present if their genome contains methylation
Sometimes, instead of methylating the C5 position, DNMTs methylate the wrong atom (e.g. N3, giving 3meC), which can affect base pairing and cause mutations
AlkB family of dioxygenases are repair enzymes that can remove the undesired methyl group

5mC involved in cyclobutane pyrimidine dimer (CPD) formation after UV
Can be repaired by global genome NER (GG-NER) or transcription coupled NER (TC-NER)
Investigating mutations
7 out of 8 most commonly mutated residues in TP53 are C>T at commonly methylated CpG sites
43% germline mutations in MECP2 are C>T at commonly methylated CpG sites
~70% of tumours have SBS1 mutational signature
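Counts like those above rest on deciding whether a substitution is a C>T change in a CpG context. A minimal sketch of that check, using hypothetical function names and a made-up sequence; a G>A change on the reported strand is treated as C>T on the opposite strand:

```python
def is_cpg_transition(ref_seq, pos, alt):
    """True if replacing ref_seq[pos] with alt is a C>T change at a CpG,
    counting G>A on the reported strand as C>T on the opposite strand."""
    ref = ref_seq[pos]
    if ref == "C" and alt == "T":
        return pos + 1 < len(ref_seq) and ref_seq[pos + 1] == "G"
    if ref == "G" and alt == "A":
        return pos >= 1 and ref_seq[pos - 1] == "C"
    return False

def cpg_transition_fraction(ref_seq, mutations):
    """Fraction of (pos, alt) substitutions that are C>T at CpG."""
    hits = sum(is_cpg_transition(ref_seq, pos, alt) for pos, alt in mutations)
    return hits / len(mutations)

ref = "AACGTTCA"  # hypothetical reference
muts = [(2, "T"), (3, "A"), (6, "T"), (0, "G")]  # hypothetical calls
print(cpg_transition_fraction(ref, muts))
```

Here the first two substitutions hit the CpG at positions 2-3, while the C at position 6 is followed by A and does not count.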
Spontaneous deamination

Cytosine can be hydrolytically deaminated (reacting with water and losing its amino group) to give uracil
5mC is deaminated to thymine, giving a T:G mismatch
TDG and MBD4 initiate BER to repair the T:G mismatches caused by this chemical modification

5mC is more mutable than C or 5hmC
The brain accumulates more 5hmC, which may protect against mutation
Deamination rate of 5hmC to 5hmU is similar to unmodified C, whereas deamination rate of 5mC is notably higher
Hypermutated cancers have defective DNA pol ε or MMR enzymes
DNA pol ε replicates DNA using its polymerase domain and proofreads using its exonuclease domain
Mutations in the exonuclease domain can increase the number of errors made when replicating the genome
This has been shown to increase mutability of CpGs, predominantly on the leading strand, because lagging strand synthesis uses Pol δ, which has its own proofreading domain
Hypotheses explaining these findings:
Pol ε and MMR are involved in non-canonical repair of mismatches generated by spontaneous deamination
Deamination is too slow to explain the observed mutation rate
Pol ε and MMR mutations produce gain of function mutator phenotype - the defective enzymes may introduce errors or misprocess damaged bases, becoming a source of mutations rather than just failing to fix them
Shared mutation patterns across cancers - plausible
Pol ε has low fidelity when incorporating nucleotides opposite 5mC
CpG mutation enrichment and leading strand asymmetry make this the hypothesis most consistent with the data
If mismatches escape proofreading activity of DNA pol, MMR can be activated

MutS homologues detect the mismatch and form a ternary complex with MutL homologues and the mismatch
PMS2 is a mismatch-activated, strand-specific nuclease that cuts the strand which already contains a nick
The nick is either the end of a discontinuously synthesised Okazaki fragment, which marks the newly synthesised strand on the lagging strand, or it is created when PCNA directs the endonuclease activity of PMS2 to cut the strand being synthesised
The nick could also come from ribonucleotide excision repair, where RNase H2 cuts at a ribonucleotide incorporated into the new strand
EXO1 excises the mismatch and the gap is filled by PCNA, Pol δ and DNA ligase
Measuring mutations
Purified Pol ε complex (WT or mutant) incubated with methylated or non-methylated DNA
Polymerase error sequencing (PER-seq)
Single molecule barcoding - 19N barcode for each of the original molecules
Linear amplification of original molecule - propagates any true replication errors introduced by Pol ε when incubated with original template but new mutations introduced by Pol ε during amplification are not replicated
In linear amplification each original DNA molecule acts as a template for only 1 strand per replication cycle and the new strand does not become a template for further amplification
Reads sharing the same barcode are aligned and compared to identify true mismatches introduced by Pol ε in the incubation
Allows mismatches to be quantified - separate sequencing of top and bottom strands allows subtraction of background errors produced in template
Observed that mutant Pol ε does produce more CpG mutations, and that more mutations are generated when the parent strand contains methylated CpGs
If the methylation is at other positions there are fewer mutations, suggesting 5mC is particularly associated with low fidelity of Pol ε
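The barcode comparison step can be sketched as follows. This is a simplified illustration, not the published PER-seq pipeline: it assumes reads are already aligned to the reference with equal length, uses short made-up barcodes instead of 19N, and calls a mismatch a true polymerase error only when every read in a barcode family carries it. The function name and the `min_family_size` parameter are hypothetical.

```python
from collections import defaultdict

def call_polymerase_errors(reads, reference, min_family_size=2):
    """Group (barcode, sequence) reads by barcode and report mismatches
    shared by all reads in a family as (position, ref_base, alt_base)."""
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    errors = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # a single read cannot separate true from late errors
        calls = []
        for pos, ref_base in enumerate(reference):
            bases = {seq[pos] for seq in seqs}
            if len(bases) == 1:  # unanimous within the family
                (base,) = bases
                if base != ref_base:
                    calls.append((pos, ref_base, base))
        errors[barcode] = calls
    return errors

reference = "ACGTACGT"  # hypothetical template
reads = [
    ("BC1", "ACGTATGT"), ("BC1", "ACGTATGT"), ("BC1", "ACGTATGT"),
    ("BC2", "ACGTACGT"), ("BC2", "ACTTACGT"),
    ("BC3", "AAGTACGT"),
]
print(call_polymerase_errors(reads, reference))
```

In this example the C>T at position 5 in family BC1 is present in every copy and is kept, the change seen in only one BC2 read is discarded as an amplification or sequencing error, and BC3 is skipped because it has a single read.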