BIO 4150 – History of Approaches and Methods in Human Genetics (Study Notes)

Epigenetics wrap up and chromatin states

Epigenetics studies heritable changes in gene function that do not involve changes to the DNA sequence.
H3K9ac (histone H3 lysine 9 acetylation) marks chromatin that is active, typically at promoters, and facilitates transcriptional initiation and transition to elongation.
H3K27me3 (histone H3 lysine 27 trimethylation) is a methylation mark associated with repressed chromatin.
Chromatin states are characterized by distinct combinations of histone marks (e.g., ON-H3K9ac vs OFF-H3K27me3) and correlate with transcriptional activity.
Constitutive vs facultative chromatin (terms most often applied to heterochromatin):
- Constitutive chromatin generally carries marks that maintain a given state regardless of context (e.g., permanently off or on).
- Facultative chromatin carries marks that can change depending on context (development, environment).
Open (euchromatin) vs Closed (heterochromatin) states are governed by the balance of methylation and acetylation marks and other modifications, which control accessibility for transcription.
Methylation and acetylation marks are involved in turning genes on and off; histone methylation can repress or activate depending on which residue is modified (e.g., H3K27me3 is repressive, whereas H3K4me3 marks active promoters).
Overall theme: chromatin state is dynamic and marks influence access to DNA for transcription machinery.
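As a minimal sketch of the idea that a chromatin state is read off a combination of marks, the toy function below labels a region using only the two marks named in these notes. The function name, the label strings, and the handling of regions that carry both marks or neither are illustrative assumptions, not a published classification scheme.

```python
def classify_region(marks):
    """Label a region from the set of histone marks detected on it.

    Toy rule (assumption): H3K9ac alone -> "active",
    H3K27me3 alone -> "repressed", both -> "mixed", neither -> "unmarked".
    """
    has_ac = "H3K9ac" in marks
    has_me3 = "H3K27me3" in marks
    if has_ac and not has_me3:
        return "active"
    if has_me3 and not has_ac:
        return "repressed"
    if has_ac and has_me3:
        return "mixed"
    return "unmarked"


# Hypothetical regions and their detected marks.
regions = {
    "promoter_A": {"H3K9ac"},
    "promoter_B": {"H3K27me3"},
    "region_C": set(),
}
labels = {name: classify_region(marks) for name, marks in regions.items()}
print(labels)
```

Real chromatin-state maps combine many more marks and learn the states from data; the point here is only that the label comes from the combination, not from any single mark.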

Epigenetics in Normal Regulation

DNA accessibility and transcriptional activation follow a regulated sequence:
- Relaxed (open) DNA state allows transcription factor (TF) access.
- Activator binding to regulatory regions.
- RNA Polymerase II (RNAP II) recruitment to promoters.
- Transcription initiation, followed by promoter clearance and transition to elongation.
- Pre-mRNA processing (5′ capping, splicing, 3′ polyadenylation) → processed mRNA.
- mRNA stability and translation into protein.
DNase I hypersensitivity is detected after nucleosome displacement, indicating open chromatin regions accessible to TFs and RNAP II.
The transcription cycle involves RNA pol II and transcription factors binding promoter regions to initiate transcription.

Transcription factors (TFs) and chromatin regulation

Transcription factors can influence chromatin state and epigenetic regulation:
- A single protein binding at a site can alter the local chromatin landscape to be more open, enabling transcription.
- TFs can recruit proteins that modify histone marks, changing the regulatory environment.
TFs recruit chromatin-modifying proteins that alter histone marks and chromatin accessibility, contributing to cell-type–specific gene expression and cellular memory.

TFs recruiting chromatin-modifying machinery

TFs can recruit proteins that alter the histone code:
- Histone acetyltransferases (writers) add acetyl groups to histones (e.g., H3K9ac).
- Histone deacetylases (erasers) remove acetyl groups.
- Histone methyltransferases (writers) add methyl groups.
- Histone demethylases (erasers) remove methyl groups.
The recruitment of writers and erasers by trans-acting TFs establishes chromatin states across the genome.

TFs and chromatin readers

Histone marks are read by chromatin readers that interpret the histone code and influence whether chromatin is open or closed.
This reading and writing/erasing of marks create a cellular memory from the progenitor cell to its descendants (somatic and germ cells).
The genome-wide landscape of chromatin marks and their readers/erasers establishes functional regions and lineage-specific programs.

How do we find ‘genes’ and measure their activity?

Gene expression profiling: RNA-seq.
- RNA-seq assays gene expression across the genome by sequencing cDNA derived from mRNA.
- Process: reverse transcription of mRNA to cDNA; sequencing reads map to transcripts; the depth of reads (read abundance) correlates with expression level.
- Goal: obtain a quantitative measure of gene expression for all genes.
Practical notes:
- cDNA libraries are constructed from RNA; sequencing reads reflect transcript abundance.
- Higher read depth (more reads stacked over each transcript) → more accurate expression estimate for low-abundance transcripts.
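To make "read abundance correlates with expression level" concrete, here is a minimal sketch of two common ways to turn raw read counts into comparable expression values: counts per million (CPM) and transcripts per million (TPM). The gene names, counts, and transcript lengths are made-up illustrative values, and the normalization step itself is an addition that these notes do not cover.

```python
def counts_per_million(counts):
    """Scale raw read counts so they sum to one million (corrects for library size)."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def transcripts_per_million(counts, lengths_kb):
    """Divide counts by transcript length (kb) first, then scale to one million."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    scale = sum(rates.values())
    return {gene: r / scale * 1e6 for gene, r in rates.items()}


# Hypothetical read counts and transcript lengths (kb).
counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}

cpm = counts_per_million(counts)
tpm = transcripts_per_million(counts, lengths_kb)
print(cpm)
print(tpm)
```

In this toy input geneB has three times the raw reads of geneA but the same TPM, because its transcript is three times longer. Longer transcripts collect more reads at the same expression level, which is why length is corrected for.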

How do we identify regulatory elements?

Open chromatin assays identify regulatory elements where TFs can bind.
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing):
- Uses the Tn5 transposase to tag and fragment open chromatin regions.
- Antibody-free method; open regions are preferentially tagged, while closed regions are not accessible to the enzyme.
ATAC-seq highlights active regulatory regions and enhancers by mapping accessible chromatin.
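The mapping step can be pictured as scanning per-bin insertion counts for stretches of high signal. The sketch below is a deliberately simplified stand-in, assuming the reads have already been reduced to one count per genomic bin. The function name, the fixed threshold, and the counts are hypothetical; real peak callers model background signal statistically rather than using a fixed cutoff.

```python
def call_accessible_regions(insertions, threshold):
    """Return half-open (start, end) bin intervals where counts stay >= threshold."""
    regions = []
    start = None
    for i, n in enumerate(insertions):
        if n >= threshold and start is None:
            start = i                      # a high-signal run begins
        elif n < threshold and start is not None:
            regions.append((start, i))     # the run ended at the previous bin
            start = None
    if start is not None:                  # run extends to the last bin
        regions.append((start, len(insertions)))
    return regions


# Hypothetical Tn5 insertion counts per bin along one locus.
insertions = [0, 1, 9, 12, 10, 1, 0, 0, 8, 11]
accessible = call_accessible_regions(insertions, threshold=8)
print(accessible)  # [(2, 5), (8, 10)]
```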
Additional approaches for regulatory element discovery include:
- DNase-seq (DNase I hypersensitive sites sequencing): identifies DNase I hypersensitive sites indicating open chromatin.
- FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements): enriches open chromatin regions via DNA accessibility.
- MNase-seq (Micrococcal Nuclease sequencing) and MAINE-seq: map nucleosome positions and turnover.
- ORGANIC, CATCH-IT: native-chromatin methods; ORGANIC profiles protein binding without crosslinking, and CATCH-IT measures histone turnover.
- ChIP-seq (Chromatin Immunoprecipitation sequencing): maps DNA regions bound by specific proteins (e.g., histone marks or TFs).
- ChIP-exo and HT-ChIP: higher-resolution or high-throughput variants of ChIP-seq.
- Histone methylation ChIP-seq: profiles methylation marks across the genome.
- Chem-seq: identifies sites bound by small molecules/drugs within chromatin.
- NOMe-seq (Nucleosome Occupancy and Methylome sequencing): single-molecule assay that reads out nucleosome occupancy (open vs protected DNA) together with DNA methylation.
- 3C-based methods: 3C, 4C, 5C, Hi-C, Capture-C, ChIA-PET for 3D genome interaction mapping.
- HITS-FLIP (High-Throughput Sequencing-Fluorescent Ligand Interaction Profiling): measures protein-DNA binding affinities across many sequences to identify binding landscapes.
- PB-seq and related approaches map binding energy landscapes and protein-DNA interactions.
Goal: assemble a comprehensive map of regulatory elements and their interactions with factors across the genome.

RNA polymerase II and the transcription machinery

RNA polymerase II does the heavy lifting in transcription.
Transcription factors and RNAP II coordinate during recognition and initiation of transcription.
Key components include:
- General transcription factors (e.g., TFIID/TBP, etc.)
- A large basal transcription apparatus with many subunits (almost 60 subunits in total)
- Regulatory region-bound transcription factors that recruit the mediator complex and RNAP II to promoter regions.
Enhancers interact with promoters via long-range DNA looping, mediated by architectural proteins (e.g., cohesin, CTCF) and Mediator to drive transcription.

3D structure and organization of DNA

DNA is organized in 3D space within the nucleus.
Enhancers and promoters interact through DNA looping to facilitate transcriptional regulation.
Key players:
- Transcriptional activators and coactivators (Mediator complex) and RNA polymerase II.
- Architectural proteins that shape 3D genome topology.
- Nuclear envelope and lamina contribute to chromosome organization (e.g., transcriptionally repressive vs active compartments).
- The TATA box and other core promoter elements are DNA sequences recognized by the basal transcription machinery (e.g., TBP within TFIID binds the TATA box).
Hi-C maps reveal chromosome territories and topologically associating domains (TADs).

3D data approaches for regulatory element discovery

ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing): maps chromatin interactions mediated by a protein of interest (e.g., a TF).
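3C, 4C, 5C, Capture-C, Hi-C: methods that study chromatin conformation at different scales, from a single pair of loci (3C), to one locus against the genome (4C), many loci against many (5C), selected loci captured genome-wide (Capture-C), and all loci against all (Hi-C).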
PB-seq, HITS-FLIP: map protein-DNA interactions and binding energy landscapes.
Ligation-based, circularization, and microarray strategies allow the identification of interacting regions and regulatory networks in 3D space.
The goal is to link regulatory elements with their target genes through physical contacts in the 3D genome.
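A minimal sketch of that linking step, assuming contacts have already been summarized as counts between pairs of genomic bins and that enhancers and promoters have been assigned to bins. The function name, the bin numbers, the element names, and the minimum-count cutoff are all hypothetical.

```python
def link_enhancers_to_genes(contacts, enhancer_bins, promoter_bins, min_count):
    """Map each enhancer to the genes whose promoter bin it contacts.

    contacts: {(bin_a, bin_b): contact_count}, unordered bin pairs.
    Pairs with fewer than min_count contacts are ignored.
    """
    links = {}
    for (a, b), count in contacts.items():
        if count < min_count:
            continue
        # Check both orientations, since a pair may list either bin first.
        for x, y in ((a, b), (b, a)):
            if x in enhancer_bins and y in promoter_bins:
                links.setdefault(enhancer_bins[x], set()).add(promoter_bins[y])
    return links


# Hypothetical contact counts and annotations.
contacts = {(10, 42): 15, (10, 77): 2, (33, 42): 9}
enhancer_bins = {10: "enh1", 33: "enh2"}
promoter_bins = {42: "GENE_X", 77: "GENE_Y"}

links = link_enhancers_to_genes(contacts, enhancer_bins, promoter_bins, min_count=5)
print(links)
```

Here both enhancers are linked to GENE_X, while the weak enh1 to GENE_Y contact falls below the cutoff and is dropped.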

What is our genome composed of?

Genome composition highlights (approximate proportions):
- Transposable elements (TEs): ~45% in total, made up of:
  - SINEs (short interspersed nuclear elements), including Alu elements: ~13%
  - LINEs (long interspersed nuclear elements): ~20%
  - LTR retrotransposons: ~8%
  - DNA transposons: ~3%
- Introns: ~26%
- Protein-coding genes: ~1.5%
- Simple sequence repeats: ~3%
- Segmental duplications: ~5%
- Miscellaneous unique sequences: ~11%
- Miscellaneous heterochromatin: ~8%
Note: Values are rounded approximations taken from the source slide set.
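A quick arithmetic check helps when reading these proportions. The four repeat classes (SINEs, LINEs, LTR retrotransposons, DNA transposons) are subsets of the transposable-element total, so they should not be added on top of it. The sketch below uses the rounded values from the list above; the variable names are illustrative.

```python
# Rounded percentages from the list above.
te_classes = {
    "SINEs": 13,
    "LINEs": 20,
    "LTR retrotransposons": 8,
    "DNA transposons": 3,
}
other = {
    "Introns": 26,
    "Protein-coding genes": 1.5,
    "Simple sequence repeats": 3,
    "Segmental duplications": 5,
    "Miscellaneous unique sequences": 11,
    "Miscellaneous heterochromatin": 8,
}

te_total = sum(te_classes.values())            # 44, close to the stated ~45%
genome_total = te_total + sum(other.values())  # 98.5, close to 100% after rounding

print(te_total, genome_total)
```

The subclasses add up to about the stated TE total, and counting TEs once brings the whole breakdown to roughly 100%, which confirms how the list is meant to be read.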

What are our genes doing? Gene annotation and function categories

Gene annotation categorizes genes by molecular function and roles in biology:
1. Miscellaneous
2. Viral protein
3. Transfer or carrier protein
4. Transcription factor
5. Nucleic acid enzyme
6. Signaling molecule
7. Receptor
8. Kinase
9. Select regulatory molecule
10. Transferase
11. Synthase and synthetase
12. Oxidoreductase
13. Lyase
14. Ligase
15. Isomerase
16. Hydrolase
17. Molecular function unknown
18. Transporter
19. Intracellular transporter
20. Calcium-binding protein
21. Proto-oncogene
22. Structural protein of muscle
23. Motor
24. Ion channel
25. Immunoglobulin
26. Extracellular matrix
27. Cytoskeletal structural protein
28. Chaperone
29. Cell adhesion
These categories summarize diverse molecular functions and cellular roles annotated for genes.

Disease-associated alleles and GWAS catalogs

Disease-associated alleles are cataloged in genome-wide association studies (GWAS).
The NHGRI-EBI GWAS Catalog (as of May 2018) collects published associations between genetic variants and traits.
Associations are shown at genome-wide significance, $p \,\le\, 5\times 10^{-8}$, and are grouped into 17 trait categories.
Trait categories include:
- Digestive system disease
- Cardiovascular disease
- Metabolic disease
- Immune system disease
- Nervous system disease
- Liver enzyme measurement
- Lipid or lipoprotein measurement
- Inflammatory marker measurement
- Hematological measurement
- Body measurement
- Cardiovascular measurement
- Other measurement
- Response to drug
- Cancer
- Other disease
- Other trait
This catalog provides a resource for linking genetic variants to disease risk and traits.
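The threshold of $5\times 10^{-8}$ is commonly explained as a Bonferroni correction: a 0.05 significance level divided by roughly one million independent tests across the genome. The sketch below shows that arithmetic and applies the cutoff to a few associations. The variant names and p-values are made up, and the one-million figure is the usual rule-of-thumb assumption rather than a number from these notes.

```python
import math

ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000   # assumption: ~1e6 independent common variants
GENOME_WIDE_P = 5e-8              # threshold stated in the notes

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni = ALPHA / N_INDEPENDENT_TESTS
print(math.isclose(bonferroni, GENOME_WIDE_P))  # True


def genome_wide_significant(associations, threshold=GENOME_WIDE_P):
    """Return the variants whose p-value is at or below the threshold."""
    return [variant for variant, p in associations.items() if p <= threshold]


# Hypothetical association results (variant -> p-value).
associations = {"variant_1": 3e-9, "variant_2": 5e-8, "variant_3": 1e-6}
hits = genome_wide_significant(associations)
print(hits)  # ['variant_1', 'variant_2']
```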

History of Genetics

This part of the notes shifts to historical context: how genetic thought and methods evolved over time.

Evolutionary thought after Darwin

By the 1870s, most scientists accepted evolution as a historical reality, but natural selection took longer to be widely accepted (roughly 60 years after Darwin’s Origin of Species).
There was discomfort with natural selection because people desired life to be purposeful and creative; thus, alternative theories persisted.
Neo-Lamarckism argued for the inheritance of acquired characteristics.
Mutationism posited that discrete variations were the primary driver of evolution.

Mutationism and the impact of Mendel

Mendel’s work (1866) on discrete variation was published but largely ignored until around 1900.
Darwin did not know the mechanism of inheritance when formulating natural selection.
Mutationist theories emphasized discrete variation and were advanced by figures like T. H. Morgan (Drosophila genetics) and R. Goldschmidt (hopeful monsters).
Mendelian genetics eventually helped disprove Lamarckian inheritance; the debate between mutationism and natural selection contributed to the historical rift, which was later reconciled.

The modern synthesis of evolutionary biology

The rift between genetics and natural selection was resolved in the 1930s and 1940s.
The Modern Synthesis integrated genetics, systematics, and paleontology to form a cohesive theory of evolution.
This synthesis reconciled mutation, natural selection, genetic drift, speciation, and inheritance under a unified framework.