BIO 4150 – History of Approaches and Methods in Human Genetics (Study Notes)
Epigenetics wrap-up and chromatin states
- Epigenetics studies heritable changes in gene function that do not involve changes to the DNA sequence.
- H3K9ac (histone H3 lysine 9 acetylation) marks chromatin that is active, typically at promoters, and facilitates transcriptional initiation and transition to elongation.
- H3K27me3 (histone H3 lysine 27 trimethylation) is a methylation mark associated with repressed chromatin.
- Chromatin states are characterized by distinct combinations of histone marks (e.g., H3K9ac for ON vs H3K27me3 for OFF) and correlate with transcriptional activity.
- Constitutive vs Facultative chromatin:
- Constitutive chromatin generally carries marks that hold a region in one state across cell types and over time (e.g., permanently off or permanently on).
- Facultative chromatin carries marks that can change depending on context (development, environment).
- Open (euchromatin) vs Closed (heterochromatin) states are governed by the balance of methylation and acetylation marks and other modifications, which control accessibility for transcription.
- Methylation and acetylation marks are involved in turning genes on and off; different types of methylation can lead to repression or activation depending on position and mark.
- Overall theme: chromatin state is dynamic and marks influence access to DNA for transcription machinery.
Epigenetics in Normal Regulation
- DNA accessibility and transcriptional activation follow a regulated sequence:
- Relaxed (open) DNA state allows transcription factor (TF) access.
- Activator binding to regulatory regions.
- RNA Polymerase II (RNAP II) recruitment to promoters.
- Transcription initiation, followed by promoter clearance and transition to elongation.
- Pre-mRNA processing (5′ capping, splicing, 3′ polyadenylation) yields processed mRNA.
- mRNA stability and translation into protein.
- DNase I hypersensitivity is detected after nucleosome displacement, indicating open chromatin regions accessible to TFs and RNAP II.
- The transcription cycle involves RNA pol II and transcription factors binding promoter regions to initiate transcription.
Transcription factors (TFs) and chromatin regulation
- Transcription factors can influence chromatin state and epigenetic regulation:
- A single protein binding at a site can alter the local chromatin landscape to be more open, enabling transcription.
- TFs can recruit proteins that modify histone marks, changing the regulatory environment.
- TFs recruit chromatin-modifying proteins that alter histone marks and chromatin accessibility, contributing to cell-type–specific gene expression and cellular memory.
TFs recruiting chromatin-modifying machinery
- TFs can recruit proteins that alter the histone code:
- Histone acetyltransferases (writers) add acetyl groups to histones (e.g., H3K9ac).
- Histone deacetylases (erasers) remove acetyl groups.
- Histone methyltransferases (writers) add methyl groups.
- Histone demethylases (erasers) remove methyl groups.
- The recruitment of writers and erasers by trans-acting TFs establishes chromatin states across the genome.
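The writer/eraser/reader roles above can be sketched as operations on a set of marks. This is a toy illustration only: the function names (write_mark, erase_mark, chromatin_state) and the two-mark rule for calling a state are assumptions made for this example, not a model of real chromatin.

```python
# Toy model: the histone marks on one nucleosome, held as a set of strings.
# The ON/OFF rule uses only the two example marks from these notes.
ACTIVE_MARKS = {"H3K9ac"}
REPRESSIVE_MARKS = {"H3K27me3"}

def write_mark(marks, mark):
    """Writer (e.g. an acetyltransferase or methyltransferase): add a mark."""
    return marks | {mark}

def erase_mark(marks, mark):
    """Eraser (e.g. a deacetylase or demethylase): remove a mark."""
    return marks - {mark}

def chromatin_state(marks):
    """Reader: interpret the current marks as a coarse chromatin state."""
    active = bool(marks & ACTIVE_MARKS)
    repressed = bool(marks & REPRESSIVE_MARKS)
    if active and not repressed:
        return "open"
    if repressed and not active:
        return "closed"
    return "mixed or unmarked"

marks = frozenset({"H3K27me3"})
print(chromatin_state(marks))            # closed
marks = erase_mark(marks, "H3K27me3")    # a demethylase removes the repressive mark
marks = write_mark(marks, "H3K9ac")      # an acetyltransferase adds the active mark
print(chromatin_state(marks))            # open
```

Real chromatin states depend on many marks in combination, so the two-way rule here is only a memory aid for who adds, who removes, and who interprets.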
TFs and chromatin readers
- Histone marks are read by chromatin readers that interpret the histone code and influence whether chromatin is open or closed.
- This reading and writing/erasing of marks create a cellular memory from the progenitor cell to its descendants (somatic and germ cells).
- The genome-wide landscape of chromatin marks and their writers, readers, and erasers establishes functional regions and lineage-specific programs.
How do we find ‘genes’ and measure their activity?
- Gene expression profiling: RNA-seq.
- RNA-seq assays gene expression across the genome by sequencing cDNA derived from mRNA.
- Process: reverse transcription of mRNA to cDNA; sequencing reads map to transcripts; the depth of reads (read abundance) correlates with expression level.
- Goal: obtain a quantitative measure of gene expression for all genes.
- Practical notes:
- cDNA libraries are constructed from RNA; sequencing reads reflect transcript abundance.
- Higher read depth (more reads stacked over a transcript) → more accurate expression estimate for low-abundance transcripts.
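To make "read abundance correlates with expression level" concrete, here is a minimal sketch of one common normalization, transcripts per million (TPM). Raw counts are divided by transcript length, since longer transcripts collect more reads, and then scaled so all values sum to one million. The gene names, counts, and lengths are made up for illustration, and TPM is only one of several normalizations in use.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw read counts and transcript lengths in kb."""
    # Step 1: reads per kilobase, which corrects for transcript length.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: scale so the values across all genes sum to one million.
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Hypothetical example data (not from any real experiment).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}

values = tpm(counts, lengths_kb)
print({gene: round(v) for gene, v in values.items()})
# {'geneA': 200000, 'geneB': 200000, 'geneC': 600000}
```

Note that geneB has three times the reads of geneA but the same TPM, because it is three times as long.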
How do we identify regulatory elements?
- Open chromatin assays identify regulatory elements where TFs can bind.
- ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing):
- Uses the Tn5 transposase to tag and fragment open chromatin regions.
- Antibody-free method; open regions are preferentially tagged, while closed regions are not accessible to the enzyme.
- ATAC-seq highlights active regulatory regions and enhancers by mapping accessible chromatin.
- Additional approaches for regulatory element discovery include:
- DNase-seq (DNase I hypersensitive sites sequencing): identifies DNase I hypersensitive sites indicating open chromatin.
- FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements): enriches open chromatin regions via DNA accessibility.
- MNase-seq (Micrococcal Nuclease sequencing) and MAINE-seq: map nucleosome positions and turnover.
- ORGANIC, CATCH-IT: methods for identifying chromatin turnover and histone dynamics.
- ChIP-seq (Chromatin Immunoprecipitation sequencing): maps DNA regions bound by specific proteins (e.g., histone marks or TFs).
- ChIP-exo and HT-ChIP: higher-resolution or high-throughput variants of ChIP-seq.
- Histone methylation ChIP-seq: profiles methylation marks across the genome.
- Chem-seq: identifies sites bound by small molecules/drugs within chromatin.
- NOMe-seq (Nucleosome Occupancy and Methylome sequencing): single-molecule open DNA assay.
- 3C-based methods: 3C, 4C, 5C, Hi-C, Capture-C, ChIA-PET for 3D genome interaction mapping.
- HITS-FLIP (high-throughput sequencing with fluorescent ligand interaction profiling): measures protein-DNA binding across many sequences to identify binding landscapes.
- PB-seq and related approaches map binding energy landscapes and protein-DNA interactions.
- Goal: assemble a comprehensive map of regulatory elements and their interactions with factors across the genome.
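For the accessibility assays above (ATAC-seq, DNase-seq), the core analysis idea is that open regions collect many more cut sites than closed ones. The sketch below counts cut positions in fixed-width bins and keeps bins above a threshold. The function name, bin size, threshold, and coordinates are all assumptions for illustration; real peak callers also model background signal, which this toy version does not.

```python
from collections import Counter

def call_open_bins(cut_sites, bin_size=100, min_reads=5):
    """Return start coordinates of bins holding at least min_reads cut sites."""
    # Assign each cut position to the start coordinate of its bin.
    counts = Counter((pos // bin_size) * bin_size for pos in cut_sites)
    return sorted(start for start, n in counts.items() if n >= min_reads)

# Hypothetical cut positions along one chromosome.
cut_sites = [105, 110, 120, 130, 150, 199, 450, 980,
             1010, 1020, 1030, 1040, 1099]

print(call_open_bins(cut_sites))  # [100, 1000]
```

The two reported bins would be candidate regulatory elements. The isolated cuts at 450 and 980 are treated as background.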
RNA polymerase II and the transcription machinery
- RNA polymerase II does the heavy lifting in transcription.
- Transcription factors and RNAP II coordinate during recognition and initiation of transcription.
- Key components include:
- General transcription factors (e.g., TFIID/TBP, etc.)
- A large basal transcription apparatus with many subunits (almost 60 subunits in total)
- Regulatory region-bound transcription factors that recruit the Mediator complex and RNAP II to promoter regions.
- Enhancers interact with promoters via long-range DNA looping, mediated by architectural proteins (e.g., cohesin, CTCF) and Mediator to drive transcription.
3D structure and organization of DNA
- DNA is organized in 3D space within the nucleus.
- Enhancers and promoters interact through DNA looping to facilitate transcriptional regulation.
- Key players:
- Transcriptional activators and coactivators (Mediator complex) and RNA polymerase II.
- Architectural proteins that shape 3D genome topology.
- Nuclear envelope and lamina contribute to chromosome organization (e.g., transcriptionally repressive vs active compartments).
- The TATA box and other core promoter elements are DNA sequences recognized by the basal transcription machinery (e.g., TBP within TFIID binds the TATA box).
- Hi-C maps reveal chromosome territories and topologically associating domains (TADs).
3D data approaches for regulatory element discovery
- ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing): maps chromatin interactions mediated by a protein of interest (e.g., a TF).
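- 3C, 4C, 5C, Hi-C, and Capture-C: chromosome conformation capture methods that differ in scope, from one locus pair (3C) and one locus against the genome (4C) to many-versus-many (5C), all-versus-all (Hi-C), and targeted capture of selected regions (Capture-C).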
- PB-seq, HITS-FLIP: map protein-DNA interactions and binding energy landscapes.
- Ligation-based, circularization, and microarray strategies allow the identification of interacting regions and regulatory networks in 3D space.
- The goal is to link regulatory elements with their target genes through physical contacts in the 3D genome.
What is our genome composed of?
- Genome composition highlights (approximate proportions):
- Transposable elements (TEs): ~45% in total, made up of the four classes listed next:
- SINEs (short interspersed nuclear elements), including Alu elements: ~13%
- LINEs (long interspersed nuclear elements): ~20%
- LTR retrotransposons: ~8%
- DNA transposons: ~3%
- Introns: ~26%
- Protein-coding genes (exonic coding sequence only; introns are counted separately): ~1.5%
- Simple sequence repeats: ~3%
- Segmental duplications: ~5%
- Miscellaneous unique sequences: ~11%
- Miscellaneous heterochromatin: ~8%
- Note: values follow the breakdown on the source slide and are approximate; the SINE, LINE, LTR retrotransposon, and DNA transposon figures are subsets of the ~45% transposable element total, not additional fractions.
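A quick arithmetic check of the proportions above, using the figures exactly as listed. The four transposable element classes should add up to roughly the 45% total, and the top-level categories should add up to roughly 100%. The dictionary layout is just one convenient way to arrange the numbers.

```python
# Percentages as listed in the notes (all approximate).
te_classes = {
    "SINEs": 13,
    "LINEs": 20,
    "LTR retrotransposons": 8,
    "DNA transposons": 3,
}
top_level = {
    "Transposable elements": 45,
    "Introns": 26,
    "Protein-coding genes": 1.5,
    "Simple sequence repeats": 3,
    "Segmental duplications": 5,
    "Miscellaneous unique sequences": 11,
    "Miscellaneous heterochromatin": 8,
}

te_total = sum(te_classes.values())
genome_total = sum(top_level.values())

print(te_total)      # 44
print(genome_total)  # 99.5
```

The small gaps (44 vs ~45, and 99.5 vs 100) are consistent with rounding in the listed values.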
What are our genes doing? Gene annotation and function categories
- Gene annotation categorizes genes by molecular function and roles in biology:
- Miscellaneous
- Viral protein
- Transfer or carrier protein
- Transcription factor
- Nucleic acid enzyme
- Signaling molecule
- Receptor
- Kinase
- Select regulatory molecule
- Transferase
- Synthase and synthetase
- Oxidoreductase
- Lyase
- Ligase
- Isomerase
- Hydrolase
- Molecular function unknown
- Transporter
- Intracellular transporter
- Calcium-binding protein
- Proto-oncogene
- Structural protein of muscle
- Motor
- Ion channel
- Immunoglobulin
- Extracellular matrix
- Cytoskeletal structural protein
- Chaperone
- Cell adhesion
- These categories summarize diverse molecular functions and cellular roles annotated for genes.
Disease-associated alleles and GWAS catalogs
- Disease-associated alleles are cataloged in genome-wide association studies (GWAS).
- NHGRI-EBI GWAS Catalog (as of May 2018) collects published associations between genetic variants and traits.
- Significance threshold used: p ≤ 5 × 10^-8 (genome-wide significance); associations are grouped into 17 trait categories.
- Trait categories include:
- Digestive system disease
- Cardiovascular disease
- Metabolic disease
- Immune system disease
- Nervous system disease
- Liver enzyme measurement
- Lipid or lipoprotein measurement
- Inflammatory marker measurement
- Hematological measurement
- Body measurement
- Cardiovascular measurement
- Other measurement
- Response to drug
- Cancer
- Other disease
- Other trait
- This catalog provides a resource for linking genetic variants to disease risk and traits.
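The 5 × 10^-8 threshold is commonly explained as a Bonferroni correction: a conventional 0.05 significance level divided by roughly one million independent tests across the genome. The sketch below reproduces that arithmetic and applies the threshold to a few associations. The one-million figure is a conventional approximation, and the variant IDs and p-values are invented for illustration.

```python
ALPHA = 0.05                     # conventional per-study significance level
N_INDEPENDENT_TESTS = 1_000_000  # conventional approximation (an assumption here)

threshold = ALPHA / N_INDEPENDENT_TESTS
print(f"{threshold:.0e}")  # 5e-08

# Hypothetical (variant, p-value) associations.
associations = [
    ("rs_example_1", 3e-9),
    ("rs_example_2", 4e-7),
    ("rs_example_3", 2e-8),
]

significant = [variant for variant, p in associations if p <= threshold]
print(significant)  # ['rs_example_1', 'rs_example_3']
```

The second variant would look highly significant in a single-test setting (p = 4 × 10^-7), but it does not survive correction for the number of tests in a genome-wide scan.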
History of Genetics
- The slide titled "History of Genetics" marks a shift to historical context: how genetic thought and methods evolved over time.
Evolutionary thought after Darwin
- By the 1870s, most scientists accepted evolution as a historical reality, but natural selection took longer to be widely accepted (roughly 60 years after Darwin’s Origin of Species, published in 1859).
- Natural selection met resistance because many people wanted life to be purposeful and creative, so alternative theories persisted.
- Neo-Lamarckism argued for the inheritance of acquired characteristics.
- Mutationism posited that discrete variations were the primary driver of evolution.
Mutationism and the impact of Mendel
- Mendel’s work (1866) on discrete variation was published but largely ignored until around 1900.
- Darwin did not know the mechanism of inheritance when formulating natural selection.
- Mutationist theories emphasized discrete variation and were advanced by figures like T. H. Morgan (Drosophila genetics) and R. Goldschmidt (hopeful monsters).
- Mendelian genetics eventually helped disprove Lamarckian inheritance. The debate between mutationism and natural selection opened a rift between genetics and Darwinian theory, which was later reconciled.
The modern synthesis of evolutionary biology
- The rift between genetics and natural selection was resolved in the 1930s and 1940s.
- The Modern Synthesis integrated genetics, systematics, and paleontology to form a cohesive theory of evolution.
- This synthesis reconciled mutation, natural selection, genetic drift, speciation, and inheritance under a unified framework.