Lecture 2: ChIP-seq & Other Assays
Learning Outcomes:
Describe ChIP-seq methodology and compare its data analysis with RNA-seq; interpret ChIP-seq data qualitatively.
Explain qPCR methodology for gene expression detection and quantification as an alternative to NGS.
Interpret basic qPCR data qualitatively.
Recognize "northern blot," "microarray," and "FISH" as legacy gene expression assays.
Describe the forms and utility of reporter genes.
Apply concepts to design functional genomics methods.
Recap: RNA-seq
RNA-seq allows us to assay and analyse transcriptomes (the complete set of RNA transcripts). It reveals which genes are transcribed, their expression levels, and their splicing patterns. However, it does not show the regulatory factors that drive this expression, so other assays are needed to study them.
Assaying Gene-Regulatory Protein Factors: ChIP-seq
Gene expression is regulated by the binding of Transcription Factor (TF) proteins to cis-regulatory elements (CREs) on DNA, which influences the activity of RNA polymerases. These interactions occur within the context of chromatin (DNA packaged with proteins like nucleosomes).
ChIP-seq (Chromatin Immunoprecipitation with sequencing) is the core technology for assaying where specific DNA-associated proteins (like TFs or modified histones) bind in the genome.
Methodology of ChIP-seq:
Isolate Cells: Start with a population of living cells.
Crosslink DNA and Proteins: Treat cells with a chemical agent (commonly formaldehyde) to create covalent crosslinks between proteins and the DNA they are bound to in situ.
Chromatin Fragmentation: Break the chromatin into smaller fragments, usually by ultrasonic fragmentation (sonication).
Immunoprecipitation (ChIP):
Use an antibody that specifically recognizes the protein of interest (e.g., a specific TF). These antibodies can be generated by immunizing an animal (like a mouse) with the cloned/purified TF. Monoclonal antibodies are often used.
The antibody is typically attached to magnetic beads.
The antibody-bead complex is used to "pull down" (immunoprecipitate) the target protein along with any crosslinked DNA fragments. Unbound chromatin is washed away.
Reverse Crosslinks and Purify DNA: Reverse the crosslinks to release the DNA from the protein and purify the DNA fragments.
Library Preparation and Sequencing: Prepare a DNA library from the purified fragments and sequence them using Next-Generation Sequencing (NGS).
Data Analysis: Align the sequence reads to a reference genome and plot their frequency distribution or count them.
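The final data-analysis step (counting aligned reads along the genome) can be sketched as follows. This is a minimal illustration, not a real pipeline: the function name read_coverage and the read coordinates are hypothetical, and reads are assumed to be already aligned and given as 0-based, end-exclusive intervals.

```python
def read_coverage(reads, region_start, region_end):
    """Count how many aligned reads overlap each base of a region.

    reads: iterable of (start, end) alignments, 0-based, end-exclusive.
    Returns one depth value per base in [region_start, region_end).
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region being plotted.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


# Hypothetical alignments: three overlapping reads and one isolated read.
reads = [(5, 15), (8, 18), (10, 20), (40, 50)]
coverage = read_coverage(reads, 0, 60)

# The first position with maximal depth is a crude "summit" for this toy region.
summit = coverage.index(max(coverage))
print(summit, coverage[summit])
```

Plotting coverage against position gives the read-frequency profile in which binding sites appear as peaks.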
Interpreting ChIP-seq Data:
Peaks in read frequency indicate regions in the genome where the protein of interest was bound. The data often appears "spiky," similar to RNA-seq data but for different reasons (here, it reflects specific binding sites rather than exons).
Control Experiments: ChIP-seq data is always analyzed in comparison to a control to ensure specificity. The control can be:
An experiment performed identically but without the primary antibody (a "mock IP" or "no-antibody" control).
The input DNA (a sample of the fragmented chromatin taken before immunoprecipitation) to account for biases in fragmentation or sequencing.
Bioinformatics software such as MACS2 is used for peak calling and analysis.
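The idea of comparing ChIP signal to a control can be illustrated with a simple fold-enrichment calculation over genomic bins. This is only a sketch under stated assumptions: the function names, the pseudocount, the fold cutoff, and the read counts are all hypothetical. Peak callers such as MACS2 use a statistical model rather than a fixed ratio cutoff, so this is not how MACS2 works internally.

```python
def fold_enrichment(chip_counts, control_counts, pseudocount=1.0):
    """Per-bin ChIP/control ratio after scaling the control to ChIP depth."""
    # Scale the control so both libraries have the same total read count.
    scale = sum(chip_counts) / sum(control_counts)
    return [
        (chip + pseudocount) / (ctrl * scale + pseudocount)
        for chip, ctrl in zip(chip_counts, control_counts)
    ]


def candidate_bins(chip_counts, control_counts, min_fold=4.0):
    """Indices of bins whose enrichment over control reaches min_fold."""
    folds = fold_enrichment(chip_counts, control_counts)
    return [i for i, fold in enumerate(folds) if fold >= min_fold]


# Hypothetical read counts in ten equal-sized bins.
chip = [10, 10, 10, 10, 110, 10, 10, 10, 10, 10]
control = [20] * 10  # input DNA: roughly uniform background

folds = fold_enrichment(chip, control)
print(candidate_bins(chip, control))
```

Only the bin where ChIP reads greatly exceed the scaled control is reported; bins with similar signal in both libraries are treated as background.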
Data Visualization and Analysis Types:
Viewing Raw Read Frequency: Plotting the frequency of aligned reads across a genomic region can directly show protein binding sites. This is useful for mapping regulatory elements, as demonstrated by ENCODE project data showing binding of TFs (like JUND, STAT2, YY1) and locations of chromatin modifications (like H3K4Me3, H3K27Ac) near genes.
Cumulative/Mean Frequency Plots: To assess general binding behavior, data from many binding sites can be aligned (e.g., centered on the peak summit) and the read frequencies averaged or summed. This creates a cumulative frequency distribution (CFD) that shows the typical binding profile around a feature of interest (e.g., all binding sites for a TF, or all DNA replication origins).
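A mean frequency profile can be sketched as below. The function name mean_profile, the flank size, and the coverage values are hypothetical; the sketch assumes per-base coverage is held in a list and that summit positions are already known.

```python
def mean_profile(coverage, summits, flank):
    """Average read depth in windows of +/- flank bases centred on each summit."""
    windows = []
    for summit in summits:
        if summit - flank < 0 or summit + flank >= len(coverage):
            continue  # skip sites too close to the ends of the region
        windows.append(coverage[summit - flank : summit + flank + 1])
    if not windows:
        return []
    width = 2 * flank + 1
    # Average position by position across all aligned windows.
    return [sum(w[i] for w in windows) / len(windows) for i in range(width)]


# Hypothetical coverage with two binding sites, at positions 2 and 8.
coverage = [0, 1, 4, 1, 0, 0, 0, 2, 6, 2, 0, 0]
profile = mean_profile(coverage, summits=[2, 8], flank=2)
print(profile)
```

Summing instead of averaging gives the cumulative version of the same plot; the shape is identical and only the scale differs.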
Sorted Heatmaps: Individual binding site profiles (windows of read data around each peak) can be sorted (e.g., by binding intensity or width of the peak) and plotted as a heatmap. This allows visualization of the range of binding patterns across thousands of sites and comparison of these patterns between different experimental conditions (e.g., showing how SATB1 protein binding changes).
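The sorting step behind such a heatmap can be sketched as follows. This is an illustration only: the window values are hypothetical, sorting by total signal is just one of the possible orderings mentioned above, and the text rendering stands in for a real colour scale.

```python
SHADES = " .:*#"  # low to high signal


def sorted_heatmap_rows(windows):
    """Order per-site windows from strongest to weakest total signal."""
    return sorted(windows, key=sum, reverse=True)


def render(rows):
    """Turn each row into a string of shade characters scaled to the global maximum."""
    peak = max(max(row) for row in rows) or 1
    return [
        "".join(SHADES[value * len(SHADES) // (peak + 1)] for value in row)
        for row in rows
    ]


# Hypothetical windows of read depth around three binding sites.
windows = [
    [0, 1, 4, 1, 0],
    [0, 2, 6, 2, 0],
    [0, 0, 2, 0, 0],
]
rows = sorted_heatmap_rows(windows)
picture = render(rows)
for line in picture:
    print("|" + line + "|")
```

Each row is one binding site, so thousands of sites can be compared at a glance once they share a common ordering.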
ChIP-seq is a versatile functional genomics assay. Many other NGS-based methods exist, such as DNase-seq and ATAC-seq (for accessible chromatin), Hi-C (for 3D genome structure), and more advanced techniques for mapping protein-DNA binding such as CUT&RUN and ChIP-exo.
Alternative/Legacy Assays for Gene Expression (Non-NGS)
While NGS-based methods provide genome-wide data, simpler methods exist for assaying gene expression at specific regions, often without extensive bioinformatics.
Quantitative/Real-Time PCR (qPCR; called RT-qPCR when preceded by reverse transcription of RNA):
Purpose: Measures the quantity of specific DNA (or cDNA derived from RNA) molecules.
Methodology:
For RNA analysis: First, convert RNA to cDNA using reverse transcriptase.
Design gene-specific primers that flank the region of interest.
Perform PCR in a qPCR thermocycler, which measures the accumulation of PCR product in real time during each cycle. This is typically done by detecting the fluorescence of a dye (e.g., SYBR Green) that binds to double-stranded DNA or by using fluorescent probes.
Data Interpretation:
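The PCR product accumulates exponentially (ideally doubling each cycle) during the early cycles, until reagents become limiting and the signal plateaus.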
The more template molecules present initially, the earlier the fluorescent signal will cross a defined threshold during the PCR cycles (lower Ct value).
This allows for quantification of the initial amount of target sequence.
Accurate qPCR requires multiple experimental replicates.
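The link between Ct and starting template can be made concrete with a small relative-quantification sketch. Several things here are assumptions beyond the lecture: the ΔΔCt calculation, the use of a reference (housekeeping) gene for normalisation, perfect doubling per cycle (efficiency = 2), and all Ct values, which are hypothetical.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated,
                        target_control, ref_control, efficiency=2.0):
    """Fold change of a target gene (treated vs control) from replicate Ct values.

    Each argument is a list of replicate Ct values. A lower Ct means more
    starting template, so the sign of the exponent is negative.
    """
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta_ct = delta_treated - delta_control
    return efficiency ** (-delta_delta_ct)


# Hypothetical triplicate Ct values.
fold = relative_expression(
    target_treated=[22.0, 22.5, 21.5],
    ref_treated=[18.0, 18.25, 17.75],
    target_control=[25.0, 24.5, 25.5],
    ref_control=[18.0, 18.0, 18.0],
)
print(fold)
```

Here the target crosses the threshold three cycles earlier in the treated sample while the reference gene is unchanged, which corresponds to about 2^3 = 8 times more starting template under the perfect-doubling assumption.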
Hybridization-Based Methods (Legacy Techniques):
These methods rely on the principle of nucleic acid hybridization: the association of two complementary single strands of nucleic acid to form a duplex. A labeled probe (a known nucleic acid sequence) is used to detect a target sequence. The amount of signal from the label is proportional to the amount of probe hybridized to the target.
Northern Blots:
Purpose: To detect and quantify specific mRNA molecules.
Method:
Extract RNA from cells.
Separate RNA by size using denaturing gel electrophoresis.
Transfer (blot) the size-fractionated RNA from the gel to a solid membrane (substrate).
Hybridize the membrane with a labeled nucleic acid probe (e.g., radiolabelled DNA or RNA) complementary to the mRNA of interest.
Wash to remove unbound probe.
Detect the signal (e.g., by autoradiography for radiolabelled probes). The intensity of the band is proportional to the amount of the target mRNA.
Often, a probe for a "housekeeping gene" (e.g., ACT1) with stable expression is used as a loading control. Can show changes in gene expression, e.g., heat-shock response of HSP82.
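The role of the loading control can be shown with a short normalisation sketch. The band intensities below are hypothetical densitometry values invented for illustration, and the lane labels and function name are assumptions, not data from the lecture.

```python
def normalised_signal(target_band, loading_band):
    """Target band intensity relative to the loading-control band in the same lane."""
    return target_band / loading_band


# Hypothetical band intensities (arbitrary units): (HSP82 band, ACT1 band).
lanes = {
    "no heat shock": (1000.0, 1000.0),
    "heat shock": (5000.0, 1250.0),
}

normalised = {
    lane: normalised_signal(target, loading)
    for lane, (target, loading) in lanes.items()
}
fold_induction = normalised["heat shock"] / normalised["no heat shock"]
print(fold_induction)
```

The raw target band is five times stronger after heat shock, but that lane also carries more RNA (a stronger ACT1 band), so the normalised induction is smaller than the raw ratio suggests.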
Microarrays:
Purpose: To measure the expression levels of thousands of genes simultaneously.
Method:
Known DNA sequences (TARGETS), each complementary to a specific gene's mRNA, are spotted at high density onto a solid substrate (e.g., a glass slide). Each spot contains many copies of DNA for one gene.
RNA is extracted from cells, converted to labeled cDNA (PROBE).
The labeled cDNA probe is hybridized to the microarray. The amount of probe binding to each spot is proportional to the expression level of that gene in the sample.
Two-color Microarrays: For differential expression analysis, cDNA from each of two samples (e.g., control and treatment) is labeled with a different fluorescent dye (e.g., green for control, red for treatment). The two labeled probes are mixed and hybridized to the same array.
Red signal: Gene up-regulated in treatment.
Green signal: Gene down-regulated in treatment (up-regulated in control).
Yellow signal (red + green): Equal expression in both.
No signal: No expression in either.
Data can be analyzed similarly to RNA-seq (e.g., X/Y scatter plots). Microarrays were a pioneering functional genomics technology but suffered from reproducibility issues.
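The colour interpretation of a two-color array can be sketched as a classification on the log2 red/green ratio. The background level, the twofold cutoff, and the intensity values are hypothetical choices made for illustration; real analyses also involve normalisation and statistics that are not shown here.

```python
import math


def classify_spot(red, green, background=100.0, min_log2_ratio=1.0):
    """Classify one spot from its red (treatment) and green (control) intensities."""
    if red < background and green < background:
        return "no signal"
    # Add 1 to both channels so a zero intensity does not break the logarithm.
    log2_ratio = math.log2((red + 1.0) / (green + 1.0))
    if log2_ratio >= min_log2_ratio:
        return "red: up-regulated in treatment"
    if log2_ratio <= -min_log2_ratio:
        return "green: down-regulated in treatment"
    return "yellow: similar expression"


# Hypothetical (red, green) intensities for four spots.
spots = {
    "gene A": (8000.0, 1000.0),
    "gene B": (1000.0, 8000.0),
    "gene C": (5000.0, 5200.0),
    "gene D": (20.0, 30.0),
}
calls = {gene: classify_spot(red, green) for gene, (red, green) in spots.items()}
for gene, call in calls.items():
    print(gene, call)
```

Plotting the two channels against each other, as in the X/Y scatter plots mentioned above, shows the same information: unchanged genes fall near the diagonal and regulated genes fall away from it.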
Fluorescence in situ Hybridization (FISH):
Purpose: To visualize the location and abundance of specific mRNA molecules within cells or tissues.
Method:
Cells or tissues (SUBSTRATE) are fixed to preserve their structure and RNA.
A labeled nucleic acid PROBE (often fluorescently labeled) complementary to the mRNA of interest is hybridized directly to the fixed sample.
The signal from the probe is visualized using microscopy, revealing the cellular or tissue-specific expression pattern of the target gene.
Examples include visualizing Otx1 mRNA in embryonic mouse brain or eve mRNA in Drosophila embryos.
Reporter Gene Assays:
Purpose: To study gene regulation by measuring the activity of regulatory DNA sequences (e.g., promoters, enhancers).
Method: A reporter gene, which encodes an easily detectable protein or produces a measurable signal, is placed under the control of the experimental gene-regulatory DNA sequence in a recombinant DNA construct (usually a plasmid).
Types of Reporter Genes/Assays:
In vitro reporters: DNA constructs designed to work in a test tube (cell-free systems) or after transfection into cultured cells.
Example: Firefly Luciferase. The luciferase gene is cloned downstream of the test regulatory sequence. When expressed, luciferase enzyme produces light in the presence of its substrate luciferin, and the light output is proportional to the activity of the regulatory sequence. This can be used to dissect CREs by making deletions and observing effects on reporter activity.
In vivo reporters: DNA constructs (plasmids or integrated into the genome) designed to work within a host cell population or whole organism.
Example: E. coli lacZ gene (encodes β-galactosidase). When the test regulatory sequence is active, lacZ is expressed. In the presence of a substrate like X-gal, β-galactosidase produces a blue color, allowing for visual identification of cells/tissues where the regulatory element is active.
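The deletion analysis described for the luciferase reporter can be sketched as a comparison of each construct against the full-length regulatory sequence. The construct names, light-output readings, and the 50% cutoff are all hypothetical values chosen for illustration.

```python
def relative_activity(readings, reference="full-length"):
    """Express each construct's light output as a fraction of the reference construct."""
    ref = readings[reference]
    return {name: value / ref for name, value in readings.items()}


def required_regions(readings, reference="full-length", max_fraction=0.5):
    """Deletions that reduce reporter activity below max_fraction of the reference."""
    relative = relative_activity(readings, reference)
    return [
        name for name, fraction in relative.items()
        if name != reference and fraction < max_fraction
    ]


# Hypothetical luminescence readings (arbitrary units) per construct.
readings = {
    "full-length": 10000.0,
    "deletion A": 9500.0,
    "deletion B": 1000.0,
}
relative = relative_activity(readings)
print(required_regions(readings))
```

A deletion that leaves activity near the full-length level suggests the removed DNA is dispensable in this assay, while a large drop suggests the removed region contains a CRE needed for expression.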
Summary
Gene expression can be assayed through numerous methods. RNA-seq and ChIP-seq are powerful NGS-based technologies providing genome-wide insights into transcriptomes and protein-DNA interactions, respectively. qPCR offers a targeted approach for quantifying specific nucleic acid sequences. Legacy methods like Northern blots, microarrays, and FISH provided foundational tools for studying mRNA levels and localization. Reporter gene assays remain invaluable for dissecting the function of regulatory DNA elements.