Comprehensive Notes: Gene Sequencing, NGS, and Quantitative PCR (qPCR)
Overview and Context
- Gene sequencing is part of a broader human effort to understand ourselves and disease, including cancer. In scale and ambition it is comparable to other large projects such as the Apollo program (landing on the Moon) and the development of nuclear weapons, but its aim is to understand human biology well enough to treat diseases such as cancer.
- The idea: sequence every gene, map the genome, understand structure, function, evolution, and editing, using high-throughput sequencing and bioinformatics to assemble the human genome.
- Historical milestones:
- Human Genome Project (HGP): launched in 1990 and completed in 2003, about 13 years later; the NIH led the effort; it sequenced a reference human genome and created a shared database for researchers.
- 1000 Genomes Project: later expansion to a much larger population to capture genetic diversity and rare variants.
- Interpretation: on average, a person carries roughly $N \sim 2.5 \times 10^{2}$ to $3 \times 10^{2}$ mutations relative to the reference genome; most are benign, but some contribute to disease.
- Core genetic concepts (brief revision):
- DNA stores genetic information; nucleotides: adenine (A), cytosine (C), guanine (G), thymine (T).
- RNA is usually single-stranded and uses uracil (U) instead of thymine; base set: A, C, G, U.
- Genes are nucleic acid sequences located on chromosomes within the nucleus.
- mRNA (messenger RNA) is a single-stranded copy of a gene; introns are non-coding regions in genomic DNA; exons encode proteins and are retained in mature mRNA after splicing.
- RNA processing: transcription from DNA produces pre-mRNA; introns are spliced out to produce mature mRNA, which exits the nucleus and is translated into protein.
- Non-coding RNAs: microRNAs (miRNAs) are small RNAs that regulate translation; long non-coding RNAs (lncRNAs) regulate gene expression and have functions beyond “junk” RNA.
- Tissue-specific transcripts (SIIs/PIIs) reflect tissue-specific expression patterns.
- Gene sequencing vs expression profiling: sequencing maps genome structure and potential variants; it is complemented by expression profiling to understand which genes are active under different conditions.
- Practical framing: sequencing uses high-throughput approaches plus bioinformatics to reconstruct genomes; PCR is a foundational tool used before sequencing to amplify DNA fragments for analysis.
DNA and RNA: Quick Revision
- DNA basics
- DNA carries genetic information.
- Building blocks: nucleotides consist of a phosphate group, a sugar (deoxyribose), and a nitrogenous base.
- Four DNA bases: A,C,G,T.
- RNA basics
- RNA uses A,C,G,U (instead of T).
- Messenger RNA (mRNA) is a single-stranded copy of a gene; introns are removed during processing; exons are retained and code for amino acids.
- Central dogma quick recap
- Transcription: DNA -> RNA (pre-mRNA containing introns and exons).
- RNA processing: introns removed; mature mRNA exported to cytoplasm.
- Translation: mRNA -> protein (codons code for amino acids).
- Non-coding RNAs
- microRNA (miRNA): small regulatory RNAs that control protein expression at the translational level.
- long non-coding RNA (lncRNA): regulatory roles in gene expression.
- Genomic organization
- DNA contains exons and introns; genes are arranged along chromosomes.
- Most genes are present in two copies (one per homologous chromosome), some exist in additional copies, and genes can be differentially expressed across tissues.
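The central-dogma recap above (transcription, splicing, translation) can be sketched in a few lines of Python. This is a toy illustration only: the gene sequence, the intron coordinates, and the five-entry codon table are hypothetical choices made for the example, and real codon tables cover all 64 codons.

```python
# Partial codon table: only the codons used in this toy example (assumption).
CODON_TABLE = {
    "AUG": "M",  # start codon, methionine
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the RNA matching the coding (sense) strand: T becomes U."""
    return coding_dna.upper().replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open index ranges."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


def translate(mrna: str) -> str:
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1 (6 nt) + intron (12 nt) + exon 2 (9 nt).
GENE = "ATGGCT" + "GTAAGTTTTCAG" + "TGGAAATAA"
INTRONS = [(6, 18)]

pre_mrna = transcribe(GENE)
mature_mrna = splice(pre_mrna, INTRONS)
protein = translate(mature_mrna)
print(mature_mrna)  # AUGGCUUGGAAAUAA
print(protein)      # MAWK
```

The intron is removed before translation, so only the two exons contribute codons to the protein.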
What is Gene Sequencing?
- Definition: gene sequencing determines the order of nucleotides in DNA. At genome scale, the aim is to map the human genome and relate its structure, function, evolution, and editing.
- Core approach: combine high-throughput sequencing techniques with bioinformatics to reconstruct the genome sequence and identify variants.
- Sequencing history notes:
- From Sanger sequencing (one of the first sequencing methods) to Next-Generation Sequencing (NGS).
- Sanger sequencing (chain-termination method) reads relatively short DNA fragments, one sequence at a time.
- NGS allows parallel sequencing of many fragments, dramatically increasing throughput.
- Role of PCR in sequencing
- Polymerase Chain Reaction (PCR) amplifies DNA fragments to generate enough material for sequencing or analysis.
- Basic PCR cycle stages: denaturation, annealing, elongation.
- Denaturation: double-stranded DNA is heated to separate strands; typically around 95 °C; melting temperature depends on GC content.
- Annealing: primers bind to complementary sequences; temperature near primer melting temperature (Tm); aim for similar Tm between forward and reverse primers.
- Extension: DNA polymerase extends primers at about 72 °C, using dNTPs to synthesize new strands.
- Each cycle doubles the amount of target DNA: after $n$ cycles, copies ≈ $N = N_0 \cdot 2^{n}$.
- Historical genome projects relevance
- HGP enabled reference genome for comparison with patient samples.
- 1000 Genomes expanded diversity to discover population-specific and rare variants.
- PCR applications beyond sequencing
- Gene expression profiling, genetic fingerprinting, pathogen detection (e.g., SARS-CoV-2 testing via RT-qPCR).
- Modern sequencing landscape
- Thousands of genomes have been sequenced and are available in public databases (e.g., NCBI).
- Next-generation sequencing produces detailed readouts of DNA sequences from samples.
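To make the PCR doubling relation from this section concrete, here is a minimal Python sketch. The `efficiency` parameter is an addition not covered in the notes: it uses the common generalization N = N0 · (1 + E)^n, where E = 1 reproduces ideal doubling. The function name and example numbers are illustrative.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate copy number after `cycles` PCR cycles.

    efficiency=1.0 means perfect doubling per cycle (the ideal case in the notes);
    lower values model incomplete amplification (an assumption added here).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ideal case: 10 starting copies after 30 cycles.
print(pcr_copies(10, 30))                   # 10737418240.0
# Same reaction at 90% efficiency yields noticeably fewer copies.
print(pcr_copies(10, 30, efficiency=0.9))
```

Even a modest drop in per-cycle efficiency compounds over 30 cycles, which is one reason endpoint yields are a poor measure of starting quantity.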
Next-Generation Sequencing (NGS) vs Microarrays
- Key idea: NGS detects sequences directly; microarrays rely on preset probes to detect known transcripts.
- NGS advantages over microarrays
- Can detect novel transcripts and low-abundance sequences that microarrays miss.
- Higher sensitivity and accuracy; broader dynamic range.
- No dependence on predefined probes; can map entire transcriptome and genome sequences.
- How NGS works (simplified)
- Fragment genomic DNA into many pieces; add adapters and barcodes to ends of fragments.
- Attach fragments to sequencing platform, perform amplification (cluster generation), and read fluorescence signals to determine base identities.
- Computationally assemble reads into a complete sequence and align to reference genome.
- Practical implication in cancer research
- NGS helps identify mutations, rearrangements, copy number variations, and transcriptome changes that accompany cancer progression.
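The "align reads to a reference" step described under "How NGS works" can be illustrated with a deliberately naive sketch. It uses exact string matching on made-up sequences; real pipelines rely on dedicated aligners that tolerate mismatches and handle millions of reads, so treat this only as a picture of the idea.

```python
def align_reads(reference: str, reads: list[str]) -> list[int]:
    """Return the first exact-match start position of each read (-1 if none)."""
    return [reference.find(read) for read in reads]


def coverage(reference: str, positions: list[int], reads: list[str]) -> list[int]:
    """Count how many aligned reads cover each reference position."""
    depth = [0] * len(reference)
    for start, read in zip(positions, reads):
        if start == -1:
            continue  # an unaligned read contributes nothing
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


# Hypothetical toy data, not from the lecture.
reference = "ACGTTGCAAGCT"
reads = ["ACGTTG", "TTGCAA", "CAAGCT", "GGGGGG"]

positions = align_reads(reference, reads)
depth = coverage(reference, positions, reads)
print(positions)  # [0, 3, 6, -1]
print(depth)      # [1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1]
```

Overlapping reads raise the depth at shared positions, which is what gives sequencing its confidence in each base call; the last read matches nowhere and would be flagged for follow-up (for example as a possible novel sequence or an error).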
Single-Cell Sequencing in Cancer
- Rationale
- Tumors are heterogeneous: cancer cells, stromal cells, immune cells, and other cell types coexist in a mass.
- Bulk sequencing averages signals across all cells, masking cell-type-specific changes.
- What single-cell sequencing adds
- Sequencing the RNA from individual cells reveals cell-type-specific gene expression profiles.
- Enables identification of cancer cells, immune cells (e.g., T cells, macrophages), fibroblasts, and other components.
- Cancer development and staging in the single-cell view
- Normal tissue baseline provides a reference.
- Early precancerous cells show mutations that begin malignant transformation.
- Carcinoma in situ remains at the original site; progression leads to invasion and possible metastasis.
- Metastasis involves cancer cells entering blood/lymph systems and colonizing distant organs (e.g., liver, lung, bone, brain).
- Example workflow and insights
- Compare single-cell profiles across stages to identify driver mutations and pathways involved in progression.
- In melanoma, about 50% of cases carry an activating BRAF mutation, which drives a growth and survival pathway (MAPK signaling); BRAF inhibitors showed high response rates (~80%) in BRAF-mutant melanoma patients.
- Single-cell sequencing applied to cancer–nerve interactions reveals how nerve invasion of tumors correlates with cellular stress markers (e.g., the ER stress marker GRP78).
- A case study sketch from the lecturer’s lab
- Research topic: cancer–nerve interactions in solid tumors (notably pancreatic cancer).
- Observed nerve invasion correlates with GRP78 (an ER stress marker).
- Experiments compared cancer cells grown under normal conditions versus ER-stress conditions, and the effect on neurite outgrowth/nerve invasion was monitored.
- Follow-up single-cell sequencing generated gene lists with significant differential expression under ER stress conditions; shortlisted genes guided further validation.
- Validation via real-time RT-qPCR in multiple cancer cell lines confirmed upregulation of selected targets.
- Data interpretation and validation workflow
- Post-sequencing pipeline: identify significantly changed genes (fold-change, statistical significance).
- Use public resources (e.g., NCBI Gene, GeneCards) to annotate gene function and disease relevance (cancer, nerve interaction, ER stress).
- Shortlist candidate genes (e.g., >2-fold change and significant) for further validation.
- Validation step is essential to confirm sequencing results before drawing conclusions.
- Practical time frame (illustrative)
- From sequencing order to data results: even with outsourcing, analysis and validation can take several weeks to months; a typical total process can span about nine months.
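The shortlisting rule from the validation workflow above (more than 2-fold change and statistically significant) can be sketched as a simple filter. The gene names, expression values, and p-values below are invented for illustration; real analyses would typically use p-values adjusted for multiple testing and a dedicated differential-expression package.

```python
import math

# Hypothetical records: (gene, mean expression in control, mean under ER stress, p-value).
records = [
    ("GENE_A", 100.0, 450.0, 0.001),
    ("GENE_B", 200.0, 210.0, 0.60),
    ("GENE_C", 80.0, 20.0, 0.01),
    ("GENE_D", 50.0, 150.0, 0.20),
]


def shortlist(records, min_fold=2.0, alpha=0.05):
    """Keep genes changed more than `min_fold` in either direction with p < alpha."""
    selected = []
    for gene, control, treated, p_value in records:
        log2_fc = math.log2(treated / control)  # positive = up, negative = down
        if abs(log2_fc) > math.log2(min_fold) and p_value < alpha:
            selected.append((gene, round(log2_fc, 2)))
    return selected


shortlisted = shortlist(records)
print(shortlisted)  # [('GENE_A', 2.17), ('GENE_C', -2.0)]
```

Working on the log2 scale treats up- and downregulation symmetrically: a 4-fold decrease (GENE_C) passes the same threshold as a 4-fold increase would, while GENE_D is dropped despite a 3-fold change because it is not significant.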
A Clinical and Practical Example: Nerve-Cancer Interactions Case
- Context
- Pancreatic and other solid tumors exhibit nerve invasion; nerve fibers infiltrating tumors contribute to pain and tumor biology.
- Experimental setup
- In vitro: co-cultures of cancer cells with nerve cells under normal vs ER stress conditions.
- Observed neuronal morphology changes under ER stress: neurites elongate and form networks, indicating nerve activity and potential invasion cues.
- In vivo validation
- Tumor-bearing mice show nerve innervation consistent with human samples, supporting a role for nerves in tumor progression.
- Sequencing and downstream analysis
- Cultures grown under ER-stress conditions were sequenced by an external company; the returned data covered tens of thousands of genes, and filtering focused on those with significant fold-changes.
- Heatmaps (clustered gene expression) helped visualize upregulated vs downregulated genes across conditions.
- Subsequent annotation and literature review identified candidate genes connected to cancer, nerve invasion, and ER stress.
- Selected genes validated by RT-qPCR across multiple cancer cell lines, strengthening confidence in sequencing findings.
- Practical takeaway
- NGS-based discovery must be followed by rigorous validation (e.g., qPCR) and functional studies to confirm roles in cancer biology.
Quantitative PCR (qPCR) vs Traditional PCR
- Core idea of PCR
- PCR amplifies DNA fragments to enable analysis; traditional (endpoint) PCR is often analyzed by gel electrophoresis after amplification.
- qPCR (quantitative PCR) vs traditional PCR
- qPCR monitors amplification in real time using fluorescence; products are quantified as the reaction progresses.
- Traditional PCR results are end-point only (gel band intensity is semi-quantitative at best).
- Why qPCR is preferred for quantitation
- Higher sensitivity and specificity; real-time readout; lower starting quantities suffice; avoids post-PCR processing.
- Two main ways to measure fluorescence in qPCR
- Probe-based (e.g., TaqMan): uses a sequence-specific probe with a reporter dye and a quencher; fluorescence increases as the probe is degraded during amplification.
- Intercalating dye-based (e.g., SYBR Green): dye binds double-stranded DNA; fluorescence increases with the amount of dsDNA formed.
- RT-qPCR: combining reverse transcription with qPCR
- Template can be RNA (usually mRNA) or DNA; when starting from RNA, reverse transcription creates complementary DNA (cDNA).
- RT-qPCR can be done as one-step (one reaction for RT and PCR) or two-step (RT to cDNA, then qPCR on multiple targets).
- Primer design and assay setup for qPCR
- Primer design aims for short amplicons, typically 100–150 bp, to optimize efficiency and accuracy.
- Primer characteristics:
- Amplicon size around 100–150 bp.
- Similar melting temperatures for forward and reverse primers (ideally within 2°C).
- Avoid primer dimers and secondary structures; avoid amplifying contaminating genomic DNA when targeting cDNA (e.g., by placing primers across exon–exon junctions).
- Template sources:
- DNA template (genomic DNA, plasmid DNA) or RNA template with reverse transcription to cDNA.
- RT steps and enzyme choices:
- Reverse transcription uses a reverse transcriptase and RNase inhibitors to convert RNA to cDNA; RT step is typically around 30 minutes; cDNA is more stable than RNA and can be stored.
- One-step vs two-step RT-qPCR: pros and cons
- One-step RT-qPCR: faster, fewer pipetting steps; less hands-on time; convenient for high-throughput assays; less flexibility for multiple targets from the same RNA input.
- Two-step RT-qPCR: greater flexibility, allows using the same cDNA for multiple qPCR assays; often more suitable for gene expression profiling with multiple targets; cDNA can be archived for later use.
- Data interpretation and quantitation methods
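- Ct (cycle threshold) value: the cycle number at which fluorescence crosses a predefined threshold; it is inversely related to the amount of target nucleic acid (a lower Ct means more starting template; under ideal doubling, a Ct one cycle lower corresponds to about twice as much template).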
- Absolute quantification (standard curve method): uses a DNA standard with known copy numbers to generate a standard curve; unknown sample copies are read from the curve.
- Relative quantification (comparative Ct or ΔΔCt method): compares target gene expression in treated vs control samples using a reference (housekeeping) gene; fold change is calculated as:
$\text{Fold change} = 2^{-\Delta\Delta C_T}$
where
$\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$
and
$\Delta\Delta C_T = \Delta C_{T,\text{treated}} - \Delta C_{T,\text{control}}$
- Role of housekeeping genes and controls
- Use stable reference genes (e.g., GAPDH, ACTB) whose expression does not change across treatments.
- If the Ct of the housekeeping gene changes significantly between conditions, re-evaluate the reference gene choice.
- Practical considerations and common pitfalls
- Endpoint PCR gels are qualitative (at best semi-quantitative); qPCR provides kinetic data and more robust quantification.
- Poor primer design or degraded templates can yield misleading results; always include controls and replicates.
- Validation is crucial: corroborate sequencing findings with independent methods (e.g., RT-qPCR in multiple cell lines).
- Exam-style note discussed in class
- Denaturation, annealing, and extension are the three PCR steps; extension/elongation corresponds to the third step where nucleotides are added.
- The common exam question: identify which step involves the addition of nucleotides to extend the growing DNA strand; answer: extension (elongation).
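As a worked example of the comparative Ct (ΔΔCt) method described in this section, the following sketch computes a fold change from four Ct values. The Ct numbers are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both the target and the reference gene, which is what the factor of 2 in the formula implies.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize the target Ct to the reference (housekeeping) gene."""
    return ct_target - ct_reference


def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs control) by the 2^(-ddCt) method."""
    d_ct_treated = delta_ct(ct_target_treated, ct_ref_treated)
    d_ct_control = delta_ct(ct_target_control, ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in treated cells, while the reference gene is unchanged.
result = fold_change(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(result)  # 8.0
```

Here ΔCt is 4 in treated cells and 7 in control cells, so ΔΔCt = -3 and the target is about 8-fold upregulated. If the reference Ct itself shifted between conditions, the same arithmetic would silently give a misleading fold change, which is why reference gene stability matters.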
Primer Design and Practical Guidelines
- For qPCR primers, design constraints are tighter than regular PCR primers.
- Product size: 100–150 bp.
- Forward and reverse primers should have similar melting temperatures, with a maximum difference of about 2°C.
- Avoid amplification of genomic DNA by targeting exon–exon junctions when using cDNA as template.
- Avoid primer dimers, repetitive sequences, and secondary structures.
- Primer design workflow (typical in labs)
- Use online tools (e.g., NCBI Primer-BLAST) to select primer sets with appropriate product size, Tm, GC content, and specificity.
- Examine predicted amplicon location, GC content, and potential off-target amplification.
- Validate primer performance empirically with qPCR and melt-curve analysis if using SYBR Green.
- RT-qPCR data capture and interpretation
- Real-time fluorescence curves indicate amplification progress; a successful reaction shows a rising curve and a defined exponential phase.
- A flat curve indicates a failed reaction (no target amplification).
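The primer constraints listed in this section (Tm difference within about 2 °C, product size 100–150 bp) can be checked with a rough sketch like the one below. It estimates Tm with the simple Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C), which is only a crude approximation for short oligos; tools such as Primer-BLAST use more accurate thermodynamic models. The primer sequences and the amplicon length are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C by the Wallace rule (approximation)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_primer_pair(forward: str, reverse: str, amplicon_length: int,
                      max_tm_diff: float = 2.0,
                      size_range: tuple[int, int] = (100, 150)) -> dict:
    """Check the two qPCR constraints from the notes: Tm match and product size."""
    tm_f, tm_r = wallace_tm(forward), wallace_tm(reverse)
    return {
        "tm_forward": tm_f,
        "tm_reverse": tm_r,
        "tm_ok": abs(tm_f - tm_r) <= max_tm_diff,
        "size_ok": size_range[0] <= amplicon_length <= size_range[1],
    }


# Hypothetical primer pair and amplicon length, for illustration only.
report = check_primer_pair("AGCTGACCTGAAGCTGATCC", "TCAGGTCCATGCAGTTCAGG", 120)
print(report)
print(gc_content("AGCTGACCTGAAGCTGATCC"))  # 0.55
```

This does not test for primer dimers, secondary structure, or off-target binding; those still need a design tool and empirical validation such as melt-curve analysis.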
Practical Takeaways and Real-World Relevance
- When to use what
- Use NGS for comprehensive genome/transcriptome profiling, discovery of novel transcripts, and complex tumor heterogeneity.
- Use microarrays when a fixed, predefined set of transcripts is sufficient (less expensive for some applications), but be aware of lower sensitivity and limited dynamic range compared to NGS.
- Use qPCR for targeted, sensitive, and quantitative validation of specific genes after sequencing or as a diagnostic/monitoring tool (e.g., viral load, gene expression changes).
- Validation and workflow reality
- Sequencing projects often require weeks to months for data processing, filtering, and biological interpretation.
- Validation with RT-qPCR (or other orthogonal methods) is essential to confirm sequencing results before publication or clinical translation.
- Ethical and practical implications
- Genomic data interpretation must consider potential incidental findings and patient privacy.
- Clinical translation requires rigorous validation, reproducibility, and standardization across laboratories.
- Summary of key equations and concepts
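- PCR amplification (ideal): $N = N_0 \cdot 2^{n}$
- Absolute quantification (standard curve): $C_T = a \log_{10}(N) + b \Rightarrow N = 10^{(C_T - b)/a}$
- qPCR relative quantification: $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$, $\Delta\Delta C_T = \Delta C_{T,\text{treated}} - \Delta C_{T,\text{control}}$
$\text{Fold change} = 2^{-\Delta\Delta C_T}$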
- Key terminology recap
- RT-qPCR: reverse transcription followed by quantitative PCR.
- One-step RT-qPCR: RT and qPCR in the same reaction.
- Two-step RT-qPCR: first RT to cDNA, then separate qPCR for multiple targets.
- Primer design and product size constraints for qPCR: 100–150 bp, GC balance, minimal primer-dimers.
- NGS vs microarray: NGS detects novel transcripts and low-abundance sequences with higher sensitivity; microarrays rely on predefined probes.
- Single-cell sequencing: enables resolution of cellular heterogeneity in tumors and the study of cancer progression at the cell level.
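The standard-curve relation from the equation summary (Ct as a linear function of log10 of copy number) can be illustrated with a small least-squares fit. The dilution series and Ct values below are invented, chosen so the slope comes out near -3.32, the value expected for ideal doubling; the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = a * log10(N) + b; returns (a, b)."""
    x = [math.log10(n) for n in copies]
    mean_x = sum(x) / len(x)
    mean_y = sum(ct_values) / len(ct_values)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    a = s_xy / s_xx            # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


def copies_from_ct(ct: float, a: float, b: float) -> float:
    """Invert the standard curve: N = 10 ** ((Ct - b) / a)."""
    return 10 ** ((ct - b) / a)


# Hypothetical 10-fold dilution series of a DNA standard.
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_ct = [30.00, 26.68, 23.36, 20.04]

a, b = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = copies_from_ct(25.02, a, b)
print(round(a, 2), round(b, 2))  # -3.32 39.96
print(round(unknown_copies))     # 31623
```

The negative slope reflects the inverse relationship between Ct and starting quantity: each 10-fold increase in template lowers Ct by about 3.3 cycles in this example.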
Final Takeaways for Exam Preparation
- Understand the differences between sequencing technologies and when each is appropriate (NGS vs microarray).
- Be able to describe the steps of PCR, qPCR, and RT-qPCR, and explain why qPCR provides quantitative data.
- Know the two quantitative approaches in qPCR (absolute with a standard curve; relative with ΔΔCt) and how to compute fold changes.
- Recognize the role of housekeeping genes in normalization and the importance of validating reference genes.
- Appreciate the workflow from discovery (NGS) to validation (qPCR) in cancer research, including how single-cell sequencing informs tumor heterogeneity and targeted therapies.
- Be familiar with clinical examples and the potential implications of sequencing-guided treatments (e.g., targeted inhibitors like BRAF inhibitors in BRAF-mutant cancers).