Comprehensive Notes: Gene Sequencing, NGS, and Quantitative PCR (qPCR)

Overview and Context

Gene sequencing is part of a broader human effort to understand ourselves and disease, including cancer. It is comparable in ambition to other large-scale projects such as the Apollo program (landing on the Moon) and the development of nuclear weapons; its key aim is to understand human biology in order to treat diseases such as cancer.
The idea: sequence every gene, map the genome, understand structure, function, evolution, and editing, using high-throughput sequencing and bioinformatics to assemble the human genome.
Historical milestones:
- Human Genome Project (HGP): started in 1990 and completed in 2003 (about 13 years); NIH led the effort; it sequenced a reference human genome and created a shared database for researchers.
- 1000 Genomes Project: later expansion to a much larger population to capture genetic diversity and rare variants.
- Interpretation: on average, a person carries roughly
  $N \sim 2.5 \times 10^{2} \text{ to } 3 \times 10^{2}$
  mutations relative to the reference genome, most benign but some contribute to disease.
Core genetic concepts (brief revision):
- DNA stores genetic information; nucleotides: adenine (A), cytosine (C), guanine (G), thymine (T).
- RNA is usually single-stranded and uses uracil (U) instead of thymine; base set: A, C, G, U.
- Genes are nucleic acid sequences located on chromosomes within the nucleus.
- mRNA (messenger RNA) is a single-stranded copy of a gene; introns are non-coding regions in genomic DNA; exons encode proteins and are retained in mature mRNA after splicing.
- RNA processing: transcription from DNA produces pre-mRNA; introns are spliced out to produce mature mRNA, which exits the nucleus and is translated into protein.
- Non-coding RNAs: microRNAs (miRNAs) are small RNAs that regulate translation; long non-coding RNAs (lncRNAs) regulate gene expression and have functions beyond “junk” RNA.
- Tissue-specific transcripts (SIIs/PIIs) reflect tissue-specific expression patterns.
Gene sequencing vs expression profiling: sequencing maps genome structure and potential variants; it is complemented by expression profiling to understand which genes are active under different conditions.
Practical framing: sequencing uses high-throughput approaches plus bioinformatics to reconstruct genomes; PCR is a foundational tool used before sequencing to amplify DNA fragments for analysis.

DNA and RNA: Quick Revision

DNA basics
- DNA carries genetic information.
- Building blocks: nucleotides consist of a phosphate group, a sugar (deoxyribose), and a nitrogenous base.
- Four DNA bases: $A,\; C,\; G,\; T$ .
RNA basics
- RNA uses $A,\; C,\; G,\; U$ (instead of T).
- Messenger RNA (mRNA) is a single-stranded copy of a gene; introns are removed during processing; exons are retained and code for amino acids.
Central dogma quick recap
- Transcription: DNA -> RNA (pre-mRNA containing introns and exons).
- RNA processing: introns removed; mature mRNA exported to cytoplasm.
- Translation: mRNA -> protein (codons code for amino acids).
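The recap above can be sketched in a few lines of Python. This is a toy illustration only: the gene sequence, the exon coordinates, and the deliberately partial codon table are made-up assumptions, and real transcripts are far longer and are processed by cellular machinery rather than string operations.

```python
# Toy illustration of transcription, splicing, and translation.
# The codon table is intentionally partial; codons missing from it map to "X".
CODON_TABLE = {
    "AUG": "M",                           # start codon (methionine)
    "GCU": "A", "GCC": "A",               # alanine
    "UUU": "F",                           # phenylalanine
    "GGA": "G",                           # glycine
    "UAA": "*", "UAG": "*", "UGA": "*",   # stop codons
}


def transcribe(coding_strand):
    """Return pre-mRNA with the same sequence as the coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive); the rest is treated as intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1 (ATGGCT), a 6-base intron (GTAAGT), exon 2 (TTTGGATAA).
gene = "ATGGCT" + "GTAAGT" + "TTTGGATAA"
pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, [(0, 6), (12, 21)])
print(mature_mrna)             # AUGGCUUUUGGAUAA
print(translate(mature_mrna))  # MAFG
```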
Non-coding RNAs
- microRNA (miRNA): small regulatory RNAs that control protein expression at the translational level.
- long non-coding RNA (lncRNA): regulatory roles in gene expression.
Genomic organization
- DNA contains exons and introns; genes are arranged along chromosomes.
- Genes can exist in more than one copy (for example, two alleles in diploid cells) and can be differentially expressed across tissues.

What is Gene Sequencing?

Definition: gene sequencing determines the order of nucleotides in DNA; the broader aim is to map the human genome and understand its structure, function, evolution, and editing.
Core approach: combine high-throughput sequencing techniques with bioinformatics to reconstruct the genome sequence and identify variants.
Sequencing history notes:
- Progression from Sanger sequencing (the first widely used method) to Next-Generation Sequencing (NGS).
- Sanger sequencing synthesizes copies of a short DNA fragment and reads a single sequence at a time.
- NGS allows parallel sequencing of many fragments, dramatically increasing throughput.
Role of PCR in sequencing
- Polymerase Chain Reaction (PCR) amplifies DNA fragments to generate enough material for sequencing or analysis.
- Basic PCR cycle stages: denaturation, annealing, extension (also called elongation).
- Denaturation: double-stranded DNA is heated to separate strands; typically around $95^{\circ}C$ ; melting temperature depends on GC content.
- Annealing: primers bind to complementary sequences; temperature near primer melting temperature (Tm); aim for similar Tm between forward and reverse primers.
- Extension: DNA polymerase extends primers at about $72^{\circ}C$ , using dNTPs to synthesize new strands.
- Ideally, each cycle doubles the amount of target DNA: after n cycles, copies ≈ $N = N_0 \cdot 2^n$ .
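As a rough numerical sketch of the doubling formula, the function below computes copy number after a given number of cycles. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling, as in the formula above); real reactions usually fall somewhat short of this and plateau in late cycles, which this sketch does not model.

```python
def pcr_copies(n0, cycles, efficiency=1.0):
    """Copies after `cycles` cycles: N = N0 * (1 + E)^n.

    efficiency = 1.0 reproduces the ideal N = N0 * 2^n.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n0 * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 10))         # 1024.0 (ideal doubling)
print(pcr_copies(100, 30))       # 107374182400.0
print(pcr_copies(100, 30, 0.9))  # fewer copies when efficiency is 90%
```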
Historical genome projects relevance
- HGP enabled reference genome for comparison with patient samples.
- 1000 Genomes expanded diversity to discover population-specific and rare variants.
PCR applications beyond sequencing
- Gene expression profiling, genetic fingerprinting, pathogen detection (e.g., SARS-CoV-2 testing via RT-qPCR).
Modern sequencing landscape
- Thousands of genomes have been sequenced and are available in public databases (e.g., NCBI).
- Next-generation sequencing produces detailed readouts of DNA sequences from samples.

Next-Generation Sequencing (NGS) vs Microarrays

Key idea: NGS detects sequences directly; microarrays rely on preset probes to detect known transcripts.
NGS advantages over microarrays
- Can detect novel transcripts and low-abundance sequences missed by microarrays.
- Higher sensitivity and accuracy; broader dynamic range.
- No dependence on predefined probes; can map entire transcriptome and genome sequences.
How NGS works (simplified)
- Fragment genomic DNA into many pieces; add adapters and barcodes to ends of fragments.
- Attach fragments to sequencing platform, perform amplification (cluster generation), and read fluorescence signals to determine base identities.
- Computationally assemble reads into a complete sequence and align to reference genome.
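The final computational step (aligning reads to a reference and spotting differences) can be illustrated with a toy sketch. The reference string and reads below are made up, and the brute-force, ungapped Hamming-distance search is an assumption chosen for clarity; production aligners use indexed, gap-aware algorithms and take base quality scores into account.

```python
def best_alignment(read, reference):
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def call_variants(reads, reference):
    """Map reference position -> list of read bases that disagree with the reference."""
    variants = {}
    for read in reads:
        offset, _ = best_alignment(read, reference)
        for i, base in enumerate(read):
            position = offset + i
            if base != reference[position]:
                variants.setdefault(position, []).append(base)
    return variants


# Hypothetical reference and reads; two reads carry a G where the reference has C (position 9).
reference = "ACGTACGGTCAAGCTT"
reads = ["ACGTAC", "TACGGT", "GGTGAAGC", "GAAGCTT"]
print(call_variants(reads, reference))  # {9: ['G', 'G']}
```

Seeing the same mismatch in more than one independent read is what separates a likely variant from a random sequencing error, which is one reason sequencing depth matters.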
Practical implication in cancer research
- NGS helps identify mutations, rearrangements, copy number variations, and transcriptome changes that accompany cancer progression.

Single-Cell Sequencing in Cancer

Rationale
- Tumors are heterogeneous: cancer cells, stromal cells, immune cells, and other cell types coexist in a mass.
- Bulk sequencing averages signals across all cells, masking cell-type-specific changes.
What single-cell sequencing adds
- Sequencing the RNA from individual cells reveals cell-type-specific gene expression profiles.
- Enables identification of cancer cells, immune cells (e.g., T cells, macrophages), fibroblasts, and other components.
Cancer development and staging in the single-cell view
- Normal tissue baseline provides a reference.
- Early precancerous cells show mutations that begin malignant transformation.
- Carcinoma in situ remains at the original site; progression leads to invasion and possible metastasis.
- Metastasis involves cancer cells entering blood/lymph systems and colonizing distant organs (e.g., liver, lung, bone, brain).
Example workflow and insights
- Compare single-cell profiles across stages to identify driver mutations and pathways involved in progression.
- In melanoma, about 50% of cases carry an activating BRAF mutation in a survival pathway; BRAF inhibitors showed high response rates (~80%) in BRAF-mutant melanoma patients.
- Single-cell sequencing of cancer–nerve interactions helps reveal how nerve invasion of tumors correlates with cellular stress markers (e.g., the ER stress marker GRP78).
A case study sketch from the lecturer’s lab
- Research topic: cancer–nerve interactions in solid tumors (notably pancreatic cancer).
- Observed nerve invasion correlates with GRP78 (an ER stress marker).
- Experiments compared cancer cells grown under normal conditions versus ER-stress conditions, and the effect on neurite outgrowth/nerve invasion was monitored.
- Follow-up single-cell sequencing generated gene lists with significant differential expression under ER stress conditions; shortlisted genes guided further validation.
- Validation via real-time RT-qPCR in multiple cancer cell lines confirmed upregulation of selected targets.
Data interpretation and validation workflow
- Post-sequencing pipeline: identify significantly changed genes (fold-change, statistical significance).
- Use public resources (e.g., NCBI Gene, GeneCards) to annotate gene function and disease relevance (cancer, nerve interaction, ER stress).
- Shortlist candidate genes (e.g., >2-fold change and significant) for further validation.
- Validation step is essential to confirm sequencing results before drawing conclusions.
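The shortlisting step can be sketched as a simple filter. Everything in the example data is a placeholder: the gene names, fold-changes, and adjusted p-values are invented, and the 0.05 significance cutoff is an assumed convention rather than a value stated in these notes. A more-than-2-fold change corresponds to |log2 fold-change| > 1.

```python
def shortlist_genes(results, min_abs_log2fc=1.0, max_padj=0.05):
    """Keep genes with more than a 2-fold change (|log2FC| > 1) and padj below the cutoff.

    Results are sorted by the size of the change, largest first.
    """
    hits = [
        r for r in results
        if abs(r["log2fc"]) > min_abs_log2fc and r["padj"] < max_padj
    ]
    return sorted(hits, key=lambda r: abs(r["log2fc"]), reverse=True)


# Placeholder differential-expression results (not real data).
results = [
    {"gene": "GENE_A", "log2fc": 2.3, "padj": 0.001},   # large and significant: kept
    {"gene": "GENE_B", "log2fc": 0.4, "padj": 0.0005},  # significant but small change
    {"gene": "GENE_C", "log2fc": -1.8, "padj": 0.02},   # strongly downregulated: kept
    {"gene": "GENE_D", "log2fc": 3.1, "padj": 0.30},    # large but not significant
]
print([r["gene"] for r in shortlist_genes(results)])  # ['GENE_A', 'GENE_C']
```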
Practical time frame (illustrative)
- From sequencing order to data results: even with outsourcing, analysis and validation can take several weeks to months; a typical total process can span about nine months.

A Clinical and Practical Example: Nerve-Cancer Interactions Case

Context
- Pancreatic and other solid tumors exhibit nerve invasion; nerve fibers infiltrating tumors contribute to pain and tumor biology.
Experimental setup
- In vitro: co-cultures of cancer cells with nerve cells under normal vs ER stress conditions.
- Observed neuronal morphology changes under ER stress: neurites elongate and form networks, indicating nerve activity and potential invasion cues.
In vivo validation
- Tumor-bearing mice show nerve innervation consistent with human samples, supporting a role for nerves in tumor progression.
Sequencing and downstream analysis
- Cultures grown under ER stress were sent to an external company for sequencing; the results covered tens of thousands of genes, so filtering focused on significant fold-changes.
- Heatmaps (clustered gene expression) helped visualize upregulated vs downregulated genes across conditions.
- Subsequent annotation and literature review identified candidate genes connected to cancer, nerve invasion, and ER stress.
- Selected genes validated by RT-qPCR across multiple cancer cell lines, strengthening confidence in sequencing findings.
Practical takeaway
- NGS-based discovery must be followed by rigorous validation (e.g., qPCR) and functional studies to confirm roles in cancer biology.

Quantitative PCR (qPCR) vs Traditional PCR

Core idea of PCR
- PCR amplifies DNA fragments to enable analysis; traditional (endpoint) PCR is often analyzed by gel electrophoresis after amplification.
qPCR (quantitative PCR) vs traditional PCR
- qPCR monitors amplification in real time using fluorescence; products are quantified as the reaction progresses.
- Traditional PCR results are end-point only (gel band intensity is semi-quantitative at best).
Why qPCR is preferred for quantitation
- Higher sensitivity and specificity; real-time readout; lower starting quantities suffice; avoids post-PCR processing.
Two main ways to measure fluorescence in qPCR
- Probe-based (e.g., TaqMan): uses a sequence-specific probe with a reporter dye and a quencher; fluorescence increases as the polymerase cleaves the probe during extension, separating the reporter from the quencher.
- Intercalating dye-based (e.g., SYBR Green): dye binds double-stranded DNA; fluorescence increases with the amount of dsDNA formed.
RT-qPCR: combining reverse transcription with qPCR
- Template can be RNA (usually mRNA) or DNA; when starting from RNA, reverse transcription creates complementary DNA (cDNA).
- RT-qPCR can be done as one-step (one reaction for RT and PCR) or two-step (RT to cDNA, then qPCR on multiple targets).
Primer design and assay setup for qPCR
- Primer design aims for short amplicons, typically 100–150 bp, to optimize efficiency and accuracy.
- Primer characteristics:
  - Amplicon size around 100–150 bp.
  - Similar melting temperatures for forward and reverse primers (ideally within 2°C).
  - Avoid primer dimers and secondary structures; when targeting cDNA, avoid amplifying contaminating genomic DNA (for example, by placing primers across exon–exon junctions).
- Template sources:
  - DNA template (genomic DNA, plasmid DNA) or RNA template with reverse transcription to cDNA.
- RT steps and enzyme choices:
  - Reverse transcription uses a reverse transcriptase and RNase inhibitors to convert RNA to cDNA; the RT step typically takes around 30 minutes; cDNA is more stable than RNA and can be stored.
One-step vs two-step RT-qPCR: pros and cons
- One-step RT-qPCR: faster, fewer pipetting steps; less hands-on time; convenient for high-throughput assays; less flexibility for multiple targets from the same RNA input.
- Two-step RT-qPCR: greater flexibility, allows using the same cDNA for multiple qPCR assays; often more suitable for gene expression profiling with multiple targets; cDNA can be archived for later use.
Data interpretation and quantitation methods
- Ct (cycle threshold) value: the cycle number at which fluorescence crosses a predefined threshold; inversely related to the amount of target nucleic acid (more starting template gives a lower Ct).
- Absolute quantification (standard curve method): uses a DNA standard with known copy numbers to generate a standard curve; unknown sample copies are read from the curve.
- Relative quantification (comparative Ct or ΔΔCt method): compares target gene expression in treated vs control samples using a reference (housekeeping) gene; fold change is calculated as:
 $\text{Fold change} = 2^{-\Delta\Delta C_T}$, where $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$
 and
 $\Delta\Delta C_T = \Delta C_T^{\text{treated}} - \Delta C_T^{\text{control}}$
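A worked example of the comparative Ct calculation is sketched below. The Ct values are invented for illustration, and the base of 2 assumes close to 100% amplification efficiency for both the target and the reference gene.

```python
def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the comparative Ct method."""
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: the target crosses the threshold 2 cycles earlier after
# treatment, while the reference gene is unchanged.
print(fold_change(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
))  # 4.0, i.e. about 4-fold upregulation
```

Here the treated ΔCt is 4 and the control ΔCt is 6, so ΔΔCt = -2 and the fold change is 2^2 = 4. A positive ΔΔCt would give a value below 1, indicating downregulation.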
Role of housekeeping genes and controls
- Use stable reference genes (e.g., GAPDH, ACTB) whose expression does not change across treatments.
- If the Ct of the housekeeping gene changes significantly between conditions, re-evaluate the reference gene choice.
Practical considerations and common pitfalls
- Endpoint PCR gels can be qualitative; qPCR provides kinetic data and more robust quantification.
- Poor primer design or degraded templates can yield misleading results; always include controls and replicates.
- Validation is crucial: corroborate sequencing findings with independent methods (e.g., RT-qPCR in multiple cell lines).
Exam-style example discussed in class
- Denaturation, annealing, and extension are the three PCR steps; extension/elongation corresponds to the third step where nucleotides are added.
- The common exam question: identify which step involves the addition of nucleotides to extend the growing DNA strand; answer: extension (elongation).

Primer Design and Practical Guidelines

For qPCR primers, design constraints are tighter than for regular PCR primers.
- Product size: 100–150 bp.
- Forward and reverse primers should have similar melting temperatures, with a maximum difference of about 2°C.
- Avoid amplification of genomic DNA by targeting exon–exon junctions when using cDNA as template.
- Avoid primer dimers, repetitive sequences, and secondary structures.
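The size and melting-temperature constraints above can be checked with a short sketch. The primer sequences are hypothetical, and the Wallace rule (Tm = 2(A+T) + 4(G+C)) is a rough approximation intended for short oligos; design tools generally rely on more accurate thermodynamic models, so treat the numbers as illustrative only. Dimer and secondary-structure checks are not covered here.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate melting temperature in °C using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def check_primer_pair(forward, reverse, amplicon_len,
                      max_tm_diff=2.0, size_range=(100, 150)):
    """Report whether a primer pair meets the Tm-difference and amplicon-size limits."""
    tm_forward = wallace_tm(forward)
    tm_reverse = wallace_tm(reverse)
    return {
        "tm_forward": tm_forward,
        "tm_reverse": tm_reverse,
        "tm_ok": abs(tm_forward - tm_reverse) <= max_tm_diff,
        "size_ok": size_range[0] <= amplicon_len <= size_range[1],
    }


# Hypothetical 20-nt primers and a hypothetical 120 bp amplicon.
report = check_primer_pair("ACGTGCTAGCTTGACCATGA", "TGCAGTCCATGGTACAGCTA", 120)
print(report)
```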
Primer design workflow (typical in labs)
- Use online tools (e.g., NCBI Primer-BLAST) to select primer sets with appropriate product size, Tm, GC content, and specificity.
- Examine predicted amplicon location, GC content, and potential off-target amplification.
- Validate primer performance empirically with qPCR and melt-curve analysis if using SYBR Green.
RT-qPCR data capture and interpretation
- Real-time fluorescence curves indicate amplification progress; a successful reaction shows a rising curve and a defined exponential phase.
- A flat curve indicates a failed reaction (no target amplification).
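How a Ct value is read off an amplification curve can be sketched as a threshold crossing with linear interpolation between cycles. The fluorescence values and the threshold below are invented, and the function is a simplification: instrument software typically also applies baseline correction before setting the threshold.

```python
def estimate_ct(fluorescence, threshold):
    """Fractional cycle at which the curve first reaches `threshold`.

    fluorescence[0] is the reading after cycle 1. Returns None for a flat
    curve that never reaches the threshold (a failed reaction).
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        previous, current = fluorescence[i - 1], fluorescence[i]
        if previous < threshold <= current:
            # Readings at index i-1 and i belong to cycles i and i+1.
            return i + (threshold - previous) / (current - previous)
    return None


print(estimate_ct([0.0, 0.0, 1.0, 3.0], threshold=2.0))  # 3.5
print(estimate_ct([0.1] * 40, threshold=1.0))            # None (flat curve)
```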

Practical Takeaways and Real-World Relevance

When to use what
- Use NGS for comprehensive genome/transcriptome profiling, discovery of novel transcripts, and complex tumor heterogeneity.
- Use microarrays when a fixed, predefined set of transcripts is sufficient (less expensive for some applications), but be aware of lower sensitivity and limited dynamic range compared to NGS.
- Use qPCR for targeted, sensitive, and quantitative validation of specific genes after sequencing or as a diagnostic/monitoring tool (e.g., viral load, gene expression changes).
Validation and workflow reality
- Sequencing projects often require weeks to months for data processing, filtering, and biological interpretation.
- Validation with RT-qPCR (or other orthogonal methods) is essential to confirm sequencing results before publication or clinical translation.
Ethical and practical implications
- Genomic data interpretation must consider potential incidental findings and patient privacy.
- Clinical translation requires rigorous validation, reproducibility, and standardization across laboratories.
Summary of key equations and concepts
- PCR amplification (ideal): $N = N_0 \cdot 2^n$
- Absolute quantification (standard curve): $C_T = a \log_{10}(N) + b \quad\Rightarrow\quad N = 10^{(C_T-b)/a}$
- qPCR relative quantification: $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$, $\Delta\Delta C_T = \Delta C_T^{\text{treated}} - \Delta C_T^{\text{control}}$, $\text{Fold change} = 2^{-\Delta\Delta C_T}$
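The standard-curve equation can be put to work with a small least-squares sketch. The dilution series below is invented for illustration (Ct values generated from a slope of -3.32 and an intercept of 40), and the efficiency relation E = 10^(-1/a) - 1 is the usual way of reading amplification efficiency from the slope a; a slope near -3.32 corresponds to roughly 100% efficiency.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = a * log10(N) + b; returns (a, b)."""
    xs = [math.log10(n) for n in copies]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(cts) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def copies_from_ct(ct, a, b):
    """Invert the standard curve: N = 10^((Ct - b) / a)."""
    return 10 ** ((ct - b) / a)


def efficiency(a):
    """Amplification efficiency implied by the slope: E = 10^(-1/a) - 1."""
    return 10 ** (-1.0 / a) - 1.0


# Made-up 10-fold dilution series of a standard with known copy numbers.
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_cts = [30.04, 26.72, 23.40, 20.08]

a, b = fit_standard_curve(standard_copies, standard_cts)
print(round(a, 2), round(b, 2))            # slope and intercept
print(round(efficiency(a), 2))             # close to 1.0, i.e. about 100%
print(round(copies_from_ct(25.0, a, b)))   # estimated copies for an unknown with Ct 25
```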
Key terminology recap
- RT-qPCR: reverse transcription followed by quantitative PCR.
- One-step RT-qPCR: RT and qPCR in the same reaction.
- Two-step RT-qPCR: first RT to cDNA, then separate qPCR for multiple targets.
- Primer design and product size constraints for qPCR: 100–150 bp, GC balance, minimal primer-dimers.
- NGS vs microarray: NGS detects novel transcripts and low-abundance sequences with higher sensitivity; microarrays rely on predefined probes.
- Single-cell sequencing: enables resolution of cellular heterogeneity in tumors and the study of cancer progression at the cell level.

Final Takeaways for Exam Preparation

Understand the differences between sequencing technologies and when each is appropriate (NGS vs microarray).
Be able to describe the steps of PCR, qPCR, and RT-qPCR, and explain why qPCR provides quantitative data.
Know the two quantitative approaches in qPCR (absolute with a standard curve; relative with ΔΔCt) and how to compute fold changes.
Recognize the role of housekeeping genes in normalization and the importance of validating reference genes.
Appreciate the workflow from discovery (NGS) to validation (qPCR) in cancer research, including how single-cell sequencing informs tumor heterogeneity and targeted therapies.
Be familiar with clinical examples and the potential implications of sequencing-guided treatments (e.g., targeted inhibitors like BRAF inhibitors in BRAF-mutant cancers).