Comprehensive Notes: Gene Sequencing, NGS, and Quantitative PCR (qPCR)

Overview and Context

  • Gene sequencing is part of a broader human effort to understand ourselves and disease, including cancer. It is often ranked alongside other landmark large-scale projects, such as the Apollo program (landing on the Moon) and the development of nuclear weapons; its key aim is to understand human biology in order to treat diseases such as cancer.
  • The idea: sequence every gene, map the genome, understand structure, function, evolution, and editing, using high-throughput sequencing and bioinformatics to assemble the human genome.
  • Historical milestones:
    • Human Genome Project (HGP): began in 1990 and was completed in about 13 years (2003); led in the US by the NIH together with the Department of Energy and international partners; produced a reference human genome sequence and a shared database for researchers.
    • 1000 Genomes Project: later expansion to a much larger population to capture genetic diversity and rare variants.
    • Interpretation: the 1000 Genomes Project estimated that, on average, a person carries roughly
      $N \sim 2.5 \times 10^2$ to $3 \times 10^2$ (about 250–300)
      loss-of-function variants in annotated genes relative to the reference genome; most are benign, but some contribute to disease.
  • Core genetic concepts (brief revision):
    • DNA stores genetic information; nucleotides: adenine (A), cytosine (C), guanine (G), thymine (T).
    • RNA is usually single-stranded and uses uracil (U) instead of thymine; base set: A, C, G, U.
    • Genes are nucleic acid sequences located on chromosomes within the nucleus.
    • mRNA (messenger RNA) is a single-stranded copy of a gene; introns are non-coding regions in genomic DNA; exons encode proteins and are retained in mature mRNA after splicing.
    • RNA processing: transcription from DNA produces pre-mRNA; introns are spliced out to produce mature mRNA, which exits the nucleus and is translated into protein.
    • Non-coding RNAs: microRNAs (miRNAs) are small RNAs that regulate translation; long non-coding RNAs (lncRNAs) regulate gene expression, so they are not merely “junk” RNA.
    • Tissue-specific transcripts (SIIs/PIIs) reflect tissue-specific expression patterns.
  • Gene sequencing vs expression profiling: sequencing maps genome structure and potential variants; it is complemented by expression profiling to understand which genes are active under different conditions.
  • Practical framing: sequencing uses high-throughput approaches plus bioinformatics to reconstruct genomes; PCR is a foundational tool used before sequencing to amplify DNA fragments for analysis.

DNA and RNA: Quick Revision

  • DNA basics
    • DNA carries genetic information.
    • Building blocks: nucleotides consist of a phosphate group, a sugar (deoxyribose), and a nitrogenous base.
    • Four DNA bases: A, C, G, T.
  • RNA basics
    • RNA uses A, C, G, U (uracil instead of thymine).
    • Messenger RNA (mRNA) is a single-stranded copy of a gene; introns are removed during processing; exons are retained and code for amino acids.
  • Central dogma quick recap
    • Transcription: DNA -> RNA (pre-mRNA containing introns and exons).
    • RNA processing: introns removed; mature mRNA exported to cytoplasm.
    • Translation: mRNA -> protein (codons code for amino acids).
  • Non-coding RNAs
    • microRNA (miRNA): small regulatory RNAs that control protein expression at the translational level.
    • long non-coding RNA (lncRNA): regulatory roles in gene expression.
  • Genomic organization
    • DNA contains exons and introns; genes are arranged along chromosomes.
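    • Most genes are present in two copies in a diploid cell (one on each homologous chromosome), some vary in copy number between individuals, and genes can be differentially expressed across tissues.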
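The central dogma steps above can be sketched as a toy Python model. This is a minimal sketch under stated assumptions: the DNA is given as the coding (sense) strand, exon boundaries are already known, and the gene sequence and helper names (`transcribe`, `splice`, `translate`) are hypothetical. The codon table is the standard genetic code.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Pre-mRNA matches the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep exon intervals [start, end) and join them; the gaps are introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: exon 1 | intron | exon 2
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGGTAA"
pre_mrna = transcribe(gene)
mature = splice(pre_mrna, [(0, 6), (18, 27)])
print(mature)               # AUGGCCAAAUGGUAA
print(translate(mature))    # MAKW
print(translate(pre_mrna))  # MAVSFQKW (intron wrongly read as coding)
```

Translating the unspliced pre-mRNA gives a different protein, which illustrates why introns must be removed before the mRNA is translated.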

What is Gene Sequencing?

  • Definition: gene sequencing determines the order of nucleotides in DNA; at genome scale, it aims to map the human genome and relate its structure to function, evolution, and genome editing.
  • Core approach: combine high-throughput sequencing techniques with bioinformatics to reconstruct the genome sequence and identify variants.
  • Sequencing history notes:
    • From Sanger (first sequencing method) to Next-Generation Sequencing (NGS).
    • Sanger sequencing (the chain-termination method) reads one DNA fragment per reaction, producing a single sequence at a time.
    • NGS allows parallel sequencing of many fragments, dramatically increasing throughput.
  • Role of PCR in sequencing
    • Polymerase Chain Reaction (PCR) amplifies DNA fragments to generate enough material for sequencing or analysis.
    • Basic PCR cycle stages: denaturation, annealing, elongation.
    • Denaturation: double-stranded DNA is heated, typically to about 95 °C, to separate the strands; the melting temperature depends on GC content.
    • Annealing: primers bind to complementary sequences at a temperature set near (usually a few degrees below) the primer melting temperature (Tm); forward and reverse primers should have similar Tm.
    • Extension: DNA polymerase extends the primers at about 72 °C, using dNTPs to synthesize new strands.
    • Ideally, each cycle doubles the amount of target DNA, so after n cycles the copy number is $N = N_0 \cdot 2^n$, where $N_0$ is the starting number of copies.
  • Historical genome projects relevance
    • HGP enabled reference genome for comparison with patient samples.
    • 1000 Genomes expanded diversity to discover population-specific and rare variants.
  • PCR applications beyond sequencing
    • Gene expression profiling, genetic fingerprinting, pathogen detection (e.g., SARS-CoV-2 testing via RT-qPCR).
  • Modern sequencing landscape
    • Many thousands of genomes have been sequenced and deposited in public databases (e.g., NCBI).
    • Next-generation sequencing produces detailed readouts of DNA sequences from samples.
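The PCR copy-number formula from the cycle description can be checked numerically. This is a minimal sketch: the efficiency parameter E, giving $N = N_0 (1 + E)^n$, is an added assumption that reduces to the ideal $N = N_0 \cdot 2^n$ when E = 1, and the function name is hypothetical.

```python
def pcr_copies(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` cycles: N = N0 * (1 + E) ** n.

    E = 1.0 is the ideal case (perfect doubling, N = N0 * 2 ** n);
    E < 1.0 models an assumed less-than-perfect reaction efficiency.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n0 * (1.0 + efficiency) ** cycles


for e in (1.0, 0.9):
    print(f"E={e}: {pcr_copies(100, 30, e):.3g} copies after 30 cycles from 100 templates")
```

With E = 1, 100 starting templates become 100 × 2^30 ≈ 1.07 × 10^11 copies after 30 cycles. Real reactions eventually plateau as reagents are used up, so the formula only describes the exponential phase.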

Next-Generation Sequencing (NGS) vs Microarrays

  • Key idea: NGS detects sequences directly; microarrays rely on preset probes to detect known transcripts.
  • NGS advantages over microarrays
    • Can detect novel transcripts and low-abundance sequences that microarrays miss.
    • Higher sensitivity and accuracy; broader dynamic range.
    • No dependence on predefined probes; can map entire transcriptome and genome sequences.
  • How NGS works (simplified)
    • Fragment genomic DNA into many pieces; add adapters and barcodes to ends of fragments.
    • Attach fragments to sequencing platform, perform amplification (cluster generation), and read fluorescence signals to determine base identities.
    • Computationally align the reads to a reference genome (or assemble them de novo when no reference is available) to reconstruct the sequence.
  • Practical implication in cancer research
    • NGS helps identify mutations, rearrangements, copy number variations, and transcriptome changes that accompany cancer progression.
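The barcode and alignment steps of the simplified NGS workflow can be illustrated with a toy example. This is a hypothetical sketch, not how production pipelines work: it assumes a 2-base barcode at the start of every read, uses exact string matching instead of a real aligner, and the reference, barcodes, reads, and function names are all invented.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, barcode_len):
    """Assign each read to a sample by its leading barcode, then trim the barcode.

    Assumes (hypothetically) the barcode sits at the very start of the read;
    reads with an unknown barcode go to 'undetermined'.
    """
    samples = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:barcode_len], "undetermined")
        samples[sample].append(read[barcode_len:])
    return dict(samples)


def map_reads(reference, reads):
    """Toy exact-match alignment: start position of each read, or -1 if unmapped.

    Real aligners tolerate mismatches and use genome indexes; this only
    illustrates placing reads on a reference.
    """
    return [(read, reference.find(read)) for read in reads]


def coverage(reference, reads):
    """Per-base read depth across the reference."""
    depth = [0] * len(reference)
    for read, pos in map_reads(reference, reads):
        if pos >= 0:
            for i in range(pos, pos + len(read)):
                depth[i] += 1
    return depth


reference = "AAACCCGGGTTT"
barcodes = {"AC": "sample1", "GT": "sample2"}  # hypothetical barcodes
reads = ["ACAAACCC", "ACCGGGT", "GTGGGTTT", "TTAAAA"]

by_sample = demultiplex(reads, barcodes, barcode_len=2)
print(by_sample)
print(coverage(reference, by_sample["sample1"]))  # [1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 0, 0]
```

Regions covered by more reads give more confidence in the called bases, which is one reason sequencing depth matters when looking for mutations.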

Single-Cell Sequencing in Cancer

  • Rationale
    • Tumors are heterogeneous: cancer cells, stromal cells, immune cells, and other cell types coexist in a mass.
    • Bulk sequencing averages signals across all cells, masking cell-type-specific changes.
  • What single-cell sequencing adds
    • Sequencing the RNA from individual cells reveals cell-type-specific gene expression profiles.
    • Enables identification of cancer cells, immune cells (e.g., T cells, macrophages), fibroblasts, and other components.
  • Cancer development and staging in the single-cell view
    • Normal tissue baseline provides a reference.
    • Early precancerous cells show mutations that begin malignant transformation.
    • Carcinoma in situ remains at the original site; progression leads to invasion and possible metastasis.
    • Metastasis involves cancer cells entering blood/lymph systems and colonizing distant organs (e.g., liver, lung, bone, brain).
  • Example workflow and insights
    • Compare single-cell profiles across stages to identify driver mutations and pathways involved in progression.
    • In melanoma, about 50% of cases carry an activating BRAF mutation in the MAPK growth/survival signaling pathway; BRAF inhibitors showed high response rates (~80% in early clinical trials) in BRAF-mutant melanoma patients.
    • Single-cell sequencing of cancer–nerve interactions can reveal how nerves invade tumors and how this invasion correlates with cellular stress markers (e.g., ER stress and its marker GRP78).
  • A case study sketch from the lecturer’s lab
    • Research topic: cancer–nerve interactions in solid tumors (notably pancreatic cancer).
    • Observed nerve invasion correlates with GRP78 (an ER stress marker).
    • Experiments compared cancer cells grown under normal conditions versus ER-stress conditions, and the effect on neurite outgrowth/nerve invasion was monitored.
    • Follow-up single-cell sequencing generated gene lists with significant differential expression under ER stress conditions; shortlisted genes guided further validation.
    • Validation via real-time RT-qPCR in multiple cancer cell lines confirmed upregulation of selected targets.
  • Data interpretation and validation workflow
    • Post-sequencing pipeline: identify significantly changed genes (fold-change, statistical significance).
    • Use public resources (e.g., NCBI Gene, GeneCards) to annotate gene function and disease relevance (cancer, nerve interaction, ER stress).
    • Shortlist candidate genes (e.g., >2-fold change and significant) for further validation.
    • Validation step is essential to confirm sequencing results before drawing conclusions.
  • Practical time frame (illustrative)
    • From ordering sequencing to obtaining results: even when sequencing is outsourced, data analysis and validation can take several weeks to months; the whole process can span about nine months.
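The shortlisting step of the post-sequencing pipeline (fold change plus statistical significance) can be sketched as a simple filter. The gene names and values are hypothetical, the p-values are assumed to come from an upstream statistical test, and the cutoffs (2-fold, p < 0.05) are illustrative defaults rather than the lab's actual settings.

```python
import math


def shortlist(genes, min_fold=2.0, alpha=0.05):
    """Return (gene, log2 fold change) for genes that pass both filters.

    genes maps name -> (mean_control, mean_treated, p_value), with means > 0.
    A gene passes if it changes at least `min_fold`-fold in either direction
    and its p-value is below `alpha`.
    """
    hits = []
    for name, (control, treated, p_value) in genes.items():
        log2_fc = math.log2(treated / control)
        if abs(log2_fc) >= math.log2(min_fold) and p_value < alpha:
            hits.append((name, round(log2_fc, 2)))
    # Largest changes first, whether up- or down-regulated
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Hypothetical results: control vs ER stress
results = {
    "GENE_A": (10.0, 40.0, 0.001),  # 4-fold up, significant
    "GENE_B": (50.0, 60.0, 0.010),  # significant but only 1.2-fold
    "GENE_C": (20.0, 5.0, 0.020),   # 4-fold down, significant
    "GENE_D": (8.0, 30.0, 0.300),   # large change but not significant
    "GENE_E": (4.0, 12.0, 0.040),   # 3-fold up, significant
}
print(shortlist(results))  # [('GENE_A', 2.0), ('GENE_C', -2.0), ('GENE_E', 1.58)]
```

Working on the log2 scale makes up- and down-regulation symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.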

A Clinical and Practical Example: Nerve-Cancer Interactions Case

  • Context
    • Pancreatic and other solid tumors exhibit nerve invasion; nerve fibers infiltrating tumors contribute to pain and tumor biology.
  • Experimental setup
    • In vitro: co-cultures of cancer cells with nerve cells under normal vs ER stress conditions.
    • Observed neuronal morphology changes under ER stress: neurites elongate and form networks, indicating nerve activity and potential invasion cues.
  • In vivo validation
    • Tumor-bearing mice show nerve innervation consistent with human samples, supporting a role for nerves in tumor progression.
  • Sequencing and downstream analysis
    • Sequencing of the ER-stressed cultures, outsourced to a commercial sequencing company, yielded expression data for tens of thousands of genes; filtering focused on genes with significant fold-changes.
    • Heatmaps (clustered gene expression) helped visualize upregulated vs downregulated genes across conditions.
    • Subsequent annotation and literature review identified candidate genes connected to cancer, nerve invasion, and ER stress.
    • Selected genes validated by RT-qPCR across multiple cancer cell lines, strengthening confidence in sequencing findings.
  • Practical takeaway
    • NGS-based discovery must be followed by rigorous validation (e.g., qPCR) and functional studies to confirm roles in cancer biology.
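The heatmap step in this case study can be illustrated by row scaling, which is a common (assumed here, not confirmed by the notes) preprocessing step before drawing an expression heatmap: each gene is converted to z-scores across samples so that colour shows up- or down-regulation relative to that gene's own average. The gene names, values, and function name are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(expression):
    """Scale each gene (row) to mean 0 and SD 1 across samples.

    Genes with no variation get all-zero rows instead of dividing by zero.
    """
    scaled = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled


# Hypothetical normalized values: two control samples, then two ER-stress samples
expression = {
    "GENE_A": [1.0, 1.0, 3.0, 3.0],      # up under ER stress
    "GENE_B": [40.0, 40.0, 20.0, 20.0],  # down under ER stress
    "GENE_C": [5.0, 5.0, 5.0, 5.0],      # unchanged
}
for gene, row in row_zscores(expression).items():
    print(gene, [round(z, 2) for z in row])
```

After scaling, GENE_A and GENE_B show opposite patterns on the same colour range even though their absolute expression levels differ twentyfold, which is what lets a heatmap group upregulated and downregulated genes visually.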

Quantitative PCR (qPCR) vs Traditional PCR

  • Core idea of PCR
    • PCR amplifies DNA fragments to enable analysis; traditional (endpoint) PCR is often analyzed by gel electrophoresis after amplification.
  • qPCR (quantitative PCR) vs traditional PCR
    • qPCR monitors amplification in real time using fluorescence; products are quantified as the reaction progresses.
    • Traditional PCR results are end-point only (gel band intensity is semi-quantitative at best).
  • Why qPCR is preferred for quantitation
    • Higher sensitivity and specificity; real-time readout; lower starting quantities suffice; avoids post-PCR processing.
  • Two main ways to measure fluorescence in qPCR
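    • Probe-based (e.g., TaqMan): uses a sequence-specific probe carrying a reporter dye and a quencher; during extension, the polymerase's 5'→3' exonuclease activity cleaves the probe and separates the reporter from the quencher, so fluorescence increases with each cycle.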
    • Intercalating dye-based (e.g., SYBR Green): dye binds double-stranded DNA; fluorescence increases with the amount of dsDNA formed.
  • RT-qPCR: combining reverse transcription with qPCR
    • Template can be RNA (usually mRNA) or DNA; when starting from RNA, reverse transcription creates complementary DNA (cDNA).
    • RT-qPCR can be done as one-step (one reaction for RT and PCR) or two-step (RT to cDNA, then qPCR on multiple targets).
  • Primer design and assay setup for qPCR
    • Primer design aims for short amplicons to optimize efficiency and accuracy.
    • Primer characteristics:
      • Amplicon size around 100–150 bp.
      • Similar melting temperatures for forward and reverse primers (ideally within 2 °C).
      • Avoid primer dimers and secondary structures; when the target is cDNA, avoid amplifying contaminating genomic DNA (e.g., by designing primers that span exon–exon junctions).
    • Template sources:
      • DNA template (genomic DNA, plasmid DNA), or RNA template reverse-transcribed to cDNA.
    • RT steps and enzyme choices:
      • Reverse transcription uses a reverse transcriptase (with RNase inhibitors to protect the RNA) to convert RNA to cDNA; the RT step typically takes about 30 minutes, and the resulting cDNA is more stable than RNA and can be stored.
  • One-step vs two-step RT-qPCR: pros and cons
    • One-step RT-qPCR: faster, fewer pipetting steps; less hands-on time; convenient for high-throughput assays; less flexibility for multiple targets from the same RNA input.
    • Two-step RT-qPCR: greater flexibility, allows using the same cDNA for multiple qPCR assays; often more suitable for gene expression profiling with multiple targets; cDNA can be archived for later use.
  • Data interpretation and quantitation methods
    • Ct (cycle threshold) value: the cycle number at which fluorescence crosses a predefined threshold; inversely related to the amount of target nucleic acid.
    • Absolute quantification (standard curve method): uses a DNA standard with known copy numbers to generate a standard curve; unknown sample copies are read from the curve.
    • Relative quantification (comparative Ct or ΔΔCt method): compares target gene expression in treated vs control samples, normalized to a reference (housekeeping) gene and assuming near-100% and similar amplification efficiencies for target and reference; fold change is calculated as:
      $\text{Fold change} = 2^{-\Delta\Delta C_T}$, where $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$
      and
      $\Delta\Delta C_T = \Delta C_T^{\text{treated}} - \Delta C_T^{\text{control}}$
  • Role of housekeeping genes and controls
    • Use stable reference genes (e.g., GAPDH, ACTB) whose expression does not change across treatments.
    • If the Ct of the housekeeping gene changes significantly between conditions, re-evaluate the choice of reference gene.
  • Practical considerations and common pitfalls
    • Endpoint PCR gels are qualitative or at best semi-quantitative; qPCR provides kinetic data and more robust quantification.
    • Poor primer design or degraded templates can yield misleading results; always include controls and replicates.
    • Validation is crucial: corroborate sequencing findings with independent methods (e.g., RT-qPCR in multiple cell lines).
  • Exam-style notes discussed in class
    • Denaturation, annealing, and extension are the three PCR steps; extension/elongation corresponds to the third step where nucleotides are added.
    • The common exam question: identify which step involves the addition of nucleotides to extend the growing DNA strand; answer: extension (elongation).
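The Ct definition and the ΔΔCt formula above can be sketched in a few lines. Two assumptions: the Ct is taken as the fractional cycle where the curve first crosses the threshold, using linear interpolation between the two bracketing readings (a simplification of what instrument software does), and the ΔΔCt function follows the formula as written, so it inherits the ~100% efficiency assumption. The fluorescence values, Ct values, and function names are hypothetical.

```python
def cycle_threshold(fluorescence, threshold):
    """Fractional cycle at which fluorescence first crosses `threshold`.

    fluorescence[i] is the reading after cycle i + 1. Returns None if the
    curve never crosses the threshold (a flat curve: no amplification).
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # Interpolate between cycle i (prev) and cycle i + 1 (curr)
            return i + (threshold - prev) / (curr - prev)
    return None


def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Fold change = 2 ** -ΔΔCt, with ΔCt = Ct(target) - Ct(reference)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


curve = [0.1, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]  # hypothetical, doubling each cycle
print(cycle_threshold(curve, threshold=1.0))
print(cycle_threshold([0.1] * 40, threshold=1.0))  # flat curve -> None
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))    # 8.0
```

In the last example the reference gene has the same Ct in both conditions, so the target's 3-cycle-earlier Ct in the treated sample translates directly into 2^3 = 8-fold higher expression.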

Primer Design and Practical Guidelines

  • For qPCR primers, design constraints are tighter than for conventional PCR primers.
    • Product size: 100–150 bp.
    • Forward and reverse primers should have similar melting temperatures, with a maximum difference of about 2°C.
    • Avoid amplification of genomic DNA by targeting exon–exon junctions when using cDNA as template.
    • Avoid primer dimers, repetitive sequences, and secondary structures.
  • Primer design workflow (typical in labs)
    • Use online tools (e.g., NCBI Primer-BLAST) to select primer sets with appropriate product size, Tm, GC content, and specificity.
    • Examine predicted amplicon location, GC content, and potential off-target amplification.
    • Validate primer performance empirically with qPCR and melt-curve analysis if using SYBR Green.
  • RT-qPCR data capture and interpretation
    • Real-time fluorescence curves indicate amplification progress; a successful reaction shows a rising curve and a defined exponential phase.
    • A flat curve indicates no detectable amplification (the target is absent or the reaction failed).
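A few of the primer rules above can be checked automatically before turning to a design tool. This is a rough sketch under stated assumptions: Tm is estimated with the Wallace rule, 2(A+T) + 4(G+C), which is only a rule of thumb for short oligos (design tools use more accurate nearest-neighbour models), the 40–60% GC range is an assumed guideline, and the primer sequences and function names are hypothetical. Dimer, secondary-structure, and specificity checks are not covered.

```python
def gc_content(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def check_primer_pair(forward, reverse, amplicon_len,
                      max_tm_diff=2.0, size_range=(100, 150), gc_range=(40.0, 60.0)):
    """Return a list of rule violations; an empty list means the pair passes these checks."""
    problems = []
    tm_diff = abs(wallace_tm(forward) - wallace_tm(reverse))
    if tm_diff > max_tm_diff:
        problems.append(f"Tm difference {tm_diff} °C exceeds {max_tm_diff} °C")
    if not size_range[0] <= amplicon_len <= size_range[1]:
        problems.append(f"amplicon {amplicon_len} bp outside {size_range[0]}-{size_range[1]} bp")
    for name, primer in (("forward", forward), ("reverse", reverse)):
        gc = gc_content(primer)
        if not gc_range[0] <= gc <= gc_range[1]:
            problems.append(f"{name} primer GC content {gc:.0f}% outside {gc_range}")
    return problems


# Hypothetical primer sequences
fwd = "AGCTGACCTGAAGGCTAAGC"  # GC 55%, Wallace Tm 62 °C
rev = "TCCAGGTTGATCAGCTGGAA"  # GC 50%, Wallace Tm 60 °C
print(check_primer_pair(fwd, rev, amplicon_len=120))  # []
print(check_primer_pair(fwd, rev, amplicon_len=250))
```

A pair that passes these quick checks still needs specificity checking (e.g., with Primer-BLAST) and empirical validation with a melt curve.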

Practical Takeaways and Real-World Relevance

  • When to use what
    • Use NGS for comprehensive genome/transcriptome profiling, discovery of novel transcripts, and complex tumor heterogeneity.
    • Use microarrays when a fixed, predefined set of transcripts is sufficient (less expensive for some applications), but be aware of lower sensitivity and limited dynamic range compared to NGS.
    • Use qPCR for targeted, sensitive, and quantitative validation of specific genes after sequencing or as a diagnostic/monitoring tool (e.g., viral load, gene expression changes).
  • Validation and workflow reality
    • Sequencing projects often require weeks to months for data processing, filtering, and biological interpretation.
    • Validation with RT-qPCR (or other orthogonal methods) is essential to confirm sequencing results before publication or clinical translation.
  • Ethical and practical implications
    • Genomic data interpretation must consider potential incidental findings and patient privacy.
    • Clinical translation requires rigorous validation, reproducibility, and standardization across laboratories.
  • Summary of key equations and concepts
    • PCR amplification (ideal): $N = N_0 \cdot 2^n$
    • Absolute quantification (standard curve): $C_T = a \log_{10}(N) + b \;\Rightarrow\; N = 10^{(C_T - b)/a}$, where the slope $a$ is negative (about $-3.32$ at 100% efficiency)
    • qPCR relative quantification:
      $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$
      $\Delta\Delta C_T = \Delta C_T^{\text{treated}} - \Delta C_T^{\text{control}}$
      $\text{Fold change} = 2^{-\Delta\Delta C_T}$
  • Key terminology recap
    • RT-qPCR: reverse transcription followed by quantitative PCR.
    • One-step RT-qPCR: RT and qPCR in the same reaction.
    • Two-step RT-qPCR: first RT to cDNA, then separate qPCR for multiple targets.
    • Primer design and product size constraints for qPCR: 100–150 bp, GC balance, minimal primer-dimers.
    • NGS vs microarray: NGS detects novel transcripts and low-abundance sequences with higher sensitivity; microarrays rely on predefined probes.
    • Single-cell sequencing: enables resolution of cellular heterogeneity in tumors and the study of cancer progression at the cell level.
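The standard-curve relation in the equation summary can be worked through with a least-squares fit. This is a minimal sketch: the dilution series and Ct values are hypothetical, the function names are invented, and the efficiency formula E = 10^(-1/a) - 1 is the usual relation between slope and efficiency (a slope of about -3.32 corresponds to E ≈ 1, i.e., 100%).

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = a * log10(N) + b to a dilution series of standards."""
    xs = [math.log10(n) for n in copies]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ct_values) / len(ct_values)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ct_values))
    a = sxy / sxx            # slope (negative: more template -> earlier Ct)
    b = y_mean - a * x_mean  # intercept
    return a, b


def copies_from_ct(ct, a, b):
    """Invert the standard curve: N = 10 ** ((Ct - b) / a)."""
    return 10 ** ((ct - b) / a)


def efficiency_from_slope(a):
    """E = 10 ** (-1 / a) - 1; E = 1.0 means perfect doubling each cycle."""
    return 10 ** (-1 / a) - 1


# Hypothetical 10-fold dilution series of a DNA standard and its measured Ct values
standards = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [20.1, 23.4, 26.7, 30.1, 33.4]

a, b = fit_standard_curve(standards, cts)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}, efficiency = {efficiency_from_slope(a):.1%}")
print(f"unknown with Ct 25.0 = {copies_from_ct(25.0, a, b):.0f} copies")
```

For these made-up standards the fit gives a = -3.33 and b = 40.06, so an unknown with Ct 25.0 reads back as roughly 3.3 × 10^4 copies. A slope far from -3.32 would signal poor efficiency and a need to re-check primers or template quality.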

Final Takeaways for Exam Preparation

  • Understand the differences between NGS and microarrays and when each is appropriate.
  • Be able to describe the steps of PCR, qPCR, and RT-qPCR, and explain why qPCR provides quantitative data.
  • Know the two quantitative approaches in qPCR (absolute with a standard curve; relative with ΔΔCt) and how to compute fold changes.
  • Recognize the role of housekeeping genes in normalization and the importance of validating reference genes.
  • Appreciate the workflow from discovery (NGS) to validation (qPCR) in cancer research, including how single-cell sequencing informs tumor heterogeneity and targeted therapies.
  • Be familiar with clinical examples and the potential implications of sequencing-guided treatments (e.g., targeted inhibitors like BRAF inhibitors in BRAF-mutant cancers).