Bioinformatics:
Interdisciplinary science: biology, computer science, and mathematics.
Analyzes and interprets biological data, especially sequence data (DNA, RNA, proteins).
Extracts insights from large-scale datasets from sequencing technologies.
Biological Sequence Databases:
INSDC (International Nucleotide Sequence Database Collaboration):
DDBJ (DNA Data Bank of Japan).
GenBank (NCBI, USA).
ENA (European Nucleotide Archive, hosted by EMBL-EBI, Europe).
Synchronized daily and freely accessible.
Major Sequence Browsers:
NCBI Genome Database: Broad access to DNA/protein sequences.
Ensembl Genome Browser: Gene annotations, comparative genomics tools.
UCSC Genome Browser: Track-based visualization and analysis.
Sequence Formats:
FASTA: A header line beginning with ">" followed by the sequence on the subsequent lines.
GenBank: Includes annotations; "LOCUS," "ORIGIN," and "//" syntax.
EMBL: Flat-file format with two-letter line codes such as "ID" and "SQ," plus annotation lines.
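The FASTA layout described above can be read with a few lines of code. This is a minimal sketch, not a replacement for an established parser; the function name parse_fasta and the two example records are made up for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # a new record starts at each '>' line
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input (sequences invented for the example).
example = """>seq1 demo record
ATGGCC
TTAA
>seq2
GGGCCC
"""
records = parse_fasta(example)
print(records)
```

Sequence lines belonging to one record are joined, so a sequence wrapped over several lines comes back as a single string.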
Navigating NCBI, ENSEMBL, and UCSC Genome Browser:
NCBI: BLAST, PubMed, GenBank, protein/gene databases.
ENSEMBL: Searchable by gene symbol, chromosome coordinates, phenotypic data.
UCSC: Browser tracks with genomic annotations, comparative genomics, sequence searches.
Homologues, Orthologues, Paralogues:
Homologues: Genes sharing common ancestry.
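Orthologues: Homologous genes in different species that diverged through speciation; they often retain similar function.
Paralogues: Homologous genes that arose by duplication within a genome; they may evolve new functions.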
Examples:
Human globin gene cluster: multiple paralogues (HBB, HBD).
α- and β-globin genes: paralogues of each other within a species; human and frog α-globin genes (likewise human and frog β-globin genes) are orthologues.
Sequence Alignments:
Compare sequences to identify similarities/differences.
Mismatches: different nucleotides/amino acids.
Gaps: insertions/deletions.
Protein alignments: consider chemical similarities.
Conservative substitutions: similar side-chain properties (e.g., Leu → Ile).
Radical substitutions: different properties (e.g., Gly → Glu).
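To make the terms matches, mismatches, and gaps concrete, the sketch below counts them for two already-aligned sequences. It assumes both strings have equal length and use "-" for gaps, and it reports identity as matches divided by alignment columns (other definitions of identity exist). The function name and the example sequences are illustrative only.

```python
def summarize_alignment(seq_a, seq_b, gap="-"):
    """Count matches, mismatches, and gap columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            gaps += 1            # insertion/deletion column
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    # Assumption: identity is reported over all alignment columns.
    identity = matches / len(seq_a) if seq_a else 0.0
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "identity": identity}


summary = summarize_alignment("ATGC-TA", "ATGGCTA")
print(summary)
```

For protein alignments, a substitution matrix would replace the simple equal/not-equal test so that conservative substitutions score higher than radical ones.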
BLAST (Basic Local Alignment Search Tool):
Compares a query sequence to a database.
Types: blastn, blastp, blastx, etc.
Helps determine:
Organism origin of a sequence.
Gene identity and mutations.
Potential disease associations.
Polymerase Chain Reaction (PCR):
Standard PCR amplifies a specific DNA fragment.
Denaturation: Heating to separate strands.
Annealing: Primers bind to target DNA.
Extension: DNA polymerase synthesizes new strands.
Advanced PCR techniques:
RT-PCR (Reverse Transcription PCR): RNA → cDNA before amplification.
qPCR (Quantitative PCR): Real-time measurement using fluorescent dyes/probes; quantifies gene expression.
Multiplex PCR: Multiple primer sets to amplify different targets.
Nested PCR: Two rounds of PCR to improve specificity.
Touchdown PCR: Gradually lowering annealing temperature to improve specificity.
Primer Design:
Length: 18–25 nucleotides; longer = increased specificity.
Melting temperature (Tm): Similar for both primers (55–65°C).
GC content: 40–60% is optimal.
Avoid:
Hairpin loops.
Primer dimers.
Complementarity at 3’ ends (mis-priming).
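The length, Tm, and GC guidelines above can be checked programmatically. The sketch below is a rough screen only: it assumes the basic Tm estimate 64.9 + 41 × (G + C − 16.4) / N for primers longer than 13 nt, which ignores salt and primer concentration, and the example primer is hypothetical. It does not test for hairpins or primer dimers.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer):
    """Rough Tm in degrees C; assumed basic formula for primers > 13 nt."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def check_primer(primer):
    """Compare one primer against the guideline ranges from the notes."""
    return {
        "length_ok": 18 <= len(primer) <= 25,
        "gc_ok": 40.0 <= gc_content(primer) <= 60.0,
        "tm_ok": 55.0 <= basic_tm(primer) <= 65.0,
    }


primer = "AGCTGACCTGGAGCTGATCC"  # hypothetical 20-mer
report = check_primer(primer)
print(gc_content(primer), round(basic_tm(primer), 2), report)
```

In practice both primers of a pair would be checked and their Tm values compared with each other.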
Reagents Impact on PCR:
dNTPs: Building blocks of DNA; optimal concentration is crucial.
Mg^{2+} ions: Cofactor for DNA polymerase; too little = low yield, too much = non-specific products.
DNA polymerase: Taq is standard, Pfu offers higher accuracy.
Template quality: Impurities inhibit PCR.
Bioinformatics Tools for PCR Assay Development (BW2):
NCBI Primer-BLAST: Designs primers/checks specificity.
BLASTn: Ensures target region is unique.
OligoCalc: Calculates Tm, GC%, and primer self-complementarity.
Experimental Contexts for Different PCR Methods:
RT-qPCR:
Gene expression studies.
Diagnostic testing (e.g., viral load quantification).
Multiplex PCR:
Pathogen panels.
Forensic identification.
Touchdown PCR:
Difficult templates.
Low-abundance targets.
Gene Expression Measurement:
Gene expression: The process by which a gene's instructions are used to synthesize RNA (and, for protein-coding genes, protein).
Allows comparison of gene activity across tissues, conditions, or time points.
Methods:
RT-qPCR:
RNA extraction → reverse transcription (mRNA → cDNA).
Real-time quantification: SYBR Green dye or TaqMan probes.
Results: Amplification curves; normalized to housekeeping genes (GAPDH, ACTB).
Outputs: Ct (threshold cycle) values; lower Ct = higher gene expression.
Microarrays:
Thousands of DNA probes are immobilized on a chip.
Fluorescently labeled cDNA is hybridized.
Relative fluorescence intensity reflects expression level.
Requires known gene sequences.
RNA-Seq (High-throughput sequencing):
Whole transcriptome coverage, including non-coding RNAs and splice variants.
Raw reads undergo quality control (FastQC), alignment (HISAT2, STAR), quantification (featureCounts, HTSeq), and differential expression analysis (DESeq2, edgeR).
Outputs: Counts per gene, FPKM, or TPM values.
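As a worked illustration of how raw counts relate to TPM, the sketch below divides each gene's count by its length in kilobases and rescales so that the values sum to one million. Gene names, counts, and lengths are invented, and real pipelines typically use effective rather than annotated lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM (transcripts per million)."""
    # Step 1: reads per kilobase corrects for gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so that all genes in the sample sum to 1e6.
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical sample: geneB has three times the reads but is three times longer.
counts = {"geneA": 100, "geneB": 300, "geneC": 200}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 1000}
tpm = counts_to_tpm(counts, lengths_bp)
print(tpm)
```

geneA and geneB end up with the same TPM because the extra reads for geneB are explained by its length.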
Biological Relevance of Gene Expression Data:
Comparing expression across conditions:
Identify biomarkers or therapeutic targets.
Reveal upregulated pathways in diseases (e.g., cancer).
Study regulatory mechanisms (transcription factors, epigenetic marks).
Example: BRCA1 downregulation in tumor tissue suggests compromised role in genomic stability.
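For RT-qPCR data, a comparison across conditions is commonly expressed as a fold change using the 2^(−ΔΔCt) method. The sketch below assumes roughly 100% amplification efficiency for both target and reference gene; the Ct values are invented and do not come from a real experiment.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (test vs. control) by the 2^-ddCt method."""
    delta_test = ct_target_test - ct_ref_test   # normalize to housekeeping gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    ddct = delta_test - delta_ctrl
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target gene vs. a housekeeping gene such as GAPDH.
fold = fold_change_ddct(ct_target_test=26.0, ct_ref_test=18.0,
                        ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(fold)  # values below 1 indicate lower expression in the test sample
```

Here the target crosses the threshold two cycles later in the test sample after normalization, which corresponds to about a quarter of the control expression level.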
Introduction to Practical Skills:
Learn sterile technique, pipette calibration, reagent handling.
Understand lab notebook structure.
Technical replicates (same sample, repeated measurement) vs. biological replicates (different samples, same treatment).
Bioinformatics Link: Using BLAST
blastn: Nucleotide query vs. nucleotide databases.
blastp: Protein query vs. protein database.
blastx: Translates a nucleotide query into protein before comparison.
Use cases:
Identify unknown gene sequences.
Confirm cloning insert identity.
Compare homologous genes across species.
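The translation step that blastx performs before comparison can be illustrated with a six-frame translation. This is a sketch of the idea only, not BLAST's implementation; it assumes the standard genetic code and marks incomplete or ambiguous codons as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; '*' is a stop, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Translate three frames on each strand, keyed as '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames)
```

Each of the six protein strings would then be compared against a protein database, which is why blastx can find a match even when the reading frame of the query is unknown.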
Gene Cloning:
Creating multiple, identical copies of a particular gene or DNA fragment.
Cornerstone of molecular biology and biotechnology.
Why clone genes?
To study gene function and expression.
To produce recombinant proteins (e.g., insulin, growth hormone).
To modify genomes (e.g., in gene therapy or genetic engineering).
Restriction Enzymes:
Recognize specific short DNA sequences (usually palindromes) and cut DNA at/near these sites.
Types of ends:
Sticky ends: Staggered cuts with overhangs; better for directional cloning. Example: EcoRI cuts between G and A in GAATTC.
Blunt ends: Straight cuts with no overhangs; less efficient but more flexible.
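An in-silico digest shows how a recognition site translates into fragments. The sketch below models only the top strand of a linear molecule and uses the EcoRI site from the example above, cutting between G and A; the function name and test sequence are hypothetical.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the position within the site after which the top strand
    is cut (1 for EcoRI: G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


fragments = digest("AAAGAATTCCCCGAATTCTT")
print(fragments, [len(f) for f in fragments])
```

Fragments produced this way begin with AATT on the top strand, which corresponds to the 5' overhang that makes EcoRI ends sticky.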
Multiple Cloning Site (MCS):
A short sequence in plasmids containing several restriction sites.
Offers versatility in inserting DNA fragments using different enzymes.
Cloning Vectors:
A DNA molecule used as a vehicle to transfer foreign genetic material into a host cell.
Common vector features:
Origin of replication (Ori): Allows replication inside the host.
Selectable marker: Antibiotic resistance gene (e.g., amp^R).
Reporter gene: E.g., lacZ for blue/white screening.
MCS region: For insertion of the target gene.
Plasmid vectors (e.g., pUC19, pGEM-T) are most commonly used in basic cloning.
Other specialized vectors:
Expression vectors: Enable protein production.
Shuttle vectors: Replicate in multiple species (e.g., E. coli and yeast).
Viral vectors: Used in gene therapy or transfection of animal cells.
Inserting DNA Fragments into Vectors:
Restriction digestion:
Vector and insert DNA are digested with the same or compatible enzymes to produce matching ends.
Ligation:
DNA ligase joins the insert and vector by forming phosphodiester bonds.
Requires ATP or NAD^+ as a cofactor.
T/A cloning (alternative method):
PCR products made with Taq polymerase often have single “A” overhangs.
Inserted into “T” overhang vectors without restriction digestion.
Transformation and Selection:
Transformation: Introducing recombinant DNA into bacteria (commonly E. coli).
Heat shock method: Cells briefly exposed to 42°C to encourage DNA uptake.
Electroporation: Electric field opens pores in bacterial membrane.
Selection strategies:
Plating on antibiotic-containing agar ensures only transformed cells survive.
Blue-white screening (lacZ system):
Insert disrupts lacZ → colonies turn white (successful clone).
No insert → functional lacZ → blue colonies with X-gal.
Controls in Gene Cloning Experiments:
Negative control: Plasmid-only (no insert) to assess background ligation.
Positive control: A known successful insert to validate the system.
No-enzyme control: Detects contamination or background resistance.
Ligation and Transformation Process:
Ligation:
Key enzyme: T4 DNA ligase catalyzes phosphodiester bonds.
Sticky-end ligation is more efficient than blunt-end.
Ligation reaction setup:
Vector-to-insert molar ratio: ~1:3.
Buffer with ATP is essential.
Incubated at 16°C overnight or room temp for 10–15 minutes (quick ligation).
Controls:
No-insert (vector only): Detects self-ligation background.
Insert-only: Should show no growth.
Uncut vector: Should yield only blue colonies (non-recombinant).
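The ~1:3 vector-to-insert molar ratio from the ligation setup can be converted into a mass of insert, because moles scale with mass divided by length. The sketch below assumes double-stranded DNA for both molecules; the vector and insert sizes are hypothetical.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio."""
    # Equal masses of a shorter fragment contain more molecules, so the
    # vector mass is scaled by the length ratio before applying the molar ratio.
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ratio


# Hypothetical reaction: 50 ng of a 3,000 bp vector and a 1,000 bp insert.
needed = insert_mass_ng(vector_ng=50.0, vector_bp=3000, insert_bp=1000)
print(round(needed, 1))
```

With an insert one third the length of the vector, a 3:1 insert-to-vector molar ratio works out to equal masses of the two.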
Transformation:
Uptake of recombinant DNA by competent E. coli cells.
Chemical transformation (heat shock):
Chill cells + plasmid DNA on ice.
Heat shock at 42°C for 30–60 seconds.
Cells recover in SOC or LB broth.
Electroporation:
Delivers electrical pulses (~1.8 kV).
Must use salt-free buffer.
Selection methods:
Antibiotic selection: E.g., ampicillin resistance (bla gene).
Blue-white screening:
Vector contains lacZα fragment; insertion disrupts it.
X-gal substrate turns blue in presence of active β-galactosidase.
White colonies = successful insert integration.
Blue colonies = no insert (intact lacZ).
Screening Colonies to Verify Correct Cloning:
Colony PCR:
Pick colonies, use a portion directly as PCR template.
Primers flank MCS region.
Restriction digest of miniprep DNA:
Digest plasmid from cultured colony using the same enzymes used during cloning.
Run on agarose gel.
Sequencing confirmation:
Sequence with vector-specific primers (e.g., T7, SP6, M13).
Essential to confirm: correct sequence, proper orientation, no PCR-induced mutations.
Troubleshooting Cloning Experiments:
No colonies:
Did the ligase buffer or competent cells expire?
Incorrect antibiotic concentration?
Low transformation efficiency?
Lots of blue colonies:
Likely vector self-ligation or uncut vector contamination.
Use dephosphorylation (alkaline phosphatase) to prevent re-ligation.
All white colonies, but no insert detected:
Primers may have amplified a different or truncated fragment.
Confirm primer specificity and gel-purify PCR product before cloning.
Human Genome Structure and Composition:
~3 billion base pairs across 23 pairs of chromosomes.
~20,000 protein-coding genes (early estimates ranged up to ~25,000); their coding exons make up ~1.5% of the genome.
Remainder includes: introns, intergenic regions, regulatory sequences, repeats (satellite DNA, transposons, retroelements).
Functional non-coding DNA: regulates gene expression, includes non-coding RNAs (miRNA, lncRNA, etc.), and supports structural/chromosomal organization.
Impact of the Human Genome Project (HGP):
Timeline: 1990–2003.
Aims: Sequence the entire human genome, identify all human genes, improve technologies, and explore genetic variation.
Key Outcomes: Kickstarted precision medicine, drove the growth of public databases and browsers (NCBI GenBank, Ensembl, UCSC), and revealed the complexity of gene regulation.
Sanger Sequencing:
Based on selective incorporation of chain-terminating dideoxynucleotides (ddNTPs) during DNA synthesis.
Each ddNTP is fluorescently labeled.
Generates DNA fragments of varying lengths.
Fragments separated by capillary electrophoresis.
Reaction Components: Template DNA, primer, DNA polymerase, dNTPs + ddNTPs (labeled).
Advantages: High accuracy (~99.99%), excellent for small-scale projects.
Limitations: Low throughput, Max read length ~800–1000 bp, inefficient for large genomes.
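The chain-termination logic can be imitated in a toy simulation: each extending molecule stops at a random position when a labeled ddNTP is incorporated, and sorting the fragments by length recovers the sequence from their terminal bases. The termination probability, molecule count, and sequence below are arbitrary assumptions, and real base calling works from fluorescence traces rather than exact lengths.

```python
import random


def sanger_fragments(new_strand, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Simulate terminated fragments as (length, labeled terminal base) pairs."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            # With probability dd_fraction a ddNTP is added and synthesis stops.
            if rng.random() < dd_fraction:
                fragments.add((length, base))
                break
    return fragments


def read_sequence(fragments):
    """Order fragments by length (as electrophoresis does) and read the labels."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATGCCGTAAGCT"  # hypothetical newly synthesized strand
called = read_sequence(sanger_fragments(strand))
print(called)
```

Long fragments are produced less often than short ones in this model, which mirrors why signal quality drops toward the end of a Sanger read.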
Microarrays for Gene Expression Profiling:
A chip contains thousands of DNA probes attached in a grid.
Fluorescently labeled cDNA is hybridized to the chip.
Signal intensity corresponds to gene expression level.
Applications: Global gene expression analysis, comparing expression profiles, discovery of co-regulated genes.
Strengths: Simultaneous profiling of thousands of genes, established bioinformatics pipelines.
Weaknesses: Limited to known genes, cross-hybridization can reduce accuracy, replaced by RNA-Seq in many applications.
Linking Sequencing and Expression Analysis:
Sanger sequencing confirms gene identity and detects mutations.
Microarrays profile gene expression changes.
Next Generation Sequencing (NGS):
High-throughput sequencing technologies that allow millions to billions of DNA fragments to be sequenced simultaneously.
NGS Workflow Overview:
Library Preparation: Genomic DNA is fragmented, adapters are ligated, and library fragments may be barcoded.
Amplification: Bridge PCR (Illumina) or Emulsion PCR (Ion Torrent).
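Sequencing: On sequencing-by-synthesis platforms each base is read as it is added to the growing strand, detected by fluorescence (Illumina; PacBio in real time) or pH change (Ion Torrent); Nanopore instead reads bases in real time from changes in electrical current as a strand passes through a pore.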
Image/Data Capture & Base Calling: Signals are converted into raw sequences (FASTQ files).
Bioinformatics Processing: Sequence alignment, variant calling, and downstream analysis.
Key NGS Platforms:
Illumina: High accuracy, short reads (100–300 bp), industry standard. Limitation: short reads may miss structural variants.
Ion Torrent: Fast turnaround, semiconductor-based, 200–400 bp reads. Limitation: prone to indel errors in homopolymer runs.
PacBio (HiFi): Long reads (10–25 kb), ideal for structural insights. Limitation: lower throughput; expensive.
Oxford Nanopore: Ultra-long reads (up to 2 Mb+), portable, real-time. Limitation: higher error rates, though improving rapidly.
PacBio and Nanopore are used for de novo genome assembly, isoform resolution, and detecting large insertions/deletions.
Applications and Advantages of NGS over traditional methods:
Whole genome sequencing (WGS): Identify rare mutations and structural changes.
Whole exome sequencing (WES): Focus on protein-coding regions (~1–2% of genome).
RNA-Seq: Quantifies gene expression and splicing patterns.
ChIP-Seq: Maps transcription factor binding and epigenetic marks.
Targeted sequencing: Panels for cancer mutations, inherited disease genes.
Advantages over Sanger: higher throughput, cost-efficiency, and sensitivity.
Data Quality and Bioinformatics:
Phred quality scores (Q-scores): Indicate confidence in base calls (e.g., Q30 = 99.9% accuracy).
FASTQ files store both sequence and quality.
Adapter trimming and quality filtering are critical first steps.
Downstream tools: Alignment, variant calling, visualization, and functional annotation.
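The relationship between a Q-score and base-call accuracy follows from Q = −10 × log10(P), where P is the error probability. The sketch below converts scores to error probabilities and decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding; the quality string and the filter threshold are illustrative.

```python
def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def mean_quality(quality_string, offset=33):
    """Average Phred score across a read."""
    scores = decode_quality(quality_string, offset)
    return sum(scores) / len(scores)


def passes_filter(quality_string, min_mean_q=20):
    """Simple read-level filter; the threshold of 20 is an arbitrary example."""
    return mean_quality(quality_string) >= min_mean_q


print(phred_to_error_prob(30))      # about 0.001, i.e. 99.9% accuracy
print(decode_quality("I?5"))        # hypothetical three-base quality string
print(passes_filter("I?5"))
```

Dedicated trimming tools apply more refined rules, such as sliding windows, but they rest on the same score-to-probability conversion.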
Emerging Trends in Sequencing Technology:
Single-cell sequencing: Profiles gene expression at individual cell level.
Spatial transcriptomics: Adds spatial context to gene expression.
Epigenetic sequencing: Methylation and chromatin accessibility.
Clinical NGS: Routine in cancer diagnostics, prenatal screening, and pharmacogenomics.
Eukaryotic Cell Cycle:
Interphase:
G₁ phase: Cell grows and prepares for DNA replication.
S phase: DNA is replicated.
G₂ phase: Checks for errors and prepares for mitosis.
M phase (Mitosis): Prophase, metaphase, anaphase, telophase, and cytokinesis; results in two identical daughter cells.
G₀ phase: Resting state; non-dividing cells may remain here permanently.
Molecular Regulators of the Cell Cycle:
Cyclins: Regulatory proteins whose levels fluctuate cyclically.
Cyclin-Dependent Kinases (CDKs): Enzymes activated by binding to cyclins; cyclin-CDK complexes drive the cell through each phase.
Example: Cyclin D/CDK4/6 in G₁; Cyclin E/CDK2 in G₁/S; Cyclin B/CDK1 in G₂/M.
CDK inhibitors (CKIs): Halt the cell cycle in response to DNA damage or stress. E.g., p21, p27, p16.
Checkpoints: G₁/S, G₂/M, and Spindle checkpoints.
DNA Replication and Telomere Maintenance:
DNA replication: Begins at origins of replication; Helicase unwinds the double helix; DNA polymerase synthesizes new strands (leading and lagging).
Telomeres: Repetitive sequences at chromosome ends (e.g., TTAGGG in humans); Shorten with every division unless maintained by telomerase.
Disruptions in Regulation Linked to Cancer:
Proto-oncogenes: Normal genes that promote growth; gain-of-function mutations convert them into oncogenes (e.g., Ras, Myc, HER2).
Tumor suppressor genes: Regulate checkpoints, repair DNA, or initiate apoptosis; Loss-of-function mutations remove cell cycle brakes (p53, RB, BRCA1/2).
Mutations in DNA repair machinery: Lead to accumulated damage and mutation burden; Mismatch repair defects result in microsatellite instability.
Clinical Implications and Therapeutic Targets:
Many cancer drugs target the cell cycle: CDK4/6 inhibitors (e.g., palbociclib) and microtubule inhibitors (e.g., paclitaxel).
Checkpoint loss can contribute to resistance or aggressive phenotypes.
Sources and Types of DNA Damage:
Endogenous: Reactive oxygen species (ROS), replication errors, spontaneous base deamination.
Exogenous: UV radiation (pyrimidine dimers such as thymine dimers), ionizing radiation (single- and double-strand breaks, SSBs/DSBs), chemicals and toxins.
DNA Repair Pathways:
Direct Reversal: Photolyases in bacteria repair UV-induced dimers using light; MGMT reverses alkylation damage directly.
Base Excision Repair (BER): Repairs small, non-distorting lesions; Key enzymes: DNA glycosylases, AP endonuclease, DNA polymerase β, and ligase.
Nucleotide Excision Repair (NER): Removes bulky distortions like thymine dimers; Involves DNA helicase, endonucleases, and gap-filling synthesis.
Mismatch Repair (MMR): Fixes replication errors; Key proteins: MSH2, MLH1, and their complexes.
Double-Strand Break Repair:
Non-Homologous End Joining (NHEJ): Quick but error-prone; DNA ends are trimmed and ligated.
Homologous Recombination (HR): High-fidelity; uses a sister chromatid as template; Key players: BRCA1, BRCA2, RAD51.
Clinical Syndromes Caused by Defective Repair:
Xeroderma Pigmentosum (XP): NER deficiency -> extreme sensitivity to sunlight, high risk of skin cancer.
Lynch Syndrome (HNPCC): MMR gene mutations -> microsatellite instability and colorectal/endometrial cancers.
Ataxia Telangiectasia: Defect in ATM kinase (DSB sensing) -> immunodeficiency, cerebellar ataxia, radiation sensitivity.
BRCA1/2 mutations: Impair HR repair -> increased risk of breast, ovarian, and prostate cancers.
Connecting DNA Repair to Cancer Treatment Strategies:
Cancer therapies exploit defective repair mechanisms via synthetic lethality.
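PARP inhibitors: Block PARP-dependent single-strand break repair (including base excision repair); unrepaired single-strand breaks become double-strand breaks when replication forks collapse. Tumor cells with BRCA mutations cannot repair these breaks by HR and die, while normal cells with intact HR survive.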
Chemotherapies and radiation: Cause DNA breaks or crosslinks; More toxic to rapidly dividing cells but also harm normal proliferative tissues.
Origins and Natural Role of CRISPR-Cas Systems:
Discovered as a bacterial immune system.
Bacteria capture viral DNA snippets and integrate them into their own genome as spacer sequences.
If reinfected, bacteria transcribe these spacers into crRNAs, which guide the Cas9 protein to recognize and cleave matching viral DNA.
This natural system was engineered into a gene-editing tool.
How CRISPR-Cas9 Performs Genome Editing:
Essential components: Cas9 nuclease and Single-guide RNA (sgRNA).
Mechanism: sgRNA binds a ~20-nucleotide DNA sequence adjacent to a PAM; Cas9 induces a double-stranded break (DSB). Cell then repairs via Non-Homologous End Joining (NHEJ) or Homology-Directed Repair (HDR).
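The targeting rule (a ~20-nucleotide sequence adjacent to a PAM) can be turned into a simple site search. The sketch below assumes the commonly used SpCas9 PAM, NGG, located immediately 3' of the target; it scans the forward strand only and does no off-target scoring. The input sequence is invented.

```python
import re


def find_spcas9_targets(seq, protospacer_len=20):
    """List (start, protospacer, PAM) for forward-strand sites followed by NGG."""
    seq = seq.upper()
    # Lookahead keeps the match zero-width so overlapping sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical sequence containing a single candidate site.
sequence = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
targets = find_spcas9_targets(sequence)
print(targets)
```

A fuller search would also scan the reverse complement, since Cas9 can target either strand.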
Comparison of CRISPR to Older Editing Tools (ZFNs and TALENs):
CRISPR’s RNA-guided targeting makes it more flexible and scalable.
Advanced CRISPR Variants:
Base Editing: Uses a catalytically impaired Cas9 (dCas9 or, more commonly, a Cas9 nickase) fused to a deaminase enzyme. Converts one base to another without making a double-strand break.
Prime Editing: Combines Cas9 nickase and reverse transcriptase. Enables “search-and-replace” editing.
CRISPR Interference/Activation (CRISPRi/a): dCas9 fused to repressors or activators can modulate gene expression without altering sequence.
Applications in Research and Medicine:
Basic science: Create knockouts to study gene function; Generate disease models.
Clinical trials and therapies (sickle cell disease, cancer immunotherapy, retinal disease, HIV).
Gene drives (mosquito control).
Ethical and Safety Considerations:
Off-target effects: Cas9 may cleave similar sequences, potentially introducing unintended mutations.
Germline editing: Heritable changes raise ethical concerns.
Purpose and Foundation of Genetic Epidemiology:
Studies how genetic factors contribute to disease prevalence and trait variation in populations.
Integrates population genetics, disease risk modeling, and inheritance patterns.
Core concepts: Penetrance, Expressivity, Heritability (h^2).
Study Designs: Linkage, Association, Candidate Gene vs. GWAS
Linkage Analysis: Sample: families/pedigrees; Traits: Mendelian; Resolution: broad (~Mb scale); Markers: few.
Association Studies (incl. GWAS): Sample: unrelated individuals (case-control); Traits: complex/multifactorial; Resolution: fine (~kb scale); Markers: high-density genome-wide SNP arrays.
Candidate gene approach: Focus on specific gene(s) suspected to influence a trait.
GWAS: Hypothesis-free scan of the genome.
GWAS Methodology:
Sample selection (cases vs. controls), Genotyping (SNP microarrays), Statistical testing (logistic regression or chi-squared tests), and Multiple testing correction (Bonferroni correction or False Discovery Rate (FDR); p < 5 × 10⁻⁸).
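The statistical testing and correction steps can be sketched for a single SNP. The code below runs an allelic chi-squared test on a 2 × 2 table of allele counts (1 degree of freedom, no continuity correction) and shows how a Bonferroni correction of 0.05 over an assumed one million independent tests gives the 5 × 10⁻⁸ threshold. The allele counts are invented.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-squared statistic and p-value (1 df) for allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold; n_tests is an assumed test count."""
    return alpha / n_tests


chi2, p_value = allelic_chi2(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(chi2, p_value, p_value < bonferroni_threshold())
```

In this toy example the SNP would look significant in a single test but falls far short of genome-wide significance.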
Interpreting GWAS Results and Visualizations:
Manhattan plot: X-axis shows chromosomal position; Y-axis shows –log₁₀(p-value).
QQ plot: Compares expected vs. observed p-values.
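The values plotted in both figures are simple transformations of the p-values. The sketch below computes the Manhattan plot's y-values and the point coordinates of a QQ plot, assuming expected p-values of (i − 0.5)/n under the null hypothesis (one of several conventions). The p-values are invented, and plotting itself is left out.

```python
import math


def manhattan_y(p_values):
    """Y-axis values of a Manhattan plot: -log10(p) per SNP."""
    return [-math.log10(p) for p in p_values]


def qq_points(p_values):
    """(expected, observed) -log10(p) pairs, ordered from largest p to smallest."""
    n = len(p_values)
    observed = sorted(p_values)
    # Assumed convention: under the null, sorted p-values follow (i - 0.5) / n.
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    pairs = [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]
    return pairs[::-1]


p_values = [0.8, 0.04, 5e-8, 0.3]  # hypothetical association results
print(manhattan_y(p_values))
print(qq_points(p_values))
```

Points lying well above the diagonal across the whole QQ plot suggest inflation (for example from stratification), whereas deviation only at the far end points to genuine associations.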
Population Stratification and Bias:
Stratification: Allele frequency differences between cases and controls that reflect ancestry rather than the trait. Corrected using principal component analysis (PCA).
Post-GWAS Interpretation and Tools:
Fine mapping, eQTL analysis, and Functional annotation.
Clinical and Research Applications of GWAS:
Identification of risk alleles, Development of polygenic risk scores (PRS), and Discovery of therapeutic targets.
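A polygenic risk score is, in its simplest form, a weighted sum of risk-allele counts. The sketch below assumes additive effects, uses per-SNP weights such as GWAS effect sizes, and treats a missing genotype as zero copies, which is a simplification. SNP names, weights, and genotypes are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of effect size x risk-allele count (0, 1, or 2) across SNPs."""
    # Assumption: SNPs missing from the genotype data contribute 0.
    return sum(weight * dosages.get(snp, 0) for snp, weight in effect_sizes.items())


# Hypothetical weights (e.g., log odds ratios) and one person's genotypes.
effect_sizes = {"snp1": 0.2, "snp2": -0.1, "snp3": 0.05}
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_risk_score(dosages, effect_sizes)
print(round(score, 3))
```

A raw score like this is usually interpreted relative to the distribution of scores in a reference population rather than on its own.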
Unit Learning Outcomes (Holistic Review):
Characterize and manipulate DNA using PCR, cloning, sequencing, and CRISPR.
Apply bioinformatics tools to analyze sequences.
Integrate wet lab and dry lab skills.
Understand molecular genetics in a disease context.
Use data analytics to extract insights from sequence datasets.
Core Experimental Techniques & Their Logic (PCR, Cloning, Sanger, NGS, CRISPR). Principle & Purpose -> Interpreting Results
Master Bioinformatics Workflows & Databases (BLAST, Ensembl/NCBI/UCSC, GeneCards & OMIM, etc.)
Revise Data Interpretation from Lab-Based Workshops (qPCR plot, Microarray heatmap, GWAS Manhattan plot, CRISPR edit confirmation, etc).
Exam strategy tips:
Expect a mix of MCQs and short answer questions.
Scenario-style questions may require: diagnosing experimental errors, designing PCR primers, choosing best sequencing approach, interpreting output plots or sequences.
Study hacks: concept maps, flashcards, practice explaining terms aloud.