DNA-Based Species Identification – Comprehensive Study Notes

Introduction / Conceptual Foundations

  • "Species" definition is contentious; >20 definitions exist.

    • Researchers must understand phylogeny (branching order, divergence times) and nomenclature before applying molecular tools.

    • Terms such as “strain”, “variant”, “sub-species”, “breed” are often subjective and overlapping.

  • All DNA- or protein-based identifications rest on the Neutral Theory of Molecular Evolution (Kimura, 1968):

    • Molecular changes accumulate over time, most being selectively neutral.

    • Assumption: individuals of the same species share diagnostic sequences absent from others.

    • Caveats: reproductive success, gene flow, drift ➔ continuous intraspecies variation.

    • Essential prerequisite: ensure no overlap between intra- and inter-specific diversity at chosen locus.

  • Locus choice matters because mutation, recombination, selection alter evolutionary rates.

    • Prefer multiple loci or the most informative locus for the question.

  • When employing a novel marker:

    • Genotype representatives of all candidate species from multiple geographies/hosts.

    • Deposit voucher specimens.

  • No method is perfect; choice depends on:

    • Laboratory resources, expertise, cost, time, sample quality, research question.
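The prerequisite noted above (no overlap between intra- and inter-specific diversity at the chosen locus) can be checked numerically before a marker is adopted. The following is a minimal sketch, assuming pre-aligned sequences of equal length and a simple uncorrected p-distance; the function names and the toy sequences are hypothetical, and real studies would typically use model-corrected distances and many more individuals.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(samples: dict[str, list[str]]) -> tuple[float, float, bool]:
    """Return (max intraspecific distance, min interspecific distance, gap present?)."""
    labelled = [(sp, seq) for sp, seqs in samples.items() for seq in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    # The locus is only diagnostic if every within-species distance is
    # smaller than every between-species distance.
    return max_intra, min_inter, max_intra < min_inter

# Toy data (hypothetical sequences, two individuals per species)
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACCTAC", "ACGAACCTAC"],
}
max_intra, min_inter, gap = barcode_gap(toy)
print(max_intra, min_inter, gap)
```

If the third value is False for a candidate locus, intra- and interspecific variation overlap in the sampled individuals and the locus should not be used on its own.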

Pre-DNA World: Morphology & Protein Analyses

  • Classical taxonomy relies on external & internal morphology.

    • Strengths: direct, inexpensive, still the most used.

    • Weaknesses:

      • Phenotypic plasticity (e.g., color variation in birds/fish due to diet).

      • Sibling/cryptic species complexes: morphologically indistinguishable yet reproductively isolated.

      • Convergent evolution causes unrelated taxa to share traits ➔ mis-ID if few characters examined.

      • Requires intact specimens; impractical for trace/forensic remains.

  • Protein-based methods (electrophoresis, isoenzymes, immunoassays):

    • Limited by rapid degradation, tissue-specific expression, cross-reactivity, antibody availability.

The “DNA Revolution”

  • DNA advantages for species ID:

    • Chemical stability; recoverable from degraded sources (processed food, coprolites, mummies).

    • Present in virtually all tissues/fluids; mtDNA/plastids circumvent lack of nuclei.

    • Carries more variation than proteins (genetic code degeneracy, non-coding regions).

  • RNA viruses (e.g., HIV) still typed via DNA after reverse transcription.

Core DNA-Based Typing Approaches

Comparative Attributes (Table 1 Synopsis)

  • Information, reproducibility, throughput, cost differ among methods.

  • General trends:

    • Sequencing & microarrays: highest information, high cost, good reproducibility, excellent throughput.

    • RAPD: quick, cheap, but poor inter-lab reproducibility.

    • Real-time PCR & microarray: mixture-friendly, automation-ready.

1. DNA Hybridization

  • Principle: complementary strands anneal; detection via fluorescent/radio labels.

  • Classical membrane hybridizations: multi-species detection possible with multiple probes but:

    • Require high-quality DNA; cross-hybridization; lab-to-lab variability; time-consuming.

  • FISH (Fluorescence in situ Hybridization): probes bind rRNA/DNA in whole cells ➔ direct visualization in microbial consortia.

    • PNA (peptide nucleic acid) probes: neutral backbone ➔ higher Tm, better specificity; expensive.

    • LNA (locked nucleic acid) analogues: ribose “locked” ➔ increased affinity, producible on standard synthesizers.

  • Suspension array (Luminex xMAP): 100 spectrally coded microsphere sets + flow cytometer read-out.

    • Enables multiplex detection of bacteria, viruses, fungi; discriminates close relatives.

  • Hybridization underlies LiPA, HPA, microarrays, real-time PCR probes.
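The annealing principle behind these hybridization formats can be illustrated with a short sketch. It assumes perfect complementarity as the binding criterion and uses the Wallace rule (Tm ≈ 2 °C per A/T + 4 °C per G/C), a rough approximation that applies only to short DNA oligonucleotides, not to PNA or LNA probes. The probe sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def wallace_tm(probe: str) -> int:
    """Rough Tm estimate in degrees C for a short DNA oligo (Wallace rule)."""
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def probe_binds(probe: str, target_strand: str) -> bool:
    """True if the probe is perfectly complementary to a site on the given strand."""
    return reverse_complement(probe) in target_strand.upper()

probe = "ACGTTGCAAGGCTA"                               # hypothetical 14-mer probe
target = "TTTT" + reverse_complement(probe) + "GGGG"   # strand carrying the binding site
print(wallace_tm(probe), probe_binds(probe, target))
```

Real probe design would use nearest-neighbor thermodynamics and tolerate or penalize mismatches; the exact-match test here only illustrates why cross-hybridization to near-identical sites is a concern.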

2. Restriction Fragment Length Polymorphism (RFLP)

  • Digest genomic/mtDNA with 4–6 bp restriction enzymes ➔ species-specific band pattern.

  • Visualization: Southern blot, ethidium bromide, silver stain.

  • PCR-RFLP emerged after the invention of PCR (Mullis, 1983) ➔ accommodates low DNA quantities by amplifying the target (often mt cytochrome b, rRNA genes).

  • Limitations:

    • Intraspecies mutations at restriction sites ➔ false negatives/positives.

    • Multiple enzymes needed, yielding complex patterns.

    • Needs high-quality DNA unless coupled to Whole Genome Amplification (WGA).

    • Multicopy genes & heteroplasmy complicate interpretation.
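An in silico digest shows how a single mutation inside a recognition site changes the band pattern, which is both the basis of RFLP and the source of the intraspecies false results listed above. This is a minimal sketch, assuming a linear molecule (mtDNA is circular in reality) and exact site matching; the two toy sequences are hypothetical, while the EcoRI (G^AATTC) and HaeIII (GG^CC) cut positions are the standard ones.

```python
import re

# Recognition sequence and cut offset on the top strand.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HaeIII": ("GGCC", 2)}

def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (largest first) from digesting a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # Lookahead so that overlapping sites are also found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)

# Hypothetical amplicons: species_y carries an A->G change inside the EcoRI site.
species_x = "AAAGAATTCTTTTTGGCCAAAA"
species_y = "AAAGAGTTCTTTTTGGCCAAAA"

for name, seq in (("species_x", species_x), ("species_y", species_y)):
    print(name, {enz: digest(seq, enz) for enz in ENZYMES})
```

Here EcoRI separates the two toy sequences while HaeIII does not, which is why several enzymes are usually screened.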

3. Amplified Fragment Length Polymorphism (AFLP)

  • Workflow:

    1. Digest DNA with two enzymes (e.g., EcoRI/MseI).

    2. Ligate adapters to sticky ends.

    3. Two rounds of selective PCR with primers bearing 1–3 selective 3′ bases.

    4. Separate fragments on denaturing PAGE.

  • Screens hundreds of anonymous loci; high discriminatory power.

  • Drawbacks: labor-intensive, costly software, complex data.
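The AFLP workflow above can be approximated in silico. The sketch below is a simplification under stated assumptions: it uses EcoRI (G^AATTC) and MseI (T^TAA), keeps only fragments with one EcoRI end and one MseI end, ignores adapter length, and collapses the two selective PCR rounds into a single filter on the bases immediately inside each restriction-site remnant. Function names and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# For each enzyme: site remnant read inward from a fragment end, 5' overhang length.
ENDS = {"E": ("AATTC", 4), "M": ("TAA", 2)}

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def cut_sites(seq: str, site: str, label: str) -> list[tuple[int, str]]:
    """Top-strand cut positions; both enzymes cut after the first base of the site."""
    return [(m.start() + 1, label) for m in re.finditer(f"(?={site})", seq)]

def aflp_fragments(seq: str, sel_eco: str = "", sel_mse: str = "") -> list[int]:
    """Sizes of EcoRI-MseI fragments whose ends match the selective bases."""
    seq = seq.upper()
    selective = {"E": sel_eco, "M": sel_mse}
    cuts = sorted(cut_sites(seq, "GAATTC", "E") + cut_sites(seq, "TTAA", "M"))
    sizes = []
    for (p1, left), (p2, right) in zip(cuts, cuts[1:]):
        if left == right:
            continue  # skip EcoRI-EcoRI and MseI-MseI fragments
        left_read = seq[p1:p2]                               # top strand, from left end
        right_read = revcomp(seq[p1:p2 + ENDS[right][1]])    # bottom strand, from right end
        if (left_read.startswith(ENDS[left][0] + selective[left])
                and right_read.startswith(ENDS[right][0] + selective[right])):
            sizes.append(p2 - p1)
    return sizes

toy = "CCGAATTCAGGGGGGGGCTTAACC"   # hypothetical sequence: one EcoRI site, one MseI site
print(aflp_fragments(toy), aflp_fragments(toy, "A", "G"), aflp_fragments(toy, "T", "G"))
```

Adding selective bases reduces the number of amplified fragments, which is how the real protocol thins hundreds of restriction fragments down to a scorable profile.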

4. Random Amplified Polymorphic DNA (RAPD)

  • Uses a single arbitrary 9–10 nt primer at low annealing T.

    • Bands form when two primer-binding sites lie on opposite strands, facing each other, within a few kb.

  • Variants target repetitive elements: MIR, REP, ERIC.

  • Pros: no prior sequence data; fast.

  • Cons: sensitive to PCR conditions; poor reproducibility; needs high-molecular-weight DNA; difficult with mixtures.
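The band-forming condition for RAPD can be sketched as a search for one primer sequence on the top strand and its reverse complement further downstream. This is a minimal illustration assuming exact matches only; real RAPD reactions at low annealing temperature also prime from mismatched sites, which is one reason for the poor reproducibility noted above. The primer and template are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def rapd_amplicons(seq: str, primer: str, max_len: int = 3000) -> list[int]:
    """Predicted product sizes for a single arbitrary primer (exact matches only)."""
    seq, primer = seq.upper(), primer.upper()
    rc = revcomp(primer)
    # Primer sequence on the top strand: extension proceeds to the right.
    starts = [m.start() for m in re.finditer(f"(?={primer})", seq)]
    # Reverse complement on the top strand: the primer binds here and extends left.
    ends = [m.start() + len(rc) for m in re.finditer(f"(?={rc})", seq)]
    return sorted(e - s for s in starts for e in ends
                  if 2 * len(primer) <= e - s <= max_len)

primer = "ACGGTCATGC"                                   # hypothetical 10-mer
template = "TT" + primer + "A" * 30 + revcomp(primer) + "TT"
mutant = template.replace(revcomp(primer), "GCATGACCGA")  # one base changed in a site

print(rapd_amplicons(template, primer), rapd_amplicons(mutant, primer))
```

A single substitution in either binding site removes the band, so presence/absence patterns across many such sites form the RAPD profile.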

5. Conventional PCR with Species-Specific Primers

  • Design primers unique to target species (software + sequence databases).

    • Multiplex PCR: multiple primer pairs in one tube; conserves time & DNA.

  • Nested PCR (two sequential primer sets) boosts specificity but raises contamination risk and cost.

  • Limitation: negative result is inconclusive for non-target species; electrophoresis step required.
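Primer specificity can be screened in silico before bench work by checking that a primer pair yields a product on the target species and none on close relatives. The sketch below assumes exact primer matching and uses unrealistically short hypothetical 8-nt primers and toy templates; real designs use longer primers and tolerate some mismatches away from the 3′ end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_size(template: str, fwd: str, rev: str):
    """Product size for a primer pair on the top strand, or None if no product."""
    t = template.upper()
    start = t.find(fwd.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand where its reverse complement occurs.
    site = t.find(revcomp(rev), start + len(fwd))
    if site == -1:
        return None
    return site + len(rev) - start

fwd_a, rev_a = "ACGTGCAT", "GGATCCTA"   # hypothetical primers for species_A
templates = {
    "species_A": "CC" + "ACGTGCAT" + "T" * 20 + "TAGGATCC" + "CC",
    "species_B": "CC" + "ACGTGGAT" + "T" * 20 + "TAGGATCC" + "CC",  # 1 mismatch in fwd site
}
results = {name: amplicon_size(seq, fwd_a, rev_a) for name, seq in templates.items()}
print(results)
```

The None for species_B mirrors the limitation above: absence of a product shows only that the sample is not the target, not which species it is.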

Isothermal Nucleic-Acid Amplification (PCR alternatives)
  • NASBA, SDA, RCA, LAMP, HDA, etc. — operate at constant T.

  • Advantages: simpler hardware, lower contamination risk.

  • Disadvantages: long reaction times (8–16 h), sensitivity to inhibitors, non-specific products.

6. Real-Time (Quantitative) PCR

  • Monitors fluorescence during amplification, enabling quantitation and closed-tube diagnostics.

  • Probe chemistries:

    • Hydrolysis (TaqMan) probes: reporter/quencher; cleavage releases fluorescence.

    • Molecular beacons: stem-loop opens upon hybridization.

    • DNA dyes (SYBR Green), PNA light-up probes, hybridization probes, etc.

  • Strengths: high sensitivity, reduced contamination, automation; species-specific probes detect mixtures.

  • Weaknesses: platform/dye compatibility limits multiplex; high equipment & reagent cost.
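Quantitation in real-time PCR is commonly done with a standard curve: threshold cycle (Ct) is regressed on log10 of the starting quantity, the slope gives the amplification efficiency as E = 10^(-1/slope) - 1, and unknowns are read off the fitted line. The sketch below assumes a simple least-squares fit and uses made-up dilution data; function names are hypothetical.

```python
import math

def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 means perfect doubling each cycle."""
    return 10 ** (-1 / slope) - 1

def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Starting quantity of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)

# Made-up tenfold dilution series (copies per reaction) and measured Ct values
standards = [1e5, 1e4, 1e3, 1e2]
cts = [18.00, 21.32, 24.64, 27.96]

slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 2), round(efficiency(slope), 3),
      round(quantity_from_ct(24.64, slope, intercept)))
```

A slope near -3.32 corresponds to roughly 100 % efficiency; slopes far from this value suggest inhibition or poor primer performance.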

7. DNA Sequencing Approaches

  • Gold standard for species ID; costs dropping (micro-electrophoretic, cyclic-array, etc.).

  • Identification requires comparison to curated databases.

    • Critical quality checks: verified voucher IDs, multiple geographic replicates, coverage of close relatives.

  • Popular loci:

    • Ribosomal genes: 16S/18S conserved + ITS variable ➔ broad taxonomic range.

    • Animal mtDNA: cytochrome b, COI.

    • COI “DNA barcoding” (650–750 bp) proposed as universal animal barcode; supports species discovery, but issues:

      • Single-gene limitations (recombination, introgression, incomplete lineage sorting).

      • Not applicable to taxa lacking mtDNA.

  • Multilocus Sequence Analysis (MLSA): concatenated housekeeping genes; mitigates single-locus pitfalls, detects recombination.

  • Technical hurdles:

    • Amplicons >300 bp difficult from degraded DNA.

    • WGA may help but yields small fragments from poor samples.

    • Single-locus failure (e.g., primer mismatch) ➔ use degenerate primers or multiple loci.

    • Mixed-species samples require cloning or next-gen sequencing to deconvolute.
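Comparison of a query sequence against a reference database can be sketched with an alignment-free similarity score. The example below is a stand-in for a real BLAST or BOLD search, not a replacement: it assumes a tiny in-memory reference set, scores similarity as the Jaccard index of shared k-mers, and applies an arbitrary cutoff. All names, sequences, and the 0.8 threshold are hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by total distinct k-mers of both sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)

def identify(query: str, references: dict[str, str], min_score: float = 0.8, k: int = 4):
    """Best-scoring reference, or None as the name if nothing passes the cutoff."""
    scores = {name: jaccard(query, ref, k) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= min_score else (None, scores[best])

# Hypothetical voucher-backed reference sequences
references = {
    "species_A": "ACGTACGGTCATGCATTGCA",
    "species_B": "TTGGCCAATTGGCCAAGGTT",
}
print(identify("ACGTACGGTCATGCATTGCA", references))
print(identify("GGGGGGGGGGGGGGGGGGGG", references))
```

Returning no name when the best score is below the cutoff reflects the database caveat above: a query from a species missing in the reference set should not be forced onto its nearest neighbor.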

8. DNA Microarrays

  • Thousands to $10^6$ immobilized probes per 1–2 cm$^2$ chip.

  • Formats: glass slide, silicon, nylon; probes = oligos, amplicons, PNA, molecular beacons.

  • Workflow: fluorescently label sample DNA ➔ hybridize ➔ laser scan ➔ bioinformatic decoding.

  • Utility: high-throughput species signature detection or SNP scanning.

  • Limitations: expensive robotics/imagers; sophisticated data analysis.

Auxiliary Knowledge Boxes (Embedded in Article)

  • Mitochondrial DNA (mtDNA): high copy number, maternal inheritance, high mutation rate (animals) ➔ ideal for degraded forensic samples; plant mtDNA evolves slowly & contains non-coding bulk.

  • Polymerase Chain Reaction (PCR): exponential, in vitro amplification; contamination risk; primer design requires prior sequence.

  • Ribosomal RNA Genes: multi-copy clusters; ETS–18S–ITS–5.8S–ITS–28S (eukaryotes); 5S separate; 16S–ITS–23S–5S (prokaryotes).

  • Whole Genome Amplification (WGA): increases DNA quantity using PCR or isothermal methods; issues with allelic dropout and background synthesis in low-quality samples.

  • Online Resources: BLAST (NCBI), BOLD, GBIF, Tree-of-Life, IPNI, MycoBank, patent databases (Espacenet, WIPO, Google Patents).

Practical Criteria for Method Selection

  • Sample state: quantity, degradation, presence of mixtures.

  • Discriminatory depth required (strain vs. species vs. genus).

  • Laboratory budget & equipment.

  • Time constraints.

  • Need for throughput / automation.

  • Regulatory or forensic admissibility.

Current & Emerging Technologies / Future Directions

  • Two megatrends: miniaturization & high-throughput.

  • Multiplex SNP genotyping: minisequencing, primer-extension microarrays, coded microspheres, SCOLA.

  • Portable diagnostics aspirations:

    • Combine isothermal amplification + microfluidics + nanotech.

    • Nanobiotechnology innovations:
      • Carbon nanotube & quantum-dot probes enable single-molecule polymorphism reading.
      • Atomic Force Microscopy with nanotube tips directly haplotypes kilobase DNA.

    • Electrochemical biosensors translate hybridization into electronic signals — potential point-of-care devices.

    • Optical mapping: immobilize single DNA molecules, create ordered restriction maps ➔ genome-scale species signatures without PCR.

Ethical, Philosophical & Practical Implications

  • Rapid DNA ID influences biodiversity conservation, invasive-species monitoring, disease control, and forensic justice.

  • But greater access ≠ greater understanding; proper population-genetic frameworks remain essential.

  • Data quality, database curation, and cross-disciplinary literacy determine real-world usefulness.