DNA-Based Species Identification – Comprehensive Study Notes

Introduction / Conceptual Foundations

"Species" definition is contentious; >20 definitions exist.
- Researchers must understand phylogeny (branching order, divergence times) and nomenclature before applying molecular tools.
- Terms such as “strain”, “variant”, “sub-species”, “breed” are often subjective and overlapping.
All DNA- or protein-based identifications rest on the Neutral Theory of Molecular Evolution (Kimura, 1968):
- Molecular changes accumulate over time, most being selectively neutral.
- Assumption: individuals of the same species share diagnostic sequences absent from others.
- Caveats: reproductive success, gene flow, drift ➔ continuous intraspecies variation.
- Essential prerequisite: ensure no overlap between intra- and inter-specific diversity at chosen locus.
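The "no overlap" prerequisite above can be checked numerically once aligned sequences are available. The following is a minimal sketch, assuming pre-aligned sequences of equal length and an uncorrected p-distance; the function names (`p_distance`, `barcode_gap`) and the toy sequences are hypothetical and not taken from the source.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def barcode_gap(seqs_by_species: dict[str, list[str]]) -> tuple[float, float, bool]:
    """Return (max intraspecific distance, min interspecific distance, gap exists?)."""
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not inter:
        raise ValueError("need sequences from at least two species")
    max_intra = max(intra, default=0.0)
    min_inter = min(inter)
    # The locus is usable only if within-species variation stays below
    # the smallest between-species distance.
    return max_intra, min_inter, max_intra < min_inter


# Toy data (hypothetical): two sequences per species
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACGGAC", "ACGAACGGAC"],
}
max_intra, min_inter, has_gap = barcode_gap(toy)
print(f"max intra = {max_intra:.2f}, min inter = {min_inter:.2f}, gap = {has_gap}")
```

A real assessment would use many individuals per species and probably a model-corrected distance; the sketch only illustrates the comparison itself.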
Locus choice matters because mutation, recombination, selection alter evolutionary rates.
- Prefer multiple loci or the most informative locus for the question.
When employing a novel marker:
- Genotype representatives of all candidate species from multiple geographies/hosts.
- Deposit voucher specimens.
No method is perfect; choice depends on:
- Laboratory resources, expertise, cost, time, sample quality, research question.

Pre-DNA World: Morphology & Protein Analyses

Classical taxonomy relies on external & internal morphology.
- Strengths: direct, inexpensive, still the most used.
- Weaknesses:
  - Phenotypic plasticity (e.g., color variation in birds/fish due to diet).
  - Sibling/cryptic species complexes: morphologically indistinguishable yet reproductively isolated.
  - Convergent evolution causes unrelated taxa to share traits ➔ mis-ID if few characters examined.
  - Requires intact specimens; impractical for trace/forensic remains.
Protein-based methods (electrophoresis, isoenzymes, immunoassays):
- Limited by rapid degradation, tissue-specific expression, cross-reactivity, antibody availability.

The “DNA Revolution”

DNA advantages for species ID:
- Chemical stability; recoverable from degraded sources (processed food, coprolites, mummies).
- Present in virtually all tissues/fluids; organellar DNA (mtDNA, plastid DNA) allows typing even where cells lack nuclei.
- Carries more variation than proteins (genetic code degeneracy, non-coding regions).
RNA viruses (e.g., HIV) are still typed via DNA after reverse transcription.

Core DNA-Based Typing Approaches

Comparative Attributes (Table 1 Synopsis)

Information, reproducibility, throughput, cost differ among methods.
General trends:
- Sequencing & microarrays: highest information, high cost, good reproducibility, excellent throughput.
- RAPD: quick, cheap, but poor inter-lab reproducibility.
- Real-time PCR & microarray: mixture-friendly, automation-ready.

1. DNA Hybridization

Principle: complementary strands anneal; detection via fluorescent/radio labels.
Classical membrane hybridizations: multi-species detection possible with multiple probes but:
- Require high-quality DNA; cross-hybridization; lab-to-lab variability; time-consuming.
FISH (Fluorescence in situ Hybridization): probes bind rRNA/DNA in whole cells ➔ direct visualization in microbial consortia.
- PNA (peptide nucleic acid) probes: neutral backbone ➔ higher Tm, better specificity; expensive.
- LNA (locked nucleic acid) analogues: ribose “locked” ➔ increased affinity, producible on standard synthesizers.
Suspension array (Luminex xMAP): 100 spectrally coded microsphere sets + flow cytometer read-out.
- Enables multiplex detection of bacteria, viruses, fungi; discriminates close relatives.
Hybridization underlies LiPA, HPA, microarrays, real-time PCR probes.
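Probe behaviour in all of these formats depends on duplex melting temperature (Tm). As a rough illustration, the sketch below applies the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is not from the source and is only a coarse estimate for short unmodified DNA oligos (roughly 14–20 nt); it does not model PNA or LNA probes. The function name and example probe are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb melting temperature (deg C) for a short DNA oligo."""
    oligo = oligo.upper()
    if set(oligo) - set("ACGT"):
        raise ValueError("oligo must contain only A, C, G, T")
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    # Each A/T pair contributes ~2 deg C, each G/C pair ~4 deg C.
    return 2 * at + 4 * gc


probe = "ACGTACGTACGTACGT"  # hypothetical 16-mer probe
print(probe, wallace_tm(probe))
```

Nearest-neighbour thermodynamic models are more accurate and would be the usual choice for actual probe design.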

2. Restriction Fragment Length Polymorphism (RFLP)

Digest genomic/mtDNA with 4–6 bp restriction enzymes ➔ species-specific band pattern.
Visualization: Southern blot, ethidium bromide, silver stain.
PCR-RFLP emerged after the invention of PCR (Mullis, 1983) ➔ accommodates low DNA amounts by amplifying the target first (often mt cytochrome b, rRNA genes).
Limitations:
- Intraspecies mutations at restriction sites ➔ false negatives/positives.
- Multiple enzymes needed, yielding complex patterns.
- Needs high-quality DNA unless coupled to Whole Genome Amplification (WGA).
- Multicopy genes & heteroplasmy complicate interpretation.
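The digestion step, and the way a single mutation inside a restriction site changes the band pattern, can be sketched in silico. This is a minimal illustration assuming a linear molecule, one enzyme, and exact site matching on the given strand; the function name `digest` and both toy sequences are hypothetical. The EcoRI site (G^AATTC, cut after the first base) is used as the example.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths (largest first) after cutting a linear sequence at every `site`.

    cut_offset is the number of bases into the site after which the strand is cut.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(fragments, reverse=True)


# Two hypothetical amplicons; the second carries a point mutation
# (GAATTC -> GAGTTC) that destroys one EcoRI site.
allele_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
allele_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

print(digest(allele_1, "GAATTC", 1))  # three fragments
print(digest(allele_2, "GAATTC", 1))  # two fragments: one site lost
```

If the two alleles came from the same species, the differing patterns would be exactly the kind of intraspecies site mutation listed under the limitations.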

3. Amplified Fragment Length Polymorphism (AFLP)

Workflow:
1. Digest DNA with two enzymes (e.g., EcoRI/MseI).
2. Ligate adapters to sticky ends.
3. Two rounds of selective PCR with primers bearing 1–3 selective 3′ bases.
4. Separate fragments on denaturing PAGE.
Screens hundreds of anonymous loci; high discriminatory power.
Drawbacks: labor-intensive, costly software, complex data.
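The role of the selective 3′ bases in step 3 can be made concrete with simple arithmetic: under the simplifying assumption of equal, independent base frequencies, each selective base keeps about one quarter of the fragments. The sketch below encodes that estimate; the function name and the fragment count are hypothetical, and real band numbers depend on genome composition.

```python
def expected_aflp_bands(amplifiable_fragments: float,
                        selective_bases_primer1: int,
                        selective_bases_primer2: int) -> float:
    """Expected number of fragments surviving selective amplification.

    Assumes each selective base matches a given fragment with probability 1/4.
    """
    if selective_bases_primer1 < 0 or selective_bases_primer2 < 0:
        raise ValueError("selective base counts must be non-negative")
    total_selective = selective_bases_primer1 + selective_bases_primer2
    return amplifiable_fragments / 4 ** total_selective


# Hypothetical genome yielding 100,000 adapter-ligated fragments,
# amplified with +3/+3 selective primers.
print(round(expected_aflp_bands(100_000, 3, 3), 1))
```

This is why the number of selective bases is usually tuned to genome size: more bases for large genomes, fewer for small ones.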

4. Random Amplified Polymorphic DNA (RAPD)

Uses a single arbitrary 9–10 nt primer at low annealing T.
- Bands form when two binding sites are in inverse orientation within a few kb.
Variants target repetitive elements: MIR, REP, ERIC.
Pros: no prior sequence data; fast.
Cons: sensitive to PCR conditions; poor reproducibility; needs high-molecular-weight DNA; difficult with mixtures.
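The band-forming condition (two binding sites in inverse orientation within a few kb) can be sketched as a sequence search. This is a simplified model that assumes exact primer matches only, whereas real RAPD reactions tolerate mismatches at low annealing temperature; the function names, the 10-mer primer, and the template are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def rapd_amplicons(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Predicted amplicon sizes for a single arbitrary primer on one template."""
    template, primer = template.upper(), primer.upper()
    k = len(primer)
    forward_sites = find_all(template, primer)            # primer extends rightward
    reverse_sites = find_all(template, revcomp(primer))   # primer extends leftward
    sizes = [
        j + k - i
        for i in forward_sites
        for j in reverse_sites
        if j >= i + k and j + k - i <= max_len  # inverse orientation, close enough
    ]
    return sorted(sizes)


primer = "GGTGCGGGAA"  # hypothetical 10-mer
template = "AT" * 5 + primer + "ACGT" * 10 + revcomp(primer) + "TA" * 5
print(rapd_amplicons(template, primer))
```

Because only exact matches are counted, the sketch underestimates the number of bands a real low-stringency reaction would give, which is part of why RAPD profiles vary between laboratories.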

5. Conventional PCR with Species-Specific Primers

Design primers unique to target species (software + sequence databases).
- Multiplex PCR: multiple primer pairs in one tube; conserves time & DNA.
Nested PCR (two sequential primer sets) boosts specificity but raises contamination risk and cost.
Limitation: a negative result only rules out the target and cannot identify which non-target species is present; electrophoresis step required.
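A first-pass specificity screen for a candidate primer can be sketched as a mismatch count against target and non-target sequences. This is a minimal sketch, assuming ungapped comparison on one strand and a hypothetical rule that at least three mismatches are needed to avoid cross-amplification; the function names, threshold, and sequences are all illustrative assumptions rather than an established design criterion.

```python
def min_mismatches(primer: str, seq: str) -> int:
    """Fewest mismatches between primer and any same-length window of seq."""
    k = len(primer)
    if len(seq) < k:
        raise ValueError("sequence shorter than primer")
    return min(
        sum(a != b for a, b in zip(primer, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def is_specific(primer: str, target: str, non_targets: dict[str, str],
                min_mm: int = 3) -> bool:
    """True if primer matches target exactly and every non-target has >= min_mm mismatches."""
    if min_mismatches(primer, target) != 0:
        return False
    return all(min_mismatches(primer, s) >= min_mm for s in non_targets.values())


primer = "ACGTTGCA"                              # hypothetical primer
target = "GGGACGTTGCATTT"                        # hypothetical target species
relatives = {"distant_sp": "GGGACCTAGGATTT",     # hypothetical non-targets
             "sister_sp": "GGGACGTTGTATTT"}

for name, seq in relatives.items():
    print(name, min_mismatches(primer, seq))
print(is_specific(primer, target, relatives))
```

In practice the position of mismatches matters (those near the 3′ end are most disruptive), and candidate primers are normally screened against full sequence databases rather than a handful of sequences.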

Isothermal Nucleic-Acid Amplification (PCR alternatives)

NASBA, SDA, RCA, LAMP, HDA, etc. — operate at constant T.
Advantages: simpler hardware, lower contamination risk.
Disadvantages: long reaction times (8–16 h), sensitivity to inhibitors, non-specific products.

6. Real-Time (Quantitative) PCR

Monitors fluorescence during amplification, enabling quantitation and closed-tube diagnostics.
Probe chemistries:
- Hydrolysis (TaqMan) probes: reporter/quencher; cleavage releases fluorescence.
- Molecular beacons: stem-loop opens upon hybridization.
- DNA dyes (SYBR Green), PNA light-up probes, hybridization probes, etc.
Strengths: high sensitivity, reduced contamination, automation; species-specific probes detect mixtures.
Weaknesses: platform/dye compatibility limits multiplex; high equipment & reagent cost.
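Quantitation in real-time PCR is commonly done with a standard curve: the threshold cycle (Ct) is linear in log10 of the starting copy number, and the slope gives the amplification efficiency E = 10^(-1/slope) - 1, with a slope near -3.32 corresponding to perfect doubling. The sketch below fits such a curve by least squares; the function names, the intercept of 40, and the dilution series are hypothetical illustration values.

```python
import math


def fit_standard_curve(log10_copies: list[float], ct_values: list[float]):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, efficiency), where efficiency 1.0 means doubling per cycle.
    """
    if len(log10_copies) != len(ct_values) or len(log10_copies) < 2:
        raise ValueError("need at least two paired standards")
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Estimate starting copy number of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series with ideal doubling each cycle
ideal_slope = -1 / math.log10(2)
standards = [2, 3, 4, 5, 6]  # log10 copies
cts = [40 + ideal_slope * x for x in standards]

slope, intercept, eff = fit_standard_curve(standards, cts)
print(f"slope = {slope:.3f}, efficiency = {eff:.2%}")
print(f"unknown at Ct 30: {copies_from_ct(30, slope, intercept):.0f} copies")
```

Real standards scatter around the line, so replicate wells and a check of the fit quality would normally accompany the efficiency estimate.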

7. DNA Sequencing Approaches

Gold standard for species ID; costs are dropping with newer platforms (micro-electrophoretic, cyclic-array sequencing, etc.).
Identification requires comparison to curated databases.
- Critical quality checks: verified voucher IDs, multiple geographic replicates, coverage of close relatives.
Popular loci:
- Ribosomal genes: 16S/18S conserved + ITS variable ➔ broad taxonomic range.
- Animal mtDNA: cytochrome b, COI.
  - COI “DNA barcoding” (650–750 bp) proposed as universal animal barcode; supports species discovery, but issues:
    - Single-gene limitations (recombination, introgression, incomplete lineage sorting).
    - Not applicable to taxa lacking mtDNA.
Multilocus Sequence Analysis (MLSA): concatenated housekeeping genes; mitigates single-locus pitfalls, detects recombination.
Technical hurdles:
- Amplicons > 300 bp difficult from degraded DNA.
- WGA may help but yields small fragments from poor samples.
- Single-locus failure (e.g., primer mismatch) ➔ use degenerate primers or multiple loci.
- Mixed-species samples require cloning or next-gen sequencing to deconvolute.
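The degenerate-primer remedy for primer mismatch can be illustrated with the standard IUPAC ambiguity codes: each ambiguous position stands for a set of bases, so one degenerate primer represents a pool of concrete primers. The sketch below, with hypothetical function names and a hypothetical primer, counts that pool size and finds matching positions in a template on one strand only.

```python
import math
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences represented by a degenerate primer."""
    return math.prod(len(IUPAC[base]) for base in primer.upper())


def primer_matches(primer: str, template: str) -> list[int]:
    """Start positions where the degenerate primer matches the template exactly."""
    pattern = "".join(f"[{IUPAC[base]}]" for base in primer.upper())
    # Lookahead so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", template.upper())]


primer = "ACNGGRTAY"  # hypothetical degenerate primer
print(degeneracy(primer))
print(primer_matches(primer, "TTACTGGATACGG"))
print(primer_matches(primer, "TTACTGGCTACGG"))
```

Higher degeneracy widens taxonomic coverage but dilutes each concrete primer in the pool, so the count is worth checking before ordering.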

8. DNA Microarrays

Thousands to $10^6$ immobilized probes per 1–2 cm$^2$ chip.
Formats: glass slide, silicon, nylon; probes = oligos, amplicons, PNA, molecular beacons.
Workflow: fluorescently label sample DNA ➔ hybridize ➔ laser scan ➔ bioinformatic decoding.
Utility: high-throughput species signature detection or SNP scanning.
Limitations: expensive robotics/imagers; sophisticated data analysis.

Auxiliary Knowledge Boxes (Embedded in Article)

Mitochondrial DNA (mtDNA): high copy number, maternal inheritance, high mutation rate (animals) ➔ ideal for degraded forensic samples; plant mtDNA evolves slowly & contains non-coding bulk.
Polymerase Chain Reaction (PCR): exponential, in vitro amplification; contamination risk; primer design requires prior sequence.
Ribosomal RNA Genes: multi-copy clusters; ETS–18S–ITS1–5.8S–ITS2–28S (eukaryotes); 5S separate; 16S–ITS–23S–5S (prokaryotes).
Whole Genome Amplification (WGA): increases DNA quantity using PCR or isothermal methods; issues with allelic dropout and background synthesis in low-quality samples.
Online Resources: BLAST (NCBI), BOLD, GBIF, Tree-of-Life, IPNI, MycoBank, patent databases (Espacenet, WIPO, Google Patents).

Practical Criteria for Method Selection

Sample state: quantity, degradation, presence of mixtures.
Discriminatory depth required (strain vs. species vs. genus).
Laboratory budget & equipment.
Time constraints.
Need for throughput / automation.
Regulatory or forensic admissibility.

Current & Emerging Technologies / Future Directions

Two megatrends: miniaturization & high-throughput.
Multiplex SNP genotyping: minisequencing, primer-extension microarrays, coded microspheres, SCOLA.
Portable diagnostics aspirations:
- Combine isothermal amplification + microfluidics + nanotech.
- Nanobiotechnology innovations:
  • Carbon nanotube & quantum-dot probes enable single-molecule polymorphism reading.
  • Atomic Force Microscopy with nanotube tips can directly haplotype kilobase-sized DNA molecules.
- Electrochemical biosensors translate hybridization into electronic signals — potential point-of-care devices.
- Optical mapping: immobilize single DNA molecules, create ordered restriction maps ➔ genome-scale species signatures without PCR.

Ethical, Philosophical & Practical Implications

Rapid DNA ID influences biodiversity conservation, invasive-species monitoring, disease control, and forensic justice.
But greater access ≠ greater understanding; proper population-genetic frameworks remain essential.
Data quality, database curation, and cross-disciplinary literacy determine real-world usefulness.