DNA-Based Species Identification – Comprehensive Study Notes
Introduction / Conceptual Foundations
"Species" definition is contentious; >20 definitions exist.
Researchers must understand phylogeny (branching order, divergence times) and nomenclature before applying molecular tools.
Terms such as “strain”, “variant”, “sub-species”, “breed” are often subjective and overlapping.
All DNA- or protein-based identifications rest on the Neutral Theory of Molecular Evolution (Kimura):
Molecular changes accumulate over time, most being selectively neutral.
Assumption: individuals of the same species share diagnostic sequences absent from others.
Caveats: reproductive success, gene flow, drift ➔ continuous intraspecies variation.
Essential prerequisite: ensure no overlap between intra- and inter-specific diversity at chosen locus.
Locus choice matters because mutation, recombination, selection alter evolutionary rates.
Prefer multiple loci or the most informative locus for the question.
When employing a novel marker:
Genotype representatives of all candidate species from multiple geographies/hosts.
Deposit voucher specimens.
No method is perfect; choice depends on:
Laboratory resources, expertise, cost, time, sample quality, research question.
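The prerequisite noted above (no overlap between intra- and inter-specific diversity at the chosen locus) can be checked numerically. The following is a minimal sketch, assuming pre-aligned sequences of equal length and an uncorrected p-distance; the function names and the toy sequences are illustrative and not taken from the source.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

def barcoding_gap(samples):
    """samples maps species name -> list of aligned sequences.

    Returns (max intraspecific distance, min interspecific distance, gap_exists).
    """
    labelled = [(sp, seq) for sp, seqs in samples.items() for seq in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    return max_intra, min_inter, max_intra < min_inter

# Made-up toy alignment, for illustration only
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACCTGC", "ACGAACCTGC"],
}
max_intra, min_inter, gap = barcoding_gap(toy)
print(max_intra, min_inter, gap)
```

A locus passes this check only for the individuals actually sampled, which is why broad geographic and host sampling matters before relying on it.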
Pre-DNA World: Morphology & Protein Analyses
Classical taxonomy relies on external & internal morphology.
Strengths: direct, inexpensive, still the most used.
Weaknesses:
Phenotypic plasticity (e.g., color variation in birds/fish due to diet).
Sibling/cryptic species complexes—morphologically indistinguishable yet reproductively isolated.
Convergent evolution causes unrelated taxa to share traits ➔ mis-ID if few characters examined.
Requires intact specimens—impractical for trace/forensic remains.
Protein-based methods (electrophoresis, isoenzymes, immunoassays):
Limited by rapid degradation, tissue-specific expression, cross-reactivity, antibody availability.
The “DNA Revolution”
DNA advantages for species ID:
Chemical stability; recoverable from degraded sources (processed food, coprolites, mummies).
Present in virtually all tissues/fluids; mtDNA and plastid DNA allow typing of samples that lack nuclei.
Carries more variation than proteins (genetic code degeneracy, non-coding regions).
RNA viruses (e.g., HIV) still typed via DNA after reverse transcription.
Core DNA-Based Typing Approaches
Comparative Attributes (Table 1 Synopsis)
Information, reproducibility, throughput, cost differ among methods.
General trends:
Sequencing & microarrays: highest information, high cost, good reproducibility, excellent throughput.
RAPD: quick, cheap, but poor inter-lab reproducibility.
Real-time PCR & microarray: mixture-friendly, automation-ready.
1. DNA Hybridization
Principle: complementary strands anneal; detection via fluorescent/radio labels.
Classical membrane hybridizations: multi-species detection possible with multiple probes but:
Require high-quality DNA; cross-hybridization; lab-to-lab variability; time-consuming.
FISH (Fluorescence in situ Hybridization): probes bind rRNA/DNA in whole cells ➔ direct visualization in microbial consortia.
PNA (peptide nucleic acid) probes: neutral backbone ➔ higher Tm, better specificity; expensive.
LNA (locked nucleic acid) analogues: ribose “locked” ➔ increased affinity, producible on standard synthesizers.
Suspension array (Luminex xMAP): 100 spectrally coded microsphere sets + flow cytometer read-out.
Enables multiplex detection of bacteria, viruses, fungi; discriminates close relatives.
Hybridization underlies LiPA, HPA, microarrays, real-time PCR probes.
2. Restriction Fragment Length Polymorphism (RFLP)
Digest genomic/mtDNA with 4–6 bp restriction enzymes ➔ species-specific band pattern.
Visualization: Southern blot, ethidium bromide, silver stain.
PCR-RFLP emerged after the invention of PCR (Mullis) ➔ accommodates low DNA quantity by amplifying the target first (often mt cytochrome b, rRNA genes).
Limitations:
Intraspecies mutations at restriction sites ➔ false negatives/positives.
Multiple enzymes needed, yielding complex patterns.
Needs high-quality DNA unless coupled to Whole Genome Amplification (WGA).
Multicopy genes & heteroplasmy complicate interpretation.
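To make the RFLP principle concrete, an in silico digest can predict the band pattern from a known sequence. This is a hedged sketch: it handles linear DNA and palindromic recognition sites only, the enzyme table is a small illustrative subset, and the two toy sequences are invented to show how a single substitution inside a site changes the pattern.

```python
import re

# Recognition site and cut offset within the site (top strand).
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "HaeIII": ("GGCC", 2),    # GG^CC
}

def digest(seq, enzyme):
    """Return fragment lengths (largest first) for a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # Lookahead so that overlapping site occurrences are all found
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)

# Invented toy sequences: seq_b carries one substitution in the second HaeIII site
seq_a = "AAAAGGCCTTTTTTGGCCAA"
seq_b = "AAAAGGCCTTTTTTGGTCAA"
pattern_a = digest(seq_a, "HaeIII")
pattern_b = digest(seq_b, "HaeIII")
print(pattern_a, pattern_b)
```

The same mechanism explains the first limitation listed above: an intraspecific mutation at a restriction site alters the pattern just as an interspecific difference would.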
3. Amplified Fragment Length Polymorphism (AFLP)
Workflow:
Digest DNA with two enzymes (e.g., EcoRI/MseI).
Ligate adapters to sticky ends.
Two rounds of selective PCR with primers bearing 1–3 selective 3′ bases.
Separate fragments on denaturing PAGE.
Screens hundreds of anonymous loci; high discriminatory power.
Drawbacks: labor-intensive, costly software, complex data.
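The AFLP workflow above can be approximated in silico. The sketch below is a simplification under stated assumptions: only fragments with one EcoRI end and one MseI end are kept, adapters are ignored, the two PCR rounds are collapsed into a single filter, and selective bases must match exactly. The function name and default selective bases are hypothetical.

```python
import re

SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}  # G^AATTC and T^TAA, both cut after base 1

def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def aflp_fragments(seq, sel_eco="A", sel_mse="C"):
    """Lengths of EcoRI/MseI fragments whose ends match the selective bases."""
    seq = seq.upper()
    selective = {"EcoRI": sel_eco, "MseI": sel_mse}
    # (start position, enzyme) of every recognition site, ordered along the sequence
    sites = sorted(
        (m.start(), enz)
        for enz, pat in SITES.items()
        for m in re.finditer(f"(?={pat})", seq)
    )
    lengths = []
    for (p1, e1), (p2, e2) in zip(sites, sites[1:]):
        if e1 == e2:
            continue  # keep only fragments with one EcoRI end and one MseI end
        inner = seq[p1 + len(SITES[e1]):p2]  # bases between the two full sites
        # Selective 3' bases read into the fragment from each end
        left_ok = inner.startswith(selective[e1])
        right_ok = revcomp(inner).startswith(selective[e2])
        if left_ok and right_ok:
            lengths.append(p2 - p1)  # cut-to-cut length, adapters excluded
    return sorted(lengths)

# Invented toy sequence with one EcoRI site followed by one MseI site
toy_seq = "CC" + "GAATTC" + "AGGGGG" + "TTAA" + "CC"
print(aflp_fragments(toy_seq))
print(aflp_fragments(toy_seq, sel_mse="T"))
```

Adding selective bases reduces the number of amplified fragments, which is what keeps the final gel pattern readable.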
4. Random Amplified Polymorphic DNA (RAPD)
Uses a single arbitrary 9–10 nt primer at low annealing temperature.
Bands form when two binding sites are in inverse orientation within a few kb.
Variants target repetitive elements: MIR, REP, ERIC.
Pros: no prior sequence data; fast.
Cons: sensitive to PCR conditions; poor reproducibility; needs high-molecular-weight DNA; difficult with mixtures.
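The band-formation rule above (two binding sites in inverse orientation within a few kb) can be sketched as a simple search. This is an illustrative model only: it requires exact primer matches, whereas real RAPD tolerates mismatches at low annealing temperature, and the 3000 bp ceiling and the toy primer are assumptions.

```python
def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def find_all(text, pattern):
    """Start indices of all (possibly overlapping) exact matches."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def rapd_products(template, primer, max_len=3000):
    """Predicted amplicon lengths for a single arbitrary primer."""
    template, primer = template.upper(), primer.upper()
    k = len(primer)
    # Primer sequence on the top strand: primer extends rightwards from here
    forward = find_all(template, primer)
    # Reverse complement on the top strand: primer extends leftwards from here
    reverse = find_all(template, revcomp(primer))
    products = []
    for f in forward:
        for r in reverse:
            length = r + k - f
            if r >= f + k and length <= max_len:
                products.append(length)
    return sorted(products)

# Invented toy template: two inverted primer sites separated by 20 bp
primer = "GGTCAACTGA"
template = "TT" + primer + "A" * 20 + revcomp(primer) + "TT"
products = rapd_products(template, primer)
print(products)
```

Because each band depends on two independent, imperfectly matched annealing events, small changes in reaction conditions can add or remove bands, consistent with the reproducibility problem listed above.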
5. Conventional PCR with Species-Specific Primers
Design primers unique to target species (software + sequence databases).
Multiplex PCR: multiple primer pairs in one tube; conserves time & DNA.
Nested PCR (two sequential primer sets) boosts specificity but raises contamination risk and cost.
Limitation: a negative result only excludes the target species and does not identify a non-target species; an electrophoresis step is required.
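A first pass at species-specific primer design can be sketched as a search for short subsequences present in the target but absent from close relatives. The code below is a rough illustration, not a primer-design tool: it ignores melting temperature, 3′-end stability and near-matches, and the sequences and k value are invented.

```python
def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def specific_kmers(target, non_targets, k=8):
    """k-mers of the target that occur on neither strand of any non-target sequence."""
    candidates = kmers(target.upper(), k)
    for other in non_targets:
        other = other.upper()
        candidates -= kmers(other, k)
        candidates -= kmers(revcomp(other), k)
    return sorted(candidates)

# Invented toy sequences differing at a single position
target_seq = "ACGTACGGTTCA"
relative_seq = "ACGTACGATTCA"
candidates = specific_kmers(target_seq, [relative_seq], k=6)
print(candidates)
```

Candidates from such a screen would still need to be checked against sequence databases, since a k-mer absent from the sampled relatives may occur in an unsampled species.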
Isothermal Nucleic-Acid Amplification (PCR alternatives)
NASBA, SDA, RCA, LAMP, HDA, etc. operate at constant temperature.
Advantages: simpler hardware, lower contamination risk.
Disadvantages: long reaction times (8–16 h), sensitivity to inhibitors, non-specific products.
6. Real-Time (Quantitative) PCR
Monitors fluorescence during amplification, enabling quantitation and closed-tube diagnostics.
Probe chemistries:
Hydrolysis (TaqMan) probes: reporter/quencher; cleavage releases fluorescence.
Molecular beacons: stem-loop opens upon hybridization.
DNA dyes (SYBR Green), PNA light-up probes, hybridization probes, etc.
Strengths: high sensitivity, reduced contamination, automation; species-specific probes detect mixtures.
Weaknesses: platform/dye compatibility limits multiplex; high equipment & reagent cost.
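Quantitation in real-time PCR is commonly done against a standard curve. The sketch below uses the standard relations Ct = slope · log10(quantity) + intercept and efficiency = 10^(−1/slope) − 1, which are general qPCR conventions rather than anything stated in these notes; the dilution series and Ct values are invented.

```python
import math

def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def efficiency(slope):
    """Amplification efficiency; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1

def quantify(ct, slope, intercept):
    """Starting quantity of an unknown sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)

# Invented ten-fold dilution series (copies per reaction) and Ct values
std_quantities = [1e5, 1e4, 1e3, 1e2]
std_cts = [18.0, 21.32, 24.64, 27.96]
slope, intercept = fit_standard_curve(std_quantities, std_cts)
print(round(slope, 2), round(intercept, 2), round(efficiency(slope), 3))
print(round(quantify(23.0, slope, intercept)))
```

A slope near −3.32 corresponds to an efficiency near 100 %; large deviations usually point to inhibitors or poor primer design.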
7. DNA Sequencing Approaches
Gold standard for species ID; costs are dropping with newer platforms (micro-electrophoretic, cyclic-array, etc.).
Identification requires comparison to curated databases.
Critical quality checks: verified voucher IDs, multiple geographic replicates, coverage of close relatives.
Popular loci:
Ribosomal genes: 16S/18S conserved + ITS variable ➔ broad taxonomic range.
Animal mtDNA: cytochrome b, COI.
COI “DNA barcoding” (650–750 bp) proposed as universal animal barcode; supports species discovery, but issues:
Single-gene limitations (recombination, introgression, incomplete lineage sorting).
Not applicable to taxa lacking mtDNA.
Multilocus Sequence Analysis (MLSA): concatenated housekeeping genes; mitigates single-locus pitfalls, detects recombination.
Technical hurdles:
Long amplicons are difficult to obtain from degraded DNA.
WGA may help but yields small fragments from poor samples.
Single-locus failure (e.g., primer mismatch) ➔ use degenerate primers or multiple loci.
Mixed-species samples require cloning or next-gen sequencing to deconvolute.
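Sequence-based identification, as described earlier in this section, comes down to comparing a query against reference sequences from vouchered specimens. The following is a minimal sketch assuming pre-aligned sequences of equal length; the 0.9 identity cutoff used in the example is arbitrary and the reference set is invented. A real workflow would use an alignment search tool against a curated database.

```python
def identity(a, b):
    """Fraction of identical sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def identify(query, references, min_identity=0.97):
    """references maps species name -> list of aligned reference sequences.

    Returns (best species or None, best identity). None means no reference
    reached min_identity, so the query stays unidentified.
    """
    best_species, best_score = None, 0.0
    for species, seqs in references.items():
        for ref in seqs:
            score = identity(query, ref)
            if score > best_score:
                best_species, best_score = species, score
    if best_score < min_identity:
        return None, best_score
    return best_species, best_score

# Invented reference set and queries
refs = {
    "species_A": ["ACGTACGTACGTACGTACGT"],
    "species_B": ["ACGTTCGAACGGACGTTCGT"],
}
query = "ACGTACGTACGTACGTACGA"
species, score = identify(query, refs, min_identity=0.9)
print(species, score)
print(identify("TTTTTTTTTTTTTTTTTTTT", refs, min_identity=0.9))
```

The result is only as good as the reference set: a missing close relative yields either no call or a confident wrong call, which is why coverage of close relatives is listed among the quality checks.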
8. DNA Microarrays
Thousands (or more) of immobilized probes per 1–2 cm$^2$ chip.
Formats: glass slide, silicon, nylon; probes = oligos, amplicons, PNA, molecular beacons.
Workflow: fluorescently label sample DNA ➔ hybridize ➔ laser scan ➔ bioinformatic decoding.
Utility: high-throughput species signature detection or SNP scanning.
Limitations: expensive robotics/imagers; sophisticated data analysis.
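The "bioinformatic decoding" step of the microarray workflow can be illustrated with a simple presence call per species. This is a hypothetical sketch: the probe names, the intensity threshold and the minimum fraction of positive probes are all assumptions, and real pipelines add background correction and normalization first.

```python
def call_species(intensities, probe_map, threshold=1000.0, min_fraction=0.75):
    """Call a species present if enough of its probes exceed the signal threshold.

    intensities maps probe id -> scanned signal.
    probe_map maps species name -> list of probe ids designed for that species.
    """
    calls = {}
    for species, probes in probe_map.items():
        positive = sum(intensities.get(p, 0.0) >= threshold for p in probes)
        calls[species] = positive / len(probes) >= min_fraction
    return calls

# Invented probe layout and scan values
probe_map = {
    "species_A": ["A1", "A2", "A3", "A4"],
    "species_B": ["B1", "B2", "B3", "B4"],
}
scan = {
    "A1": 5200.0, "A2": 4100.0, "A3": 3900.0, "A4": 800.0,
    "B1": 150.0, "B2": 1300.0, "B3": 90.0, "B4": 200.0,
}
calls = call_species(scan, probe_map)
print(calls)
```

Requiring several concordant probes per species, rather than a single one, is a simple guard against cross-hybridization producing isolated false signals.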
Auxiliary Knowledge Boxes (Embedded in Article)
Mitochondrial DNA (mtDNA): high copy number, maternal inheritance, high mutation rate (animals) ➔ ideal for degraded forensic samples; plant mtDNA evolves slowly and contains a large non-coding fraction.
Polymerase Chain Reaction (PCR): exponential, in vitro amplification; contamination risk; primer design requires prior sequence.
Ribosomal RNA Genes: multi-copy clusters; ETS–18S–ITS1–5.8S–ITS2–28S (eukaryotes); 5S separate; 16S–ITS–23S–5S (prokaryotes).
Whole Genome Amplification (WGA): increases DNA quantity using PCR or isothermal methods; issues with allelic dropout and background synthesis in low-quality samples.
Online Resources: BLAST (NCBI), BOLD, GBIF, Tree-of-Life, IPNI, MycoBank, patent databases (Espacenet, WIPO, Google Patents).
Practical Criteria for Method Selection
Sample state: quantity, degradation, presence of mixtures.
Discriminatory depth required (strain vs. species vs. genus).
Laboratory budget & equipment.
Time constraints.
Need for throughput / automation.
Regulatory or forensic admissibility.
Current & Emerging Technologies / Future Directions
Two megatrends: miniaturization & high-throughput.
Multiplex SNP genotyping: minisequencing, primer-extension microarrays, coded microspheres, SCOLA.
Portable diagnostics aspirations:
Combine isothermal amplification + microfluidics + nanotech.
Nanobiotechnology innovations:
• Carbon nanotube & quantum-dot probes enable single-molecule polymorphism reading.
• Atomic Force Microscopy with nanotube tips can directly haplotype kilobase-length DNA.
• Electrochemical biosensors translate hybridization into electronic signals, with potential for point-of-care devices.
Optical mapping: immobilize single DNA molecules, create ordered restriction maps ➔ genome-scale species signatures without PCR.
Ethical, Philosophical & Practical Implications
Rapid DNA ID influences biodiversity conservation, invasive-species monitoring, disease control, and forensic justice.
But greater access ≠ greater understanding; proper population-genetic frameworks remain essential.
Data quality, database curation, and cross-disciplinary literacy determine real-world usefulness.