
Chapter 8 Short Tandem Repeat Markers - Vocabulary Flashcards

STR Markers for Forensic DNA Typing — Comprehensive Notes

  • Overview and context

    • Repeats are pervasive in eukaryotic genomes. STRs (short tandem repeats), also called microsatellites or simple sequence repeats (SSRs), have core repeat units of 2–7 bp. They are abundant and scattered throughout the genome.

    • Although >99.7% of the human genome is identical across individuals, variation exists in the remaining 0.3%, enabling individuals to be distinguished genetically.

    • Much of the genome contains repeated DNA sequences located between genes; these can vary in size without typically impacting genetic health.

    • Satellite DNA (long repeats) often surrounds the centromeres. The term “satellite” comes from early density gradient centrifugation experiments, in which these repeats formed minor bands separate from the main band of genomic DNA.

    • Core repeats vary in length and copy number, giving rise to different classes of repeats:

      • Minisatellites (VNTRs): core repeat units of ~8–100 bp; e.g., D1S80 is a minisatellite with a 16-bp repeat unit.

      • Microsatellites (STRs/SSRs): core repeat units of 2–7 bp; the primary focus of forensic DNA typing because they are easily amplified by PCR and are multiallelic.

    • STRs are favored in forensic work because the repeat unit is short, so both alleles of a heterozygote are usually similar in size. This reduces PCR bias (preferential amplification of the smaller allele) and the resulting allelic dropout.

    • The human genome contains thousands of polymorphic microsatellites; estimates of the total number of microsatellite loci range from thousands to over a million, depending on the counting method.

    • STRs account for roughly 3% of the human genome and occur, on average, about once every 10,000 nucleotides.
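    • As a rough consistency check (a back-of-envelope sketch, not a figure from the chapter), the average spacing can be turned into an expected number of STR loci. The haploid genome size of ~3.1 billion bp is an assumption not stated in these notes.

```python
# Back-of-envelope STR density estimate.
# ASSUMPTION: haploid human genome of ~3.1 billion bp (not given in the notes).
GENOME_BP = 3_100_000_000
AVG_SPACING_BP = 10_000  # "about once every 10,000 nucleotides"

# Integer division: number of 10-kb intervals in the genome.
expected_str_loci = GENOME_BP // AVG_SPACING_BP
print(f"~{expected_str_loci:,} STR loci")
```

    • The result, roughly 310,000 loci, sits within the wide range of estimates given above.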

  • How STR markers are discovered and used

    • Large-scale cataloging of STRs has been enabled by reference genome assemblies and databases.

    • New STR markers are identified by two main methods:
      1) Searches of sequence databases (e.g., GenBank) for regions with multiple contiguous repeat units (more than about six).
      2) Molecular biology techniques that isolate repeat-containing regions.

    • With a human reference genome, more than 20,000 tetranucleotide STR repeats have been located.

    • To analyze an STR marker, the invariant flanking regions on either side of the repeat must be identified: PCR primers placed in these unique, consistent sequences allow the repeat region to be amplified reliably across individuals (see Figure 8.1).

    • Practical benefit: STR loci can be multiplexed for simultaneous amplification (high-throughput typing).
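    • The discovery and primer-placement steps above can be sketched in Python. This is a toy illustration, not a real marker-discovery pipeline: find_tetranucleotide_runs is a hypothetical helper that mimics a database search for six or more contiguous tetranucleotide units, and amplicon_length shows why flanking primers make PCR product size track repeat number. The flank lengths (100 and 120 bp) are made-up values.

```python
import re


def find_tetranucleotide_runs(seq: str, min_repeats: int = 6):
    """Return (start, unit, copies) for each run of a 4-bp unit repeated
    at least `min_repeats` times back to back. Homopolymer runs such as
    AAAA... also match and would need filtering in a real search."""
    # Group 2 captures one 4-mer; \2{k,} demands k or more further copies.
    pattern = re.compile(r"(([ACGT]{4})\2{%d,})" % (min_repeats - 1))
    runs = []
    for m in pattern.finditer(seq.upper()):
        runs.append((m.start(), m.group(2), len(m.group(1)) // 4))
    return runs


def amplicon_length(repeats: int, unit_len: int, flank5: int, flank3: int) -> int:
    """PCR product size = fixed flanking DNA + repeat count x unit length."""
    return flank5 + repeats * unit_len + flank3


# Toy sequence: 2 bp flank, eight GATA units, 2 bp flank.
toy = "CC" + "GATA" * 8 + "TT"
print(find_tetranucleotide_runs(toy))   # [(2, 'GATA', 8)]
# Hypothetical flank lengths of 100 bp and 120 bp around the repeat.
print(amplicon_length(8, 4, 100, 120))  # 252
print(amplicon_length(9, 4, 100, 120))  # 256: one extra repeat adds 4 bp
```

    • Because the flanking regions are invariant, each additional repeat adds exactly one unit length to the product, which is what lets alleles be sized and loci be multiplexed by size range.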

  • Key concepts in STR motif and nomenclature

    • STR motifs are named by the core repeat unit length: mono-, di-, tri-, tetra-, penta-, hexanucleotide repeats.

    • Theoretically possible motif counts (before equivalence considerations) are 4^n for a repeat unit of n bases: 4, 16, 64, 256, 1,024, and 4,096 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively.

    • Motifs can be equivalent because of the repetitive structure and the two strand orientations: some motifs describe the same repeat read on the opposite strand, or read from a different starting position (a frameshift).

    • Two rules identify motif equivalence:

      • (1) Motif A is equivalent to motif B if A is the inverse (reverse) complement of B.

      • (2) Motif A is equivalent to motif B if A differs from B, or from the reverse complement of B, only by a frameshift.

    • Example: (GAAA)n is equivalent to (AGAA)n, (AAGA)n, (AAAG)n, (TTTC)n, (TTCT)n, (TCTT)n, (CTTT)n.

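    • Important note: a motif that is itself a repeat of a shorter unit is classified by that shorter unit. For example, (AGAG)n is the same sequence as (AG)n and is considered a dinucleotide repeat rather than a tetranucleotide motif, illustrating how equivalence can affect classification.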

    • Because DNA is double-stranded, repeat motifs can be read on either strand, leading to different apparent repeat motifs and starting positions.

    • ISFG guidance: designate the repeat motif from the top strand of the GenBank reference sequence, so that motif designation is standardized in cross-lab reporting (in the example given, the motif is reported as [TCAT]).

    • Repeat numbering proceeds 5′-to-3′ along the sequence.
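    • The two equivalence rules can be expressed as a canonical form: generate every frameshift of the motif and of its reverse complement, then keep the alphabetically smallest. The Python sketch below does this and, as an additional assumption mirroring the (AGAG)n note, skips motifs that are repeats of a shorter unit before counting distinct motifs for each repeat length. The function names are illustrative only.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Rule (1): the same repeat read on the opposite strand."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif: str):
    """Rule (2): every frameshift (starting position) of the unit."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif: str) -> str:
    """Alphabetically smallest member of the motif's equivalence class."""
    motif = motif.upper()
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit, e.g. AGAG = (AG)2."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def count_unique_motifs(n: int) -> int:
    """Distinct canonical motifs of length n, excluding reducible ones."""
    classes = {
        canonical_motif("".join(p))
        for p in product("ACGT", repeat=n)
        if is_primitive("".join(p))
    }
    return len(classes)


print(canonical_motif("GAAA"), canonical_motif("CTTT"))  # AAAG AAAG
for n in range(1, 7):
    # repeat length, raw 4**n sequences, distinct motifs after equivalence
    print(n, 4 ** n, count_unique_motifs(n))
```

    • Under these assumptions the loop reports 2, 4, 10, 33, 102, and 350 distinct motifs for mono- through hexanucleotide repeats, far fewer than the 4^n raw sequences.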

  • STR repeat patterns and allele complexity

    • STRs vary not only by repeat unit length and copy number but also by how strictly they adhere to an incremental repeat pattern.

    • Classification of STR patterns (important for interpretation):

      • Simple repeats: identical unit length and sequence repeated.

      • Compound repeats: two or more adjacent simple repeats.

      • Complex repeats: multiple blocks of varying unit lengths and potential intervening sequences.

      • Complex hypervariable repeats: numerous nonconsensus alleles with variation in size and sequence; difficult to genotype reproducibly and less commonly used in standard forensic typing. Some kits include SE33 (ACTBP2), a complex hypervariable locus.

    • Not all alleles at an STR locus consist only of complete repeat units. Variants without full repeats include:

      • Microvariants: alleles with incomplete repeat units (e.g., TH01 allele 9.3, which has nine tetranucleotide repeats plus one incomplete repeat of three nucleotides, due to a missing base in the seventh repeat).

    • Practical consequence: microvariants and nonconsensus alleles must be carefully named and interpreted in the lab to ensure interlaboratory concordance.
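    • The "full repeats.extra bases" naming of microvariants can be sketched as a small Python function. It assumes the length of the repeat region itself is known and the flanking sequence is invariant; in practice, allele calls are made by comparison with an allelic ladder rather than from a raw length. allele_designation is a hypothetical name.

```python
def allele_designation(repeat_region_bp: int, unit_len: int = 4) -> str:
    """Name an allele as '<full repeats>.<extra bases>' (e.g. '9.3').
    ASSUMPTION: repeat_region_bp covers only the repeat region."""
    full, extra = divmod(repeat_region_bp, unit_len)
    # Only microvariants (a leftover partial unit) get a decimal part.
    return f"{full}.{extra}" if extra else str(full)


print(allele_designation(40))  # 10  (ten complete tetranucleotide repeats)
print(allele_designation(39))  # 9.3 (nine repeats plus 3 bases, like TH01 9.3)
```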

  • STRs in forensic identification: locus variation and performance

    • A major objective is to maximize discriminatory power by selecting highly polymorphic markers and combining several markers to improve individual discrimination.

    • In degraded samples, short amplicons (miniSTRs) perform better because a short target region is more likely to remain intact in fragmented DNA; typical STR alleles are ~100–400 bp, whereas minisatellite alleles are ~400–1000 bp.

    • Stutter artifacts: PCR slippage produces products that are typically one repeat unit shorter than the true allele. Stutter levels depend on the repeat unit length:

      • Tetranucleotide repeats: usually <15% of the main allele peak.

      • Di- and trinucleotide repeats: can be 30% or higher, complicating mixed-profile interpretation.

    • Because STR amplicons are relatively small, capillary electrophoresis can resolve closely spaced heterozygous alleles that differ by a single repeat unit (4 bp for tetranucleotide loci) or less, which is harder to achieve for larger fragments.
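    • A simple way to apply the stutter guideline is to flag any peak that sits one repeat unit below a taller peak and is no more than 15% of its height. The Python sketch below is hypothetical: the peak list is invented, flag_stutter is an illustrative name, and validated laboratory workflows use locus-specific stutter thresholds rather than one fixed cutoff.

```python
def flag_stutter(peaks, unit_len=4, max_ratio=0.15, tol_bp=0.5):
    """peaks: list of (size_bp, height_rfu). Return sizes of peaks that sit
    one repeat unit below another peak and are at most `max_ratio` of its
    height, i.e. candidate n-1 stutter products."""
    stutter = []
    for size, height in peaks:
        for parent_size, parent_height in peaks:
            one_unit_below = abs((parent_size - unit_len) - size) <= tol_bp
            if one_unit_below and height <= max_ratio * parent_height:
                stutter.append(size)
                break  # one qualifying parent peak is enough
    return stutter


# Hypothetical tetranucleotide peaks: alleles at 200 and 212 bp with
# small stutter peaks at 196 and 208 bp.
peaks = [(196.0, 90), (200.0, 1000), (208.0, 110), (212.0, 950)]
print(flag_stutter(peaks))  # [196.0, 208.0]
```

    • A flagged peak is only a candidate stutter product: in a mixture, the same position could hold a true allele from a minor contributor, which is why high stutter complicates mixed-profile interpretation.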

  • Core STR loci and CODIS steering principles

    • The concept of a core marker set emerged to provide a standardized set of loci for national DNA databases.

    • In the U.S., the CODIS core loci came out of a 1996–1997 effort in which 22 laboratories evaluated 17 candidate STR loci; 13 core loci were selected as the basis for the national DNA database, CODIS (Combined DNA Index System).

    • The 13 CODIS core loci (with amelogenin for sex typing) are:

      • CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11

    • These loci provide very high discriminatory power: when all 13 CODIS loci are typed, the random match probability for unrelated individuals is far less than 1 in 10^12, enabling robust identity testing.

    • Commercial kits allow robust multiplex amplification and simultaneous detection of all 13 CODIS core loci, along with amelogenin and sometimes additional loci for internal checks.

    • The 2000s also saw European and other international efforts to standardize loci; by 2006, several European loci were recommended for inclusion in future typing kits (e.g., D2S441, D10S1248, D22S1045, D1S1656, D12S391).
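    • The very small random match probabilities quoted for the core loci come from multiplying per-locus genotype frequencies (the product rule), using Hardy-Weinberg expectations of p^2 for homozygotes and 2pq for heterozygotes. The notes do not give this calculation, so the Python sketch below uses invented allele frequencies; real casework uses population-specific frequency databases and may apply corrections for population structure.

```python
from math import prod
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote (q omitted),
    2pq for a heterozygote with allele frequencies p and q."""
    return p * p if q is None else 2 * p * q


def random_match_probability(genotype_freqs) -> float:
    """Product rule: multiply per-locus genotype frequencies, which assumes
    the loci are inherited independently."""
    return prod(genotype_freqs)


# Hypothetical 13-locus profile (NOT real population data): every locus
# heterozygous for two alleles that each have frequency 0.1.
per_locus = [genotype_frequency(0.1, 0.1) for _ in range(13)]
rmp = random_match_probability(per_locus)
print(f"Random match probability ~ 1 in {1 / rmp:.2e}")
```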

  • Allelic ladders: purpose, construction, and use in genotyping

    • An allelic ladder is an artificial mixture of common alleles for a given STR locus, generated with the same primers as the samples, providing a size reference for each allele.

    • Ladders are essential for accurate genotype determination because they standardize allele sizing across instruments and conditions.

    • Construction of ladders:

      • Combine genomic DNA or locus-specific PCR products from multiple individuals so that representative alleles are included (e.g., alleles with 6, 7, 8, 9, and 10 repeats).

      • Balance the quantities so that the alleles are fairly evenly represented in the ladder.

      • Example composition strategies: combine individuals with genotypes (6,8), (7,10), and (9,9), or homozygotes (6,6), (7,7), (8,8), (9,9), and (10,10); either set covers alleles 6 through 10.

    • Ladders are included in commercial STR kits, and additional ladders can be produced by diluting the original ladder and reamplifying with the same primers (second- and third-generation ladders).

    • It is critical that ladders use the same PCR