Forensic Genetics – Week 4 Lecture Notes

Topic = STRs and Statistics

STRs: Basics

  • STR stands for Short Tandem Repeats. Also known as microsatellites.

  • Type of size polymorphism: small DNA fragments (2–6 bp repeats) that occur in tandem.

  • Typical total length: generally only 100–400 bp.

STRs vs VNTRs

  • STRs (microsatellites)

    • Repeat unit size: 2–6 base pairs.

    • Repeated many times (e.g., 8–20 times).

  • VNTRs (minisatellites; Variable Number Tandem Repeats)

    • Repeat unit size: hundreds of base pairs.

    • Repeated a smaller number of times. The key contrast is repeat unit size: hundreds of bp for VNTRs versus 2–6 bp for STRs.

Location of STRs

  • Not clustered near telomeres; spread unevenly across the entire length of chromosomes.

  • Located in non-coding DNA generally.

  • Patches of high and low STR content across the genome.

  • Occur via mistakes during DNA replication.

  • Example: Human Chromosome 5 STRs near genes (contextual note in figure).

Types of STRs (repeat motifs by base count)

  • Mononucleotide repeats (single base): A or C (e.g., AGATAAAAAAAAGTGTCA). T and G runs are not listed separately because they are A and C runs read from the complementary strand.

  • Dinucleotide repeats (2 bases): AC, AG, AT, CG (AC = CA = GT = TG).

  • Trinucleotide repeats (3 bases): AAC, AAG, AAT, ACC, ACG, ACT, AGC, AGG, ATC, CCG.

  • Tetranucleotide repeats (4 bases): examples include AAAC, AAAG, AAAT, AGAT, CCCG, CCGG, etc.

  • Pentanucleotide repeats (5 bases).

  • Hexanucleotide repeats (6 bases).
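
The motif lists above can be reproduced by treating two motifs as the same class when one is a rotation of the other or of its reverse complement (this is what AC = CA = GT = TG expresses). The following is a minimal sketch of that idea; the function names are illustrative and not taken from the lecture.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def equivalent_motifs(motif):
    """All rotations of the motif and of its reverse complement."""
    variants = set()
    for m in (motif, reverse_complement(motif)):
        for i in range(len(m)):
            variants.add(m[i:] + m[:i])
    return variants


def is_primitive(motif):
    """False if the motif is a shorter unit repeated (e.g. AA, or ACAC = AC x 2)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def motif_classes(k):
    """One representative (alphabetically first) per class of k-base motifs."""
    reps = set()
    for bases in product("ACGT", repeat=k):
        motif = "".join(bases)
        if is_primitive(motif):
            reps.add(min(equivalent_motifs(motif)))
    return sorted(reps)


print(sorted(equivalent_motifs("AC")))  # ['AC', 'CA', 'GT', 'TG']
print(motif_classes(1))                 # ['A', 'C']
print(motif_classes(2))                 # ['AC', 'AG', 'AT', 'CG']
print(len(motif_classes(3)))            # 10
```

Choosing the alphabetically first member as the class name is only a convention for this sketch; it happens to give the same representatives as the lists above.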

STR nomenclature difficulties

  • Questions to resolve when describing STRs:

    • Which DNA strand to read from?

    • Which motif is the repeat unit?

    • The chosen strand can affect how many repeats are counted.

  • To standardize, use GenBank reference sequence to satisfy courts.

  • The first strand sequenced is usually treated as the reference sequence.
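
To see why the strand and the motif both need to be fixed, the sketch below counts back-to-back copies of a motif in a made-up sequence. The sequence and the helper names are hypothetical and exist only to illustrate the counting problem.

```python
import re


def longest_run(seq, motif):
    """Largest number of back-to-back copies of `motif` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(r) // len(motif) for r in runs), default=0)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical fragment: 2 bp of flank on each side of four TCTA units.
top = "GG" + "TCTA" * 4 + "CC"
bottom = reverse_complement(top)

print(longest_run(top, "TCTA"))     # 4
print(longest_run(top, "CTAT"))     # 3
print(longest_run(bottom, "TAGA"))  # 4
print(longest_run(bottom, "AGAT"))  # 3
```

The same stretch of DNA is read as 4 repeats or 3 repeats depending on where the motif is taken to start, and the motif itself changes (TCTA versus TAGA) depending on the strand, which is why a single reference sequence is needed.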

STR typing steps — PCR

  • Amplify across the STR via PCR.

  • Amplicons shorter than ~400 bp are useful in forensics because casework DNA is often degraded and longer targets may fail to amplify.

  • Flanking regions around STRs are stable across the population, enabling reliable amplification.
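
Because the flanking regions are constant, the PCR product length depends only on the number of repeats. A minimal sketch of that relationship follows; the flank lengths (60 bp and 40 bp, primers included) are invented for illustration.

```python
def amplicon_length(flank_5p, flank_3p, unit_bp, repeats):
    """Product size = constant flanks (including primers) + repeat block."""
    return flank_5p + repeats * unit_bp + flank_3p


# Hypothetical tetranucleotide locus with 60 bp and 40 bp of flanking sequence.
for n in (8, 12, 20):
    print(n, "repeats ->", amplicon_length(60, 40, 4, n), "bp")
```

With these assumed flanks, alleles of 8 to 20 repeats span 132 to 180 bp, so each extra repeat adds exactly one repeat unit (4 bp) to the fragment.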

STR typing steps — Capillary Electrophoresis (CE)

  • After PCR, run the products on a capillary electrophoresis instrument.

  • Detection relies on lasers and fluorescence to detect DNA fragments.

  • CE is more accurate than traditional gel electrophoresis and high-throughput: instruments can run 12–96 samples at a time.

  • CE separates DNA by size in a thin capillary tube using an electrical current and a separation matrix.

  • A laser excites fluorescence as fragments pass a detection window; the emitted light is recorded (electropherogram).

CE: Detector and signal concepts

  • Detector measures RFU (Relative Fluorescence Units) as fragments pass by.

  • Electropherogram: graph of signal vs. time showing fragment sizes.

  • The capillary system includes:

    • Capillary tube

    • Buffer and matrix

    • Laser excitation and a fluorescence detector

    • PC to record the signal. The schematic shows excitation, emission, and the moving DNA fragment.
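
An electropherogram is, in effect, a list of RFU readings over time, and a peak is a local maximum that rises above background. The sketch below shows one simple way to pick such peaks; the 50 RFU threshold and the trace values are assumptions, not instrument settings from the lecture.

```python
def call_peaks(trace, threshold=50):
    """Return (index, rfu) for local maxima at or above `threshold`."""
    peaks = []
    for i in range(1, len(trace) - 1):
        rises = trace[i - 1] < trace[i]
        falls_or_flat = trace[i] >= trace[i + 1]
        if trace[i] >= threshold and rises and falls_or_flat:
            peaks.append((i, trace[i]))
    return peaks


# Hypothetical RFU readings at successive time points.
trace = [3, 5, 4, 20, 310, 900, 280, 15, 4, 6, 40, 620, 35, 5]
print(call_peaks(trace))  # [(5, 900), (11, 620)]
```

Real analysis software also smooths the signal and converts time to fragment size using a size standard; this sketch covers only the thresholding step.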

CE fragment detection: fluorescence principles

  • Two methods to promote fluorescence:

    1) Stain the DNA with intercalating dyes (e.g., ethidium bromide or SYBR Safe).

      • The dye inserts between the base pairs and fluoresces when excited (e.g., by UV light); the signal is detected as fragments pass the window.

    2) Attach a fluorescent label to the 5′ end of one primer in the PCR primer pair.

      • The PCR product carries the fluorescent tag; as the fragment passes the detector, it emits a coloured signal.

  • Multiple colours can be used in a single PCR reaction when using fluorescent primers.

DNA staining vs fluorescent primers details

  • DNA staining:

    • Uses intercalating dyes; glow when bound to DNA under UV light; signal detected by laser as fragments pass.

  • Fluorescent primers:

    • A fluorophore is attached to a primer and is incorporated into the PCR products; using one or several colours allows multiplexing.

STR typing steps – Allelic ladders

  • Allelic ladder: a reference ladder containing all possible alleles for a given STR.

  • Sample peaks are overlaid with the ladder to determine the exact alleles present.

  • Conceptual illustration: ladder shows all possible alleles; sample overlays show detected alleles.
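
Overlaying a sample on the ladder amounts to finding the ladder peak closest in size to each sample peak. Here is a minimal sketch of that lookup, assuming a hypothetical ladder and a ±0.5 bp matching window (the window is an assumption, not a value given in these notes).

```python
def call_allele(peak_bp, ladder, tolerance=0.5):
    """Name of the ladder allele closest to `peak_bp`, or None if off-ladder."""
    name, size = min(ladder.items(), key=lambda kv: abs(kv[1] - peak_bp))
    return name if abs(size - peak_bp) <= tolerance else None


# Hypothetical ladder for a tetranucleotide locus: allele name -> size in bp.
ladder = {"9": 136.0, "10": 140.0, "11": 144.0, "12": 148.0}

print(call_allele(140.2, ladder))  # 10
print(call_allele(146.1, ladder))  # None
```

A peak that falls between ladder rungs returns None here, which corresponds to an off-ladder result that would need further review.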

Making an Allelic ladder

  • Create ladders by pooling DNA from individuals with different alleles for the STR; amplify, then combine into a single tube.

  • Allelic ladders exist for all forensic STRs.

Multiplex PCR

  • Advantage: amplify more than one STR at a time in a single tube.

  • Relies on size differences and/or fluorescent tags on primers to distinguish loci.

Multiplex PCR — size differences vs colour differences

  • Size differences: If alleles of different STRs do not overlap in size, they can be tested simultaneously with a single colour (e.g., the QIAxcel system).

  • Colour differences: If allele sizes overlap between loci, use different fluorescent colours to distinguish them. Typically 4–5 colours can be detected at once.
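
The size-versus-colour decision can be pictured as a small scheduling problem: loci whose size ranges overlap must go into different dye channels. The sketch below uses a simple greedy assignment; the locus names, size ranges, and dye names are all hypothetical, and real kit design involves many more constraints.

```python
def assign_dyes(loci, dyes):
    """Give each locus the first dye whose existing loci it does not overlap in size."""
    used = {dye: [] for dye in dyes}
    assignment = {}
    # Process loci from smallest to largest size range.
    for name, (lo, hi) in sorted(loci.items(), key=lambda kv: kv[1]):
        for dye in dyes:
            if all(hi < o_lo or lo > o_hi for o_lo, o_hi in used[dye]):
                used[dye].append((lo, hi))
                assignment[name] = dye
                break
        else:
            raise ValueError(f"no free dye channel for {name}")
    return assignment


# Hypothetical allele size ranges in bp.
loci = {
    "locus_A": (100, 140),
    "locus_B": (130, 180),
    "locus_C": (150, 200),
    "locus_D": (210, 260),
}
print(assign_dyes(loci, ["blue", "green", "yellow"]))
```

In this example only locus_B needs a second colour, because its range overlaps locus_A; the other three loci are separable by size alone.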

Normalised intensity and colour considerations

  • Only 4–5 colours are practical due to spectral overlap between fluorophores.

  • Spectral overlap makes distinguishing very close colours challenging.

  • Diagrammatic note: spectral overlap reduces separation efficiency; normalisation helps but limits the number of distinguishable colours.

Multiplex PCR output and interpretation

  • Output: one or two peaks per STR locus.

  • Each peak is assigned a colour corresponding to its STR marker.

  • Peaks are overlaid on an allelic ladder to determine the alleles present.

Multiplex PCR — practical considerations

  • All primers must work under the same reaction conditions.

  • Increased amounts of dNTPs and Taq polymerase may be required.

  • Fine-tuning is essential for reliable amplification across all targets.

STR kits used in forensics

  • PowerPlex® 16: shows multiple loci with peak ranges around 100–300 bp; includes loci such as D3S1358, TH01, D21S11, D18S51, D5S818, D13S317, D7S820, CSF1PO, VWA, D8S1179, TPOX, FGA, etc.; includes an internal lane standard like ILS-600.

  • Identifiler®: includes loci such as D8S1179, D21S11, D7S820, CSF1PO, FGA, D3S1358, D13S317, D16S539, D2S1338, D19S433, VWA, TPOX, D18S51, D5S818, etc.; uses an internal lane standard (GS500) for sizing.

  • Each kit provides a core set of STRs with size ranges and colour-coded primers to enable multiplexing.

Changing the size of alleles (to avoid overlaps)

  • Two methods to adjust amplicon size so alleles no longer overlap:
    1) Moving primers closer to or farther from the STR to adjust overall amplicon size.
    2) Adding non-nucleotide linkers between primer and fluorescent tag to increase product size without affecting primer performance. Roughly, 1 linker ≈ 2.5 bp.

Moving primers

  • Shifting primers alters amplicon length; still must consider all other primer interactions in the multiplex.

Linkers

  • Small non-nucleotide linkers can be added to increase size by ~2.5 bp per linker, enabling separation of alleles from different STRs that would otherwise overlap.
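
As a rough worked example of the ~2.5 bp per linker figure, the sketch below estimates how many linkers would be needed to shift a locus by a given apparent size. The shift values are made up.

```python
import math

BP_PER_LINKER = 2.5  # approximate apparent shift per linker, as stated in the notes


def linkers_needed(shift_bp):
    """Smallest number of linkers giving at least `shift_bp` of apparent shift."""
    return math.ceil(shift_bp / BP_PER_LINKER)


print(linkers_needed(10))  # 4
print(linkers_needed(12))  # 5
```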

What makes a good STR marker?

  • Narrow allele size range: limits the number of possible alleles and reduces overlap with other markers.

  • Narrow allele size range also helps to minimize random dropout, especially for larger alleles (e.g., 1 kb vs 200 bp have different amplification efficiencies).

  • Small PCR product size: enhances amplification efficiency; small alleles are more likely to remain intact in degraded DNA samples.

  • Older DNA samples remain testable because the amplicons are small.

  • Larger repeat unit (e.g., 4 bp) is generally better than 2 bp or 3 bp because it reduces slippage during copying; slippage causes stutter peaks that complicate interpretation in mixtures.

CODIS: Combined DNA Index System

  • The FBI’s core forensic database, originally built on 13 core STR markers that are routinely tested.

  • They provide a very low probability of random match (a large negative exponent).

  • Loci are spread across different chromosomes to exploit Independent Assortment.

  • 2017 update added 7 new STR markers; Australia uses a core set of 17 loci plus a sex determination test.

Sex determination in STR profiling: Amelogenin

  • Amelogenin marker (AMEL) is used for sex identification; not a true STR marker, it’s a deletion in a non-coding region of the Amelogenin gene.

  • AMEL-X vs AMEL-Y differences:

    • AMEL-X has a 6 bp deletion relative to AMEL-Y.

    • PCR uses the same primers for X and Y; X yields a shorter fragment, Y yields a longer fragment.

  • Interpretations:

    • Female (XX): a single peak at 106 bp (both X chromosomes give the 106 bp product).

    • Male (XY): peaks at 106 bp (X) and 112 bp (Y, 106 bp + 6 bp).

  • Amelogenin is also useful in detecting mixtures (e.g., rape cases). In males, peak heights for X and Y should be similar; disproportionate peaks may indicate mixtures.
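
The interpretation rules above can be sketched as a small decision function. It assumes the X and Y products differ by the 6 bp deletion described above (106 bp and 112 bp); the 50 RFU detection threshold and the 0.6 peak-balance cut-off are hypothetical values chosen for illustration.

```python
AMEL_X_BP, AMEL_Y_BP = 106, 112  # Y product is 6 bp longer than the X product


def interpret_amel(peaks, min_rfu=50, balance=0.6):
    """`peaks` maps fragment size (bp) to peak height (RFU)."""
    x = peaks.get(AMEL_X_BP, 0)
    y = peaks.get(AMEL_Y_BP, 0)
    if x < min_rfu:
        return "no result"  # simplification: X product expected in every sample
    if y < min_rfu:
        return "female (X only)"
    ratio = min(x, y) / max(x, y)
    if ratio >= balance:
        return "male (X and Y balanced)"
    return "possible mixture (X and Y imbalanced)"


print(interpret_amel({106: 1200}))            # female (X only)
print(interpret_amel({106: 1000, 112: 950}))  # male (X and Y balanced)
print(interpret_amel({106: 2400, 112: 600}))  # possible mixture (X and Y imbalanced)
```

The third case mirrors a mixed male and female sample, where the extra X contribution makes the 106 bp peak much taller than the 112 bp peak.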

Practical applications and implications of STRs

  • What to do with a profile:

    • You have a DNA profile; a suspect’s DNA may match the evidence.

    • To show the court how strongly the match points to that person, you rely on statistics.

  • Population data forms the basis of statistics:

    • Data collected from >100 individuals in different ethnic groups.

    • For each STR, count the number of times an allele is observed.

    • Allele counts are converted to allele frequencies.

    • When a match is made, multiply allele frequencies to predict how often a particular genotype will be observed (i.e., random match probability).

  • Genotype definition: the combination of alleles inherited from mother and father.
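
The step from allele counts to allele frequencies is a simple tally over both alleles of every person sampled. A minimal sketch, using an invented sample of five genotypes (far fewer than the >100 individuals mentioned above):

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Count every allele (two per person) and divide by the total count."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes at one STR locus (repeat numbers).
sample = [(11, 14), (11, 11), (12, 14), (11, 12), (12, 12)]
freqs = allele_frequencies(sample)
print(freqs)  # {11: 0.4, 14: 0.2, 12: 0.4}
```

Five people contribute ten alleles, so allele 11 (seen four times) has a frequency of 0.4.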

Allele frequencies: concepts and examples

  • Allele frequency at a locus i is the probability that a randomly chosen chromosome carries that allele.

  • Example: D13S317 in Caucasians is shown in a genotype-frequency table with allele pairs and their frequencies.

  • For a given locus, genotypes can be homozygous (AA) or heterozygous (AB).

  • Formulas used:

    • If genotype is AA (homozygous), frequency = p^2 where p is the allele frequency for A.

    • If genotype is AB (heterozygous), frequency = 2 p q, where p and q are frequencies of A and B, respectively.

    • If genotype is BB (homozygous for B), frequency = q^2.
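
These formulas fit in a one-line helper. The sketch below applies it to two of the allele-frequency values quoted in these notes (D13S317 alleles 11 and 14; TH01 allele 6); the function name is illustrative.

```python
def genotype_frequency(p, q=None):
    """p**2 for a homozygote (q omitted), 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


print(round(genotype_frequency(0.33940, 0.04801), 4))  # 0.0326  (heterozygote)
print(round(genotype_frequency(0.23179), 4))           # 0.0537  (homozygote)
```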

Caucasian example (D13S317, TH01, D18S51, D21S11, D3S1358, D5S818, D7S820, D8S1179, CSF1PO, FGA, D16S539, TPOX, VWA)

  • D13S317: alleles 11 and 14; p = 0.33940, q = 0.04801; genotype frequency = 2pq = 0.0326

  • TH01: alleles 6 and 6; p = 0.23179; genotype frequency = p^2 = 0.0537

  • D18S51: alleles 14 and 16; p = 0.13742, q = 0.13907; genotype frequency = 2pq = 0.0382

  • D21S11: alleles 28 and 30; p = 0.15894, q = 0.27815; genotype frequency = 2pq = 0.0884

  • D3S1358: alleles 16 and 17; p = 0.25331, q = 0.21523; genotype frequency = 2pq = 0.1090

  • D5S818: alleles 12 and 13; p = 0.38411, q = 0.14073; genotype frequency = 2pq = 0.1081

  • D7S820: alleles 9 and 9; p = 0.17715; genotype frequency = p^2 = 0.0314

  • D8S1179: alleles 12 and 14; p = 0.18543, q = 0.16556; genotype frequency = 2pq = 0.0614

  • CSF1PO: alleles 10 and 10; p = 0.21689; genotype frequency = p^2 = 0.0470

  • FGA: alleles 21 and 22; p = 0.18543, q = 0.21854; genotype frequency = 2pq = 0.0810

  • D16S539: alleles 9 and 11; p = 0.11258, q = 0.32119; genotype frequency = 2pq = 0.0723

  • TPOX: alleles 8 and 8; p = 0.53477; genotype frequency = p^2 = 0.2860

  • VWA: alleles 17 and 18; p = 0.28146, q = 0.20033; genotype frequency = 2pq = 0.1128

  • Product across all 13 markers (the “random match probability” calculation) ≈ $1.2 \times 10^{-15}$, i.e., about 1 in $8.37 \times 10^{14}$ for this Caucasian example.
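
The multiplication can be checked directly from the thirteen genotype frequencies listed for the Caucasian example. This sketch uses the four-decimal values as printed, so the result lands close to, but not exactly on, the quoted figure, presumably because of rounding.

```python
import math

# Genotype frequencies as listed in the Caucasian example (rounded to 4 d.p.).
caucasian = {
    "D13S317": 0.0326, "TH01": 0.0537, "D18S51": 0.0382, "D21S11": 0.0884,
    "D3S1358": 0.1090, "D5S818": 0.1081, "D7S820": 0.0314, "D8S1179": 0.0614,
    "CSF1PO": 0.0470, "FGA": 0.0810, "D16S539": 0.0723, "TPOX": 0.2860,
    "VWA": 0.1128,
}

rmp = math.prod(caucasian.values())
print(f"RMP = {rmp:.2e}")      # RMP = 1.19e-15
print(f"1 in {1 / rmp:.2e}")   # 1 in 8.39e+14
```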

African American example (D13S317, TH01, D18S51, D21S11, D3S1358, D5S818, D7S820, D8S1179, CSF1PO, FGA, D16S539, TPOX, VWA)

  • D13S317: 11 and 14; p = 0.30620, q = 0.03488; genotype frequency = 2pq = 0.0214

  • TH01: 6 and 6; p = 0.12403; genotype frequency = p^2 = 0.0154

  • D18S51: 14 and 16; p = 0.07198, q = 0.15759; genotype frequency = 2pq = 0.0227

  • D21S11: 28 and 30; p = 0.25775, q = 0.17442; genotype frequency = 2pq = 0.0899

  • D3S1358: 16 and 17; p = 0.33527, q = 0.20543; genotype frequency = 2pq = 0.1377

  • D5S818: 12 and 13; p = 0.35271, q = 0.23837; genotype frequency = 2pq = 0.1682

  • D7S820: 9 and 9; p = 0.10853; genotype frequency = p^2 = 0.0118

  • D8S1179: 12 and 14; p = 0.14147, q = 0.30039; genotype frequency = 2pq = 0.0850

  • CSF1PO: 10 and 10; p = 0.25681; genotype frequency = p^2 = 0.0660

  • FGA: 21 and 22; p = 0.11628, q = 0.31783; genotype frequency = 2pq = 0.0739

  • D16S539: 9 and 11; p = 0.19574, q = 0.32119; genotype frequency = 2pq = 0.1257

  • TPOX: 8 and 8; p = 0.37209; genotype frequency = p^2 = 0.1385

  • VWA: 17 and 18; p = 0.24225, q = 0.15504; genotype frequency = 2pq = 0.0751

  • Product across all 13 markers yields a random match probability of about 1 in $1.66 \times 10^{16}$ for this African American example.

  • Important point: Allele frequencies differ among ethnic groups; combined probabilities should be calculated using population-specific allele frequencies.

Population-specific allele frequencies

  • Allele frequencies differ across ethnic groups; e.g., Caucasian vs African American frequencies for the same loci are different (example data show marked differences in p, q values).

  • The effect: genotype frequencies and final random-match probabilities change depending on the assumed population.

  • The slide set provides a comparative table showing these differences across loci (e.g., D13S317, TH01, D18S51, etc.).
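
To make the population effect concrete, the sketch below computes the frequency of the same genotype under the two sets of allele frequencies quoted in the examples above, for two loci only. The helper name and data layout are illustrative.

```python
def genotype_frequency(p, q=None):
    """p**2 for a homozygote (q is None), 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


# (p, q) pairs copied from the two worked examples; q is None for homozygotes.
alleles = {
    "TH01 6,6": {
        "Caucasian": (0.23179, None),
        "African American": (0.12403, None),
    },
    "D13S317 11,14": {
        "Caucasian": (0.33940, 0.04801),
        "African American": (0.30620, 0.03488),
    },
}

ratios = {}
for genotype, by_pop in alleles.items():
    f = {pop: genotype_frequency(*pq) for pop, pq in by_pop.items()}
    ratios[genotype] = f["Caucasian"] / f["African American"]
    rounded = {pop: round(v, 4) for pop, v in f.items()}
    print(genotype, rounded, f"ratio {ratios[genotype]:.2f}")
```

Even at a single locus the same genotype can be several times more common in one population than in another, and these differences compound when thirteen loci are multiplied together.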

Multi-locus random match probability (RMP)

  • Concept: Random match probability is the probability that a randomly selected, unrelated individual from the population would have the same multi-locus genotype as the evidence profile across all tested STRs.

  • Calculation method:

    • For each locus $i$ with observed genotype $(A_i, B_i)$ and allele frequencies $p_i = \text{freq}(A_i)$, $q_i = \text{freq}(B_i)$:

    • If $A_i = B_i$ (homozygous): $f_i = p_i^2$

    • If $A_i \neq B_i$ (heterozygous): $f_i = 2 p_i q_i$

    • Then multiply across all loci: $\text{RMP} = \prod_{i} f_i$

  • Example outcome (Caucasian): RMP ≈ $1.2 \times 10^{-15}$, equivalent to 1 in $8.37 \times 10^{14}$.

  • Example outcome (African American): RMP ≈ 1 in $1.66 \times 10^{16}$.
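
The calculation method can also be written so that it works from a profile and an allele-frequency table rather than from precomputed genotype frequencies. The sketch below sums logarithms instead of multiplying, which is one common way to keep very small products manageable; this is a design choice of the sketch, not something prescribed by the lecture. Only three loci are included, with frequencies taken from the Caucasian example.

```python
import math


def log10_rmp(profile, freqs):
    """Sum of log10 genotype frequencies across loci (log10 of the RMP)."""
    total = 0.0
    for locus, (a, b) in profile.items():
        p, q = freqs[locus][a], freqs[locus][b]
        f = p * p if a == b else 2 * p * q
        total += math.log10(f)
    return total


# Allele frequencies for three loci from the Caucasian example.
freqs = {
    "D13S317": {11: 0.33940, 14: 0.04801},
    "TH01": {6: 0.23179},
    "TPOX": {8: 0.53477},
}
profile = {"D13S317": (11, 14), "TH01": (6, 6), "TPOX": (8, 8)}

log_rmp = log10_rmp(profile, freqs)
print(f"log10(RMP) = {log_rmp:.2f}")
print(f"RMP = {10 ** log_rmp:.2e}")
```

With only three loci the match probability is on the order of 1 in 2,000, which shows how much each additional locus contributes to the final figure.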

Practical interpretation and ethical considerations

  • The more STR loci used, the smaller the random match probability, increasing evidentiary strength.

  • Population substructure and ethnicity must be considered; using the wrong population allele frequencies can misestimate the RMP.

  • The Amelogenin sex marker helps identify sex and serves as a check against sample contamination or mixtures, but it is not itself an STR marker.

  • The use of multiple markers and allelic ladders enhances the reliability of genotyping in court.

  • Statistical interpretation in court relies on well-established population genetics principles (Hardy–Weinberg equilibrium, independence of loci, and assumption of random mating).

Hardy–Weinberg and genotype frequencies (recap)

  • For a locus with allele frequencies p and q (p + q = 1):

    • Homozygous AA frequency: p^2

    • Heterozygous AB frequency: 2 p q

    • Homozygous BB frequency: q^2

  • Random match probability for a specific genotype is the product of locus-specific genotype frequencies across all loci tested.
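
For the two-allele case in this recap, the three genotype frequencies sum to 1, so they can be turned into expected genotype counts for a sample of any size. A minimal sketch with a hypothetical allele frequency (p = 0.3) and sample size (200 people):

```python
def hwe_expected_counts(p, n):
    """Expected AA, AB, BB counts among n people for a two-allele locus."""
    q = 1 - p
    return {"AA": p * p * n, "AB": 2 * p * q * n, "BB": q * q * n}


expected = hwe_expected_counts(0.3, 200)
print({g: round(c, 1) for g, c in expected.items()})  # {'AA': 18.0, 'AB': 84.0, 'BB': 98.0}
print(round(sum(expected.values()), 1))               # 200.0
```

Comparing counts like these with the genotypes actually observed in a population sample is the basis for checking whether a locus conforms to Hardy–Weinberg expectations.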

Key takeaways

  • STRs are small, highly variable regions ideal for individual identification due to their high polymorphism and independence across loci.

  • Capillary electrophoresis with fluorescent primers enables rapid, high-throughput STR profiling and precise sizing.

  • Allelic ladders and multiplex PCR are essential tools for robust, efficient forensic STR analysis.

  • Sex determination (Amelogenin) and mixture interpretation are important practical considerations.

  • Population-specific allele frequencies underpin probabilistic interpretation; the strength of a match depends on the number of loci and the diversity of the loci used.

  • The math of random match probability relies on Hardy–Weinberg principles and simple genotype-frequency formulas, multiplied across loci to yield an overall extremely small probability of a coincidental match.

Quick reference formulas

  • Homozygous genotype frequency: f(AA) = p^2

  • Heterozygous genotype frequency: $f(AB) = 2pq$, where $p = \text{freq}(A)$ and $q = \text{freq}(B)$; $p + q = 1$ only when the locus has exactly two alleles.

  • Random match probability across $n$ loci: $\text{RMP} = \prod_{i=1}^{n} f_i$

  • Example: For a locus with alleles 11 and 14, p = 0.33940, q = 0.04801, so f = 2pq = 0.0326