Forensics and CODIS Notes

Dr. Jennifer M Johnston, BIOL 370 General Genetics

Learning Objectives

  • Describe the CODIS database.
  • Diagram how DNA profiles are obtained.
  • Determine the frequency of a genotype match for CODIS Core Loci.
  • Explain how the number of pairwise comparisons affects the frequency of a genotype match.
  • Perform calculations for match probability.
  • Consider the strength of familial screening.

CODIS (Combined DNA Index System)

  • Maintained by the FBI.
  • Consists of:
    • Forensic Index: DNA profiles collected at crime scenes (1,391,726 profiles).
    • Offender and Arrestee Indices: DNA profiles of individuals arrested for or charged with any felony.
      • 18,135,382 offender profiles.
      • 5,774,055 arrestee profiles.
  • A "hit" occurs when there's a match between a profile in the offender/arrestee indices and the forensic index.
  • Over 741,351 hits have assisted in 719,736 investigations.

CODIS Statistics (as of January 2025)

  • Examples of statistics from various states (Alabama, Kentucky, Alaska, Louisiana) including:
    • Number of offender profiles.
    • Number of arrestee profiles.
    • Number of forensic profiles.
    • Number of NDIS (National DNA Index System) participating labs.
    • Investigations aided.

DNA Fingerprint Unsolved Crime and Innocence Protection Act

  • Passed in November 2004 (62% Yes, 38% No).
  • Requires DNA sample collection from all felons and from adults and juveniles arrested for specified crimes, with submission to the state DNA database.
  • By 2009, collection extends to adults arrested for any felony.
  • Authorizes local law enforcement labs to perform analyses for state databases and maintain local databases.

California's Forensic Database in CODIS

  • California has a significant portion of profiles in CODIS.
    • 14% of all offender + arrestee profiles are from CA: $(2,297,261 + 1,052,985) / (5,774,055 + 18,135,382) \approx 0.14$.
  • California represents about 11.5% of the US population: $39,128,162 / 341,678,519 \approx 0.1145$.
  • 24 CODIS labs in CA including Berkeley DNA Lab and Orange County Crime Lab, along with many private companies.
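
The two shares above can be checked with a few lines of Python. This is only an arithmetic sketch using the counts quoted in these notes; the variable names are illustrative and not part of any CODIS tool.

```python
# Counts quoted in the notes (CODIS statistics as of January 2025).
ca_profiles = 2_297_261 + 1_052_985      # CA offender + arrestee profiles
us_profiles = 18_135_382 + 5_774_055     # national offender + arrestee profiles

ca_population = 39_128_162
us_population = 341_678_519

profile_share = ca_profiles / us_profiles
population_share = ca_population / us_population

print(f"CA share of offender + arrestee profiles: {profile_share:.1%}")
print(f"CA share of US population: {population_share:.1%}")
```

California's share of profiles is somewhat larger than its share of the population.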

Arrestee DNA Collection

  • Ethical considerations about collecting DNA from individuals upon arrest.

Arrestee Hit Statistics in California

  • 3,778 arrestee hits for murder, rape, or robbery in CA.
  • 92% of these 3,778 were originally arrested for an offense other than murder, rape, or robbery.

Arrestee Qualifying Offenses for Hits to Murder, Rape, and Robbery

  • Based on a sample of 100 cases (as of December 4, 2012):

    • Violent Crimes: 40%
    • Property Crimes: 17%
    • Drugs: 25%
    • DUI: 4%
    • Fraud: 6%
    • Other: 8%

DNA Profile

  • DNA from an individual is tested at amelogenin plus 20 CODIS core loci.

Amelogenin

  • The amelogenin locus is used to distinguish male from female samples.
  • Encodes a protein found in tooth enamel.
  • AMELX on the X chromosome and AMELY on the Y chromosome.
  • The length of intron 1 varies between AMELX and AMELY.
  • PCR amplification yields a 106 bp fragment for AMELX and a 112 bp fragment for AMELY.
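
As a rough illustration of how the two fragment sizes translate into a sex call, the sketch below classifies a sample from its amelogenin PCR products. The function name and the "inconclusive" fallback are assumptions made for this example, not part of any forensic software.

```python
AMELX_BP = 106  # amplicon length from the X-linked copy (from the notes)
AMELY_BP = 112  # amplicon length from the Y-linked copy (from the notes)

def infer_sex(fragment_lengths):
    """Classify a sample from the amelogenin fragment lengths (in bp) observed."""
    peaks = set(fragment_lengths)
    if peaks == {AMELX_BP}:
        return "female (XX)"            # only the X-derived product is present
    if peaks == {AMELX_BP, AMELY_BP}:
        return "male (XY)"              # both X- and Y-derived products are present
    return "inconclusive"               # anything else is not interpreted here

print(infer_sex([106]))
print(infer_sex([106, 112]))
```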

CODIS Loci

  • Short tandem repeats (STRs) with repeat units 2-6 bp long.
  • The number of repeats varies from 3 to 50.
  • Identified by PCR with primers designed to anneal outside of the repeat region (e.g., alleles $(ACCT)_{12}$ and $(ACCT)_{15}$).

Multiplexing

  • Multiple primer pairs in one PCR reaction.
  • Allows obtaining STR data from multiple loci simultaneously.
  • PCR product sizes differ for different loci.
  • Differently-labeled fluorescent primers are used for different CODIS core loci.
  • Example: STR unit is 4 bp long for D3S1358 and TH01 loci.

DNA Profile Example

  • Example using Amelogenin and 6 CODIS core loci.
  • Fluorescent multiplex STR result using the AmpFlSTR™ COfiler™ kit and ABI 310 Genetic Analyzer.
  • Allele calls for the seven loci in this multiplex are identified.

PCR Product Length

  • Primer positions relative to the STR: the 5' end of the green primer lies 72 bp before the repeat region, and the 3' end of the blue primer lies 63 bp after it.
  • Primers are 20 nucleotides long and the STR has a 4 bp repeat.
  • Given a genotype of 12, 13, calculate the length of the two PCR products.
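
One way to set up this calculation is sketched below. It rests on a reading of the primer positions that should be checked against the original figure: the 72 bp upstream distance is measured from the green primer's 5' end, so it already includes that primer, while the blue primer's 3' end is the end nearest the STR, so its full 20 nt length is added to the 63 bp gap. The function and parameter names are illustrative.

```python
def pcr_product_length(repeats, repeat_unit_bp=4, upstream_bp=72,
                       downstream_gap_bp=63, primer_bp=20):
    """Length of the PCR product for an allele with the given repeat count.

    Assumptions (not confirmed by the notes):
      - upstream_bp runs from the forward primer's 5' end to the STR,
        so the forward primer is already counted.
      - downstream_gap_bp runs from the STR to the reverse primer's 3' end,
        so the reverse primer's own length must be added.
    """
    str_bp = repeats * repeat_unit_bp
    return upstream_bp + str_bp + downstream_gap_bp + primer_bp

for allele in (12, 13):
    print(allele, pcr_product_length(allele), "bp")
```

Whatever the exact flanking lengths, the two products of a 12, 13 genotype differ by one repeat unit, that is, 4 bp.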

CODIS Loci and Polymorphism

  • 20 CODIS Core Loci have a high degree of polymorphism in the human population.
  • Each locus has 5 to 20 different alleles in the population.
  • Different alleles have different numbers of repeats (STRs).
  • All 20 CODIS Core Loci are autosomal, unlinked, and do not contribute to fitness.
  • Allele frequencies can be established for each CODIS Core Locus from offender and arrestee indices.

Allele Frequencies

  • Example of allele frequencies for one CODIS Core Locus, TH01 (chromosome 11).
  • Some alleles are more common than others (e.g., 6, 7, 8, 9, 9.3 are 95.9% of alleles).

Probability of a Match

  • Calculating the probability of a match at the TH01 locus.
  • If the genotype of the forensic profile is (7,9), the probability is calculated as $2(0.16)(0.199) \approx 0.064$.
  • Calculating the probability of a genotype match at the 13 original CODIS Core Loci, assuming a similar genotype frequency at each locus: $(0.064)^{13} \approx 3 \times 10^{-16}$.
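
The calculation above follows the Hardy-Weinberg genotype frequencies ($p^2$ for a homozygote, $2pq$ for a heterozygote). A minimal sketch, using the TH01 allele frequencies quoted in the notes and a hypothetical helper name:

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency.

    p alone   -> homozygote, p^2
    p and q   -> heterozygote, 2pq
    """
    if q is None:
        return p * p
    return 2 * p * q

# TH01 genotype (7, 9) with allele frequencies 0.16 and 0.199.
th01_7_9 = genotype_frequency(0.16, 0.199)

# Rough 13-locus estimate, assuming every locus has a similar genotype frequency.
thirteen_loci = th01_7_9 ** 13

print(f"TH01 (7,9): {th01_7_9:.4f}")
print(f"13 loci:    {thirteen_loci:.1e}")
```

In practice each locus has its own genotype frequency, and the per-locus values are multiplied together rather than raising one value to a power.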

Population-Specific Allele Frequencies

  • Different allele frequencies are observed for Caucasian, African-American, and Hispanic Americans because marriage has historically occurred mostly within each group.
  • Race is a socio-cultural construct.

Forensic Index Problem

  • Example problem involving three loci with varying numbers of alleles and frequencies.
  • Calculating the frequency of a specific genotype combination: 1a1c, 2e2e, 3a3c.
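
The allele frequencies for this problem are not reproduced in these notes, so only the general form can be sketched. Assuming Hardy-Weinberg proportions and independent (unlinked) loci, and writing $p_{1a}$ for the frequency of allele a at locus 1 (a symbol introduced here for illustration), the frequency of the combined genotype would be:

```latex
\[
P(1a1c,\ 2e2e,\ 3a3c)
  = \underbrace{2\,p_{1a}\,p_{1c}}_{\text{locus 1, heterozygous}}
    \times \underbrace{p_{2e}^{2}}_{\text{locus 2, homozygous}}
    \times \underbrace{2\,p_{3a}\,p_{3c}}_{\text{locus 3, heterozygous}}
\]
```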

CODIS Requirements

  • A minimum of 8 CODIS Core Loci are required to be included in the database.

General Estimation for Allele Frequencies

  • Estimation for 5 common alleles each at equal frequency ($1/5 = 0.2$).
  • Each homozygous genotype has $4\%$ probability: $0.2 \times 0.2 = 0.04$.
  • Each heterozygous genotype has $8\%$ probability: $2(0.2)(0.2) = 0.08$.
  • Probability of finding a particular homozygous genotype: $(0.04)^n$.
  • Probability of finding a particular heterozygous genotype: $(0.08)^n$, where $n$ = number of CODIS Core Loci.

Probability Calculation Example

  • For 8 CODIS Core Loci: $(0.08)^8 \approx 1.68 \times 10^{-9}$, or about 1 in 596 million; 596 million is greater than the US population (333 million).
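
The equal-frequency estimate can be tabulated for different numbers of loci. The sketch below assumes five equally common alleles per locus, as in the estimate above; the function name and the choice of 8, 13, and 20 loci are illustrative.

```python
def genotype_probability(n_loci, n_alleles=5, heterozygous=True):
    """Probability of one specific genotype at every one of n_loci loci,
    assuming n_alleles equally frequent alleles at each locus."""
    p = 1 / n_alleles
    per_locus = 2 * p * p if heterozygous else p * p
    return per_locus ** n_loci

for n in (8, 13, 20):
    prob = genotype_probability(n)
    print(f"{n:2d} loci: {prob:.2e}  (about 1 in {1 / prob:,.0f})")
```

Each additional heterozygous locus multiplies the probability by 0.08 under this simplified model, which is why resolving more loci sharply reduces the chance of a coincidental match.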

Pairwise Combinations

  • In 2001, a lab worker in AZ found two profiles matching at 9 of the 13 CODIS Core Loci out of 65,000 profiles.
  • They were not related.
  • Expected probability: $(0.08)^9 \approx 1.34 \times 10^{-10}$, or about 1 in 7.45 billion, yet a matching pair (2 profiles) was observed among only 65,000 profiles.
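
The apparent paradox eases once the number of pairwise comparisons is counted: every profile in the database can be compared with every other one. A back-of-the-envelope sketch, using the simplified 0.08-per-locus estimate from these notes and treating the 9 matching loci as a fixed set (allowing any 9 of the 13 loci to match would raise the expectation further):

```python
from math import comb

n_profiles = 65_000
p_match_9 = 0.08 ** 9                 # simplified 9-locus match probability

n_pairs = comb(n_profiles, 2)         # n(n-1)/2 distinct pairs of profiles
expected_matching_pairs = n_pairs * p_match_9

print(f"pairs compared:          {n_pairs:,}")
print(f"expected matching pairs: {expected_matching_pairs:.2f}")
```

With roughly two billion pairs, one coincidental 9-locus match is not surprising even though any single comparison is very unlikely to match.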

Number of Pairwise Comparisons

  • Approximately $3.35 \times 10^6$ offender plus arrestee profiles in the CA database.
  • One forensic profile is compared to all offender and arrestee profiles.
  • That's $3.35 \times 10^6$ pairwise comparisons.

Probability of a Match at 8 CODIS Core Loci

  • Estimated probability of finding a match at 8 CODIS Core Loci (heterozygous genotypes): $(0.08)^8 \approx 1.68 \times 10^{-9}$.

Match Probability and Pairwise Comparisons

  • Number of pairs considered × probability of finding a match: $3.35 \times 10^6 \times 1.68 \times 10^{-9} \approx 5.6 \times 10^{-3}$.
  • Approximately 1 in 178 chance of finding a random match.
  • Forensic evidence needs to be high quality, so more than 8 STR loci are resolved.
  • Circumstantial evidence is important.
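
Strictly, the product of comparisons and per-comparison probability is the expected number of chance matches; when that number is small it is almost the same as the probability of at least one chance match. The sketch below compares the two, assuming the comparisons are independent (variable names are illustrative).

```python
from math import expm1, log1p

n_comparisons = 3.35e6      # CA offender + arrestee profiles
p_match = 0.08 ** 8         # simplified 8-locus match probability

expected_matches = n_comparisons * p_match

# P(at least one match) = 1 - (1 - p)^N, computed stably for tiny p.
p_at_least_one = -expm1(n_comparisons * log1p(-p_match))

print(f"expected chance matches: {expected_matches:.3e}")
print(f"P(at least one match):   {p_at_least_one:.3e}")
print(f"about 1 in {1 / p_at_least_one:.0f}")
```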

Match Probability Calculation

  • What is the probability of finding a match with a forensic profile with 9 CODIS Core Loci in CA?
  • Three loci have four common alleles, three have five, and three have six.
  • Assume common alleles found at similar frequencies.
  • Of the loci with four common alleles, two are homozygous, and the remaining seven loci are heterozygous.
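
One possible way to work this problem is sketched below. It assumes equal frequencies for the common alleles at each locus (as the problem states), Hardy-Weinberg genotype frequencies, independent loci, and the CA database size of about $3.35 \times 10^6$ profiles used earlier in these notes. The data layout and names are made up for the example.

```python
def genotype_probability(loci):
    """loci: iterable of (n_common_alleles, is_heterozygous) pairs."""
    prob = 1.0
    for n_alleles, heterozygous in loci:
        p = 1 / n_alleles                       # equal allele frequencies
        prob *= 2 * p * p if heterozygous else p * p
    return prob

profile = (
    [(4, False)] * 2 + [(4, True)]    # four-allele loci: two homozygous, one heterozygous
    + [(5, True)] * 3                 # five-allele loci: all heterozygous
    + [(6, True)] * 3                 # six-allele loci: all heterozygous
)

ca_database = 3.35e6
p_profile = genotype_probability(profile)
expected_ca = ca_database * p_profile

print(f"genotype probability: {p_profile:.2e}")
print(f"CA: {expected_ca:.2e} (about 1 in {1 / expected_ca:,.0f})")
```

Passing a different database size in place of `ca_database` gives the corresponding figure for another state.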

Match Probability in Nevada

  • For the same forensic profile, what would be the probability of a match in NV?
  • NV has $\sim 2.5 \times 10^5$ offender plus arrestee profiles.

Difference in Match Probability

  • The probability is different in CA vs. NV because there are fewer pairwise comparisons in NV, meaning less chance of finding a match.
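
Because the genotype probability $p$ of the forensic profile is the same in both states, the difference comes entirely from the number of comparisons. Writing $N_{CA}$ and $N_{NV}$ for the database sizes and $E$ for the expected number of chance matches (symbols introduced here for illustration), and using the approximate sizes quoted in these notes:

```latex
\[
\frac{E_{CA}}{E_{NV}}
  = \frac{N_{CA}\,p}{N_{NV}\,p}
  = \frac{3.35 \times 10^{6}}{2.5 \times 10^{5}}
  \approx 13.4
\]
```

Under these assumptions a chance match is roughly 13 times more likely in the CA database than in the NV database.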

Familial Screening

  • Parent-child relationship is most reliable for Familial Screening.
  • Example: Christopher Franklin's DNA profile matched one allele at every CODIS core locus in the forensic profile from the Grim Sleeper cases.
  • Christopher’s father matched both alleles at every locus, strengthening the link.

Familial Screening Probability

  • Consider one CODIS locus and assume that both parents are heterozygous for different alleles.
  • One allele matches the forensic profile.
  • What is the probability that father and son will match for one allele in the gene pair?
  • What is the probability that two brothers will match for one of the alleles of the gene pair?
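
These questions can be explored by enumerating the possible offspring. The sketch below assumes a father with genotype A/B and a mother with genotype C/D (four distinct alleles, hypothetical labels) and counts shared alleles at this one locus. Because the question can be read as "exactly one" or "at least one" shared allele, both are reported for the brothers.

```python
from fractions import Fraction
from itertools import product

father, mother = ("A", "B"), ("C", "D")

# Each child receives one paternal and one maternal allele: 4 equally likely genotypes.
children = list(product(father, mother))

def shared(g1, g2):
    """Number of alleles two genotypes have in common (all alleles are distinct here)."""
    return len(set(g1) & set(g2))

# Father vs. son: how often do they share exactly one allele?
father_son = Fraction(sum(shared(father, c) == 1 for c in children), len(children))

# Two brothers: every ordered pair of child genotypes is equally likely.
pairs = list(product(children, repeat=2))
brothers_exactly_one = Fraction(sum(shared(a, b) == 1 for a, b in pairs), len(pairs))
brothers_at_least_one = Fraction(sum(shared(a, b) >= 1 for a, b in pairs), len(pairs))

print("father-son, one shared allele:   ", father_son)
print("brothers, exactly one shared:    ", brothers_exactly_one)
print("brothers, at least one shared:   ", brothers_at_least_one)
```

A son always carries one of his father's two alleles, whereas two brothers may share two, one, or no alleles at a locus, which is consistent with the parent-child relationship being the most reliable for familial screening.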

Golden State Killer - Genealogy Database Search Example

  • Joseph James DeAngelo was arrested on April 24, 2018, charged with 51 rapes and 12 murders in CA between 1974 and 1986.
  • No hits using CODIS.
  • The forensic sample was tested for SNP loci (microarray) and searched for a partial match on the open-access GEDmatch website.
  • A partial match from a third cousin and circumstantial evidence led investigators to DeAngelo.
  • A DNA sample obtained from DeAngelo (pizza crust) was a perfect match to forensic samples.
  • DeAngelo pleaded guilty to murder and kidnapping on June 29, 2020 and was sentenced to life in prison without parole on August 21, 2020.
  • GEDmatch is now owned by Verogen. Other cold cases have since been solved using this technique.

Partial Match in GEDmatch

  • Investigators found a partial match on the GEDmatch website with 0.78% shared DNA.
  • For 20 CODIS loci (40 alleles if all heterozygous), 1 match = $1 / 40 = 0.025$, or 2.5%.
  • For 1,000,000 SNP loci (2,000,000 alleles), how many matches for 0.78% shared DNA?
  • $2,000,000 \times 0.0078 = 15,600$ matches.
  • This level of sharing is close to that of unrelated individuals, so investigators look for linked loci (haplotypes).
  • Large chromosome segments passed down through generations are identical by descent (IBD).
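
The 0.78% figure is in line with the standard expectation for third cousins. As a hedged sketch: full cousins of degree $k$ are expected to share, on average, $(1/2)^{2k+1}$ of their autosomal DNA; actual sharing varies around this average, and the function name below is made up for the example.

```python
def expected_cousin_sharing(degree):
    """Average fraction of autosomal DNA shared by full cousins of the given degree
    (1 = first cousins, 2 = second cousins, 3 = third cousins)."""
    return 0.5 ** (2 * degree + 1)

for degree in (1, 2, 3):
    print(f"cousin degree {degree}: {expected_cousin_sharing(degree):.4%}")

# Matching alleles expected among 2,000,000 SNP alleles at 0.78% sharing.
matching_alleles = round(2_000_000 * 0.0078)
print(f"matching alleles: {matching_alleles:,}")
```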