Forensics and CODIS Notes
Dr. Jennifer M Johnston, BIOL 370 General Genetics
Learning Objectives
- Describe the CODIS database.
- Diagram how DNA profiles are obtained.
- Determine the frequency of a genotype match for CODIS Core Loci.
- Explain how the number of pairwise comparisons affects the frequency of a genotype match.
- Perform calculations for match probability.
- Consider the strength of Familial Screening.
CODIS (Combined DNA Index System)
- Maintained by the FBI.
- Consists of:
  - Forensic Index: DNA profiles collected at crime scenes (1,391,726 profiles).
  - Offender and Arrestee Indices: DNA profiles of convicted offenders and of individuals arrested for or charged with a felony.
    - 18,135,382 offender profiles.
    - 5,774,055 arrestee profiles.
- A "hit" occurs when a profile in the offender/arrestee indices matches a profile in the forensic index.
- Over 741,351 hits have assisted in 719,736 investigations.
CODIS Statistics (as of January 2025)
- Examples of statistics from various states (Alabama, Kentucky, Alaska, Louisiana), including:
  - Number of offender profiles.
  - Number of arrestee profiles.
  - Number of forensic profiles.
  - Number of NDIS (National DNA Index System) participating labs.
  - Investigations aided.
DNA Fingerprint, Unsolved Crime and Innocence Protection Act
- Passed by California voters as Proposition 69 in November 2004 (62% Yes, 38% No).
- Requires collection of DNA samples from all convicted felons and from adults and juveniles arrested for specified crimes, for submission to the state DNA database; by 2009, collection expanded to adults arrested for any felony.
- Authorizes local law enforcement labs to perform analyses for state databases and maintain local databases.
California's Forensic Database in CODIS
- California has a significant portion of profiles in CODIS.
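- 14% of all Offender + Arrestee profiles are from CA: about 3.35 million of the 23.9 million national offender (18,135,382) plus arrestee (5,774,055) profiles.
- California represents 11.4% of the US population.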
- 24 CODIS labs in CA including Berkeley DNA Lab and Orange County Crime Lab, along with many private companies.
Arrestee DNA Collection
- Ethical considerations about collecting DNA from individuals upon arrest.
Arrestee Hit Statistics in California
- 3,778 arrestee hits for murder, rape, or robbery in CA.
- 92% of these 3,778 were originally arrested for an offense other than murder, rape, or robbery.
Arrestee Qualifying Offenses for Hits to Murder, Rape, and Robbery
Based on a sample of 100 cases (as of December 4, 2012):
- Violent Crimes: 40%
- Property Crimes: 17%
- Drugs: 25%
- DUI: 4%
- Fraud: 6%
- Other: 8%
DNA Profile
- DNA from an individual is tested at amelogenin plus 20 CODIS core loci.
Amelogenin
- Amelogenin determines male vs. female.
- Encodes a protein found in tooth enamel.
- AMELX on the X chromosome and AMELY on the Y chromosome.
- The length of intron 1 varies between AMELX and AMELY.
- PCR amplification yields a 106 bp fragment for AMELX and a 112 bp fragment for AMELY.
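A minimal sketch of the sex call described above, assuming the only input is the list of amelogenin peak sizes (bp) detected in a sample. The function name `amelogenin_call` and the ±1 bp sizing tolerance are hypothetical choices for illustration, not part of any kit's analysis software.

```python
# Hypothetical helper: call sex from the amelogenin fragment sizes detected.
AMELX_BP = 106  # X-chromosome PCR product size (from the notes)
AMELY_BP = 112  # Y-chromosome PCR product size (from the notes)

def amelogenin_call(peaks_bp, tolerance=1.0):
    """Return 'XY', 'XX', or 'inconclusive' from a list of peak sizes in bp."""
    has_x = any(abs(p - AMELX_BP) <= tolerance for p in peaks_bp)
    has_y = any(abs(p - AMELY_BP) <= tolerance for p in peaks_bp)
    if has_x and has_y:
        return "XY"          # both 106 bp and 112 bp products: male
    if has_x:
        return "XX"          # only the 106 bp product: female
    return "inconclusive"    # no X product detected: not covered by this sketch

print(amelogenin_call([106.2, 111.9]))  # XY
print(amelogenin_call([105.8]))         # XX
```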
CODIS Loci
- Short tandem repeats (STRs) with repeat units 2-6 bp long.
- The number of repeats varies from 3 to 50.
- Identified by PCR with primers designed to anneal outside of the repeat region.
Multiplexing
- Multiple primer pairs in one PCR reaction.
- Allows obtaining STR data from multiple loci simultaneously.
- PCR product sizes differ for different loci.
- Primers labeled with different fluorescent dyes are used for different CODIS core loci, so loci whose products overlap in size can still be told apart.
- Example: STR unit is 4 bp long for D3S1358 and TH01 loci.
DNA Profile Example
- Example using Amelogenin and 6 CODIS core loci.
- Fluorescent multiplex STR result using the AmpFℓSTR™ COfiler™ kit and ABI 310 Genetic Analyzer.
- Allele calls for the seven loci in this multiplex are identified.
PCR Product Length
- Primer positions relative to the STR: the 5' end of the green primer is 72 bp before the STR, and the 3' end of the blue primer is 63 bp after it.
- Primers are 20 nucleotides long and the STR has a 4 bp repeat.
- Given a genotype of 12, 13, calculate the length of the two PCR products.
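The product length is the upstream flank plus the repeat region plus the downstream flank. This is a sketch under one reading of the primer positions: it assumes the 63 bp is the gap between the STR and the 3' end of the blue primer, so the blue primer's own 20 nt extend the downstream flank (if the 63 bp already includes the primer, drop the `PRIMER_LEN` term). The constant names are hypothetical.

```python
# Sketch: PCR product = upstream flank + repeat region + downstream flank.
REPEAT_UNIT_BP = 4      # tetranucleotide repeat (from the problem)
PRIMER_LEN = 20         # both primers are 20 nt (from the problem)
UPSTREAM_FLANK = 72     # 5' end of green primer to start of the STR (from the problem)
# ASSUMPTION: 63 bp separates the STR from the blue primer's 3' end,
# and the 20-nt primer lies beyond that gap.
DOWNSTREAM_FLANK = 63 + PRIMER_LEN

def product_length(n_repeats):
    """Length (bp) of the PCR product for an allele with n_repeats repeat units."""
    return UPSTREAM_FLANK + n_repeats * REPEAT_UNIT_BP + DOWNSTREAM_FLANK

genotype = (12, 13)
print([product_length(n) for n in genotype])  # [203, 207]
```

The two alleles differ by exactly one repeat unit, so their products always differ by 4 bp regardless of how the flanks are counted.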
CODIS Loci and Polymorphism
- 20 CODIS Core Loci have a high degree of polymorphism in the human population.
- Each has 5 to 20 different alleles in the population.
- Different alleles have different numbers of repeats (STRs).
- All 20 CODIS Core Loci are autosomal, unlinked, and do not contribute to fitness.
- Allele frequencies can be established for each CODIS Core Locus from the offender and arrestee indices.
Allele Frequencies
- Example of allele frequencies for one CODIS Core Locus, TH01 (chromosome 11).
- Some alleles are more common than others (e.g., alleles 6, 7, 8, 9, and 9.3 together account for 95.9% of alleles).
Probability of a Match
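- Calculating the probability of a match at the TH01 locus.
- If the genotype of the forensic profile is (7,9), the probability is the Hardy-Weinberg heterozygote frequency $2p_7p_9$, where $p_7$ and $p_9$ are the population frequencies of TH01 alleles 7 and 9.
- The probability of a genotype match at the 13 original CODIS Core Loci is the product of the single-locus genotype frequencies: $P = P_1 \times P_2 \times \cdots \times P_{13}$.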
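The two rules above (Hardy-Weinberg at each locus, then the product rule across unlinked loci) can be sketched as below. The function names are hypothetical, and the TH01 allele frequencies in the example are made-up placeholders, not values from the notes or from any database; allele names are strings so that alleles like "9.3" work.

```python
from math import prod

def locus_genotype_freq(a1, a2, freqs):
    """Hardy-Weinberg genotype frequency at one locus:
    p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = freqs[a1], freqs[a2]
    return p * p if a1 == a2 else 2 * p * q

def profile_freq(profile, freqs_by_locus):
    """Product rule: multiply per-locus genotype frequencies across (assumed unlinked) loci."""
    return prod(
        locus_genotype_freq(a1, a2, freqs_by_locus[locus])
        for locus, (a1, a2) in profile.items()
    )

# HYPOTHETICAL allele frequencies, for illustration only (not real TH01 data).
freqs_by_locus = {"TH01": {"7": 0.20, "9": 0.15}}
forensic_profile = {"TH01": ("7", "9")}
print(f"{profile_freq(forensic_profile, freqs_by_locus):.3f}")  # 0.060
```

Adding more loci to both dictionaries extends the same calculation to a 13- or 20-locus profile, and the three-locus forensic index problem below has the same structure.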
Population-Specific Allele Frequencies
- Different allele frequencies are observed for Caucasian, African-American, and Hispanic Americans because people tend to marry within their own group.
- Race is a socio-cultural construct.
Forensic Index Problem
- Example problem involving three loci with varying numbers of alleles and frequencies.
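- Calculating the frequency of a specific genotype combination, 1a1c, 2e2e, 3a3c (locus 1: a/c, locus 2: e/e, locus 3: a/c): $f = 2p_{1a}p_{1c} \times p_{2e}^2 \times 2p_{3a}p_{3c}$.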
CODIS Requirements
- A minimum of 8 CODIS Core Loci is required for a profile to be included in the database.
General Estimation for Allele Frequencies
- Estimation for 5 common alleles, each at equal frequency $p = q = 1/5 = 0.2$.
- Each homozygous genotype has probability $p^2 = (0.2)^2 = 0.04$.
- Each heterozygous genotype has probability $2pq = 2(0.2)(0.2) = 0.08$.
- Probability of finding a particular homozygous genotype at every locus: $(0.04)^n$.
- Probability of finding a particular heterozygous genotype at every locus: $(0.08)^n$, where $n$ = number of CODIS Core Loci.
Probability Calculation Example
- For 8 CODIS Core Loci: $(0.08)^8 \approx 1.68 \times 10^{-9}$, or 1 in 596 million, which is greater than the US population (333 million).
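A quick numerical check of the equal-frequency estimate. The helper name `equal_freq_genotype_prob` is hypothetical, and the "n equally common alleles per locus" input is the simplification used in the notes, not real allele frequencies.

```python
def equal_freq_genotype_prob(n_alleles, n_loci, heterozygous=True):
    """Probability of one specific genotype at every one of n_loci, assuming
    n_alleles equally common alleles per locus (p = 1/n_alleles) and Hardy-Weinberg."""
    p = 1 / n_alleles
    per_locus = 2 * p * p if heterozygous else p * p
    return per_locus ** n_loci

prob = equal_freq_genotype_prob(5, 8)
print(f"{prob:.3g}, or 1 in {1 / prob / 1e6:.0f} million")  # 1.68e-09, or 1 in 596 million
```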
Pairwise Combinations
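- In 2001, a lab worker in AZ searching a database of 65,000 profiles found two profiles that matched at 9 of the 13 CODIS Core Loci.
- The two individuals were not related.
- Expected probability: $(0.08)^9 \approx 1.34 \times 10^{-10}$, or 1 in 7.45 billion, yet a match was observed among only 65,000 profiles (2/65,000).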
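One way to see why the Arizona match is less surprising than "1 in 7.45 billion" suggests: searching a database against itself compares every profile with every other profile, giving $n(n-1)/2$ pairs rather than $n$. This sketch assumes the notes' equal-frequency estimate $(0.08)^9$ for one specified set of 9 loci; real allele frequencies are unequal, which would change the number.

```python
from math import comb

n_profiles = 65_000
pairs = comb(n_profiles, 2)      # all-against-all: every profile vs. every other profile
p_9_loci = 0.08 ** 9             # equal-frequency heterozygous estimate at 9 loci
expected_pairs = pairs * p_9_loci  # expected number of coincidentally matching pairs

print(f"{pairs:.3g} pairs, expected matching pairs = {expected_pairs:.2f}")
# 2.11e+09 pairs, expected matching pairs = 0.28
```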
Number of Pairwise Comparisons
- Approximately $3.35 \times 10^6$ offender plus arrestee profiles are in the CA database.
- One forensic profile is compared to every offender and arrestee profile.
- That's $3.35 \times 10^6$ pairwise comparisons.
Probability of a Match at 8 CODIS Core Loci
- Estimated probability of finding a match at 8 CODIS Core Loci (heterozygous genotypes, 5 equally common alleles per locus): $(0.08)^8 \approx 1.68 \times 10^{-9}$.
Match Probability and Pairwise Comparisons
- # pairs considered × probability of finding a match: $(3.35 \times 10^6)(1.68 \times 10^{-9}) \approx 5.62 \times 10^{-3}$.
- Approximately 1 in 178 chance of finding a random match.
- Forensic evidence needs to be high quality so that more than 8 STR loci can be resolved.
- Circumstantial evidence is important.
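The database search above reduces to one multiplication: the number of comparisons times the per-comparison match probability. A sketch, assuming the equal-frequency 8-locus estimate from the notes; `expected_random_matches` is a hypothetical helper name.

```python
def expected_random_matches(n_database_profiles, match_prob):
    """Expected number of coincidental matches when ONE forensic profile is
    searched against n_database_profiles (one comparison per database profile)."""
    return n_database_profiles * match_prob

ca_profiles = 3.35e6          # approx. CA offender + arrestee profiles (from the notes)
p_8_loci = 0.08 ** 8          # equal-frequency heterozygous estimate at 8 loci
expected = expected_random_matches(ca_profiles, p_8_loci)
print(f"{expected:.2e}, about 1 in {1 / expected:.0f}")  # 5.62e-03, about 1 in 178
```

For values this small, the expected count is very close to the probability of at least one random match, $1 - (1 - p)^N$, which is why the two are used interchangeably here.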
Match Probability Calculation
- What is the probability of finding a match in CA for a forensic profile typed at 9 CODIS Core Loci?
- Three loci have four common alleles, three have five, and three have six.
- Assume common alleles found at similar frequencies.
- Of the loci with four common alleles, two are homozygous, and the remaining seven loci are heterozygous.
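One way to work this problem, assuming "similar frequencies" means each common allele has frequency $1/k$ at a locus with $k$ common alleles (the same simplification used earlier). The helper names `hom` and `het` are hypothetical. Repeating the last multiplication with NV's smaller profile count answers the NV question.

```python
from math import prod

def hom(n_alleles):
    """p^2 for one of n equally common alleles (p = 1/n)."""
    p = 1 / n_alleles
    return p * p

def het(n_alleles):
    """2pq for two of n equally common alleles (p = q = 1/n)."""
    p = 1 / n_alleles
    return 2 * p * p

# Per-locus genotype probabilities for the 9-locus profile described above:
# 4-allele loci: two homozygous + one heterozygous; 5- and 6-allele loci: all heterozygous.
locus_probs = [hom(4), hom(4), het(4)] + [het(5)] * 3 + [het(6)] * 3
profile_prob = prod(locus_probs)

ca_profiles = 3.35e6                      # approx. CA offender + arrestee profiles
expected_ca = ca_profiles * profile_prob  # expected coincidental matches in CA

print(f"profile probability: {profile_prob:.3g}")
print(f"expected matches in CA: {expected_ca:.3g} (about 1 in {1 / expected_ca:,.0f})")
```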
Match Probability in Nevada
- For the same forensic profile, what would be the probability of a match in NV?
- NV has offender plus arrestee profiles.
Difference in Match Probability
- The probability is lower in NV than in CA because NV has fewer profiles, so a forensic profile undergoes fewer pairwise comparisons and a coincidental match is less likely.
Familial Screening
- The parent-child relationship is the most reliable for Familial Screening.
- Example: Christopher Franklin's DNA profile shared one allele at every CODIS core locus with the forensic profile from the Grim Sleeper victims.
- Christopher's father matched both alleles at every locus, strengthening the link.
Familial Screening Probability
- Consider one CODIS locus and assume that both parents are heterozygous for different alleles.
- One allele matches the forensic profile.
- What is the probability that father and son will match for one allele of the gene pair?
- What is the probability that two brothers will match for one of the alleles of the gene pair?
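The two questions can be checked by enumerating Mendelian transmission at one locus. This sketch reads "match for one allele" as "share at least one allele inherited from the same parent", and uses hypothetical allele labels a, b (father) and c, d (mother) so that no alleles match by chance.

```python
from fractions import Fraction
from itertools import product

# Assumed parental genotypes at one locus: four different alleles (labels hypothetical).
FATHER = ("a", "b")
MOTHER = ("c", "d")

# Each child gets one allele from each parent: four equally likely genotypes.
KIDS = list(product(FATHER, MOTHER))

def shared_alleles(g1, g2):
    """Count alleles two genotypes have in common (all allele labels are distinct here)."""
    return len(set(g1) & set(g2))

# Father vs. son: fraction of possible sons sharing at least one allele with the father.
p_father_son = Fraction(sum(shared_alleles(FATHER, k) >= 1 for k in KIDS), len(KIDS))

# Two brothers: every ordered pair of independently drawn children (16 equally likely pairs).
sib_pairs = list(product(KIDS, repeat=2))
sib_dist = {
    s: Fraction(sum(shared_alleles(x, y) == s for x, y in sib_pairs), len(sib_pairs))
    for s in (0, 1, 2)
}
p_brothers = sib_dist[1] + sib_dist[2]

print("father-son share >= 1 allele:", p_father_son)
print("brothers share 0/1/2 alleles:", sib_dist[0], sib_dist[1], sib_dist[2])
print("brothers share >= 1 allele:", p_brothers)
```

Every son carries one of his father's two alleles, whereas brothers can inherit entirely different alleles from both parents, which is why the parent-child relationship gives the stronger familial signal.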
Golden State Killer - Genealogy Database Search Example
- Joseph James DeAngelo was arrested on April 24, 2018, and linked to 51 rapes and 12 murders in CA between 1974 and 1986.
- No hits using CODIS.
- The forensic sample was tested at SNP loci (microarray), and the results were searched for partial matches in the publicly accessible GEDmatch database.
- A partial match from a third cousin and circumstantial evidence led investigators to DeAngelo.
- A DNA sample surreptitiously collected from DeAngelo (discarded DNA, reportedly from his car door handle and a tissue) was a perfect match to the forensic samples.
- DeAngelo pleaded guilty to murder and kidnapping on June 29, 2020 and was sentenced to life in prison without parole on August 21, 2020.
- GEDmatch was acquired by Verogen in 2019. Other cold cases have since been solved using this technique.
Partial Match in GEDmatch
- Found a partial match on the GEDmatch website with 0.78% shared DNA.
- For 20 CODIS Loci (40 alleles if all heterozygous), 1 matching allele = $1/40$, or 2.5%.
- For 1,000,000 SNP loci (2,000,000 alleles), how many matches for 0.78% shared DNA?
- $0.0078 \times 2{,}000{,}000 = 15{,}600$ matches.
- This level of sharing is close to that of unrelated individuals, so investigators look for linked loci (haplotypes) instead.
- Large chromosome segments are passed intact through generations; segments shared by relatives are Identical by Descent (IBD).
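The 0.78% figure lines up with the expected autosomal sharing between third cousins. A sketch, assuming the standard average of $(1/2)^{2n+1}$ for nth cousins (1/8 for first, 1/32 for second, 1/128 for third cousins); actual sharing between any particular pair of relatives varies around this average. The helper name is hypothetical.

```python
from fractions import Fraction

def expected_shared_fraction_nth_cousin(n):
    """Average autosomal DNA shared by nth cousins: (1/2)**(2n + 1)."""
    return Fraction(1, 2 ** (2 * n + 1))

# Shared-allele count for the GEDmatch example: 0.78% of 2,000,000 SNP alleles.
n_snp_alleles = 2 * 1_000_000
shared = 0.0078 * n_snp_alleles
third = expected_shared_fraction_nth_cousin(3)

print(round(shared), float(third))  # 15600 0.0078125
```

Because those roughly 15,600 shared alleles arrive in long IBD segments rather than scattered single loci, a dense SNP panel can detect a third cousin even though an 8- to 20-locus CODIS profile cannot.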