assigning DNA profiles, probability, frequency, mixture interpretation, next generation sequencing, testimony
defined as the probability of discriminating between 2 unrelated individuals
power of discrimination
formula for probability of match (PM)?
sum of (frequency of genotype at a locus)²
power of discrimination formula?
1 - PM
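A minimal sketch of the two formulas above. The genotype frequencies are made-up illustrative values for a hypothetical locus, not real population data, and the function names are hypothetical.

```python
def probability_of_match(genotype_freqs):
    """PM = sum of the squared genotype frequencies at one locus."""
    return sum(f ** 2 for f in genotype_freqs)


def power_of_discrimination(genotype_freqs):
    """PD = 1 - PM."""
    return 1 - probability_of_match(genotype_freqs)


# Hypothetical genotype frequencies at one locus (they sum to 1).
example_freqs = [0.5, 0.3, 0.2]
pm = probability_of_match(example_freqs)      # 0.25 + 0.09 + 0.04
pd = power_of_discrimination(example_freqs)   # 1 - PM
```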
steps for determining the probability of a single source DNA profile?
determine alleles at each locus, find allele frequency from relevant populations, calculate expected genotype frequency, report multilocus results
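The steps above can be sketched as follows, assuming Hardy-Weinberg genotype frequencies with no subpopulation correction and independent loci combined by multiplication. The allele frequencies are invented for illustration and the function names are hypothetical.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg expectation: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q


def profile_frequency(loci):
    """Multiply the per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in loci)


# Hypothetical three-locus profile: each tuple holds the allele frequencies
# of the observed genotype (one value = homozygote, two values = heterozygote).
profile = [(0.1, 0.2), (0.3,), (0.05, 0.25)]
multilocus = profile_frequency(profile)  # 0.04 * 0.09 * 0.025
```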
when calculating expected genotype frequency using Hardy-Weinberg, what are the potential subpopulation corrections?
correction for profile probability or correction for match probability
aggregates and harmonizes data across many large-scale sequencing projects to create summary allele frequency statistics
gnomAD database
_______________ is distinct from either race or ethnicity
genetic ancestry
reflects an individual’s demographic history and refers to the specific lines of descent through a family tree by which an individual inherited DNA from specific ancestors
genetic ancestry
sociopolitical constructs used to group individuals based on perceived shared ancestry, biological characteristics, or perceived shared cultural heritage
race and ethnicity
genetic ancestry is a ____________________
continuous measure
homozygote loci formula without population substructure correction
p²
heterozygote loci formula without population substructure correction
2pq
homozygote loci formula with inbreeding population
p²+pF(1-p)
heterozygote loci formula with inbreeding population
2pq(1-F)
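A small sketch of the inbreeding-corrected formulas above, showing that they reduce to the uncorrected Hardy-Weinberg values when F = 0. The allele frequencies and the F value are illustrative assumptions, and the function names are hypothetical.

```python
def homozygote_frequency(p, F=0.0):
    """p^2 + p(1 - p)F; equals p^2 when F = 0."""
    return p * p + p * (1 - p) * F


def heterozygote_frequency(p, q, F=0.0):
    """2pq(1 - F); equals 2pq when F = 0."""
    return 2 * p * q * (1 - F)


# Illustrative values only: p = 0.1, q = 0.2, F = 0.01.
hom_plain = homozygote_frequency(0.1)             # p^2
hom_inbred = homozygote_frequency(0.1, F=0.01)    # slightly higher than p^2
het_plain = heterozygote_frequency(0.1, 0.2)      # 2pq
het_inbred = heterozygote_frequency(0.1, 0.2, F=0.01)  # slightly lower than 2pq
```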
subpopulation theory is similar to inbreeding but F (which is really ______) becomes __________
Fis, Fst
FST refers to
probability that 2 alleles randomly drawn from the population are identical by descent
5 main points of the 1996 NRC Report on Forensic DNA Evidence
validated DNA evidence, new formulas to calculate the likelihood of a match for better understanding for jurors, protecting suspects from false incrimination, recommending the use of a DNA profile database specific to the racial background of the sample, assuring DNA profiling is reliable
conservative value to be used for θ (in the US) in this formula p² +p(1-p)θ, when the exact genotype can be determined
0.01
recommendation 4.1 of the NRC report stated
heterozygote profile frequencies should be calculated using H-W without the theta correction
heterozygote formula to be used per the 4.1 recommendation?
2pipj
why did recommendation 4.1 change the formula for heterozygotes?
the formula with theta, 2pipj(1-theta), gives a lower heterozygote frequency than 2pipj, so dropping theta is the more conservative choice
homozygote formula with a subpopulation?
pi² + pi(1-pi)thetaii
heterozygote formula with a subpopulation?
2pipj(1-thetaij)
what did recommendation 4.2 of the NRC report say?
use allele frequencies from the subgroup the sample came from. if unknown subgroup, use the formula
signs that sample is a mixture
loci with more than 2 alleles, severe peak imbalance, abnormally high stutter
expected severe peak balance of _______% in a mixture sample
60-70
expected high stutter of _______% in mixture samples
15-20
minimum height requirement at and above which detected peaks can be reliably distinguished from background noise
analytical threshold
the analytical threshold or AT is typically around _______ RFUs
25-50
peak height value below which it is reasonable to assume that, at a given locus, allelic dropout of a sister allele in a heterozygous pair may have occurred
stochastic threshold
stochastic threshold or ST is typically around _______ RFUs
200
steps for interpreting a mixture
identify presence of a mixture, designate allele peaks, identify number of contributors, estimate relative ratio of individuals contributing to the mixture, consider all possible genotypes, compare reference samples, statistical interpretation
formula for determining the minimum number of contributors
Nalleles/2 then rounded up
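A minimal sketch of this rule as a way to get the minimum number of contributors, assuming it is applied to the locus showing the most alleles (each person contributes at most 2 alleles per locus). The allele counts are hypothetical.

```python
from math import ceil


def min_contributors(alleles_per_locus):
    """Largest allele count at any locus, divided by 2 and rounded up."""
    return ceil(max(alleles_per_locus) / 2)


# Hypothetical allele counts observed at four loci.
example_counts = [2, 3, 5, 4]
n_min = min_contributors(example_counts)  # 5 alleles need at least 3 people
```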
relative ratio considers
the peak heights of the whole profile
all possible genotypes for 4 peaks (A, B, C, D)
A, B + C, D
A, C + B, D
A, D + B, C
all possible genotypes for 3 peaks (A, B, C)
A, A + B, C
B, B + A, C
C, C + A, B
A, B + A, C
B, C + A, C
A, B + B, C
all possible genotypes for 2 peaks (A, B)
A, A + A, B
A, B + A, B
A, A + B, B
A, B + B, B
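The genotype lists above can be reproduced with a short enumeration. This is a sketch that assumes exactly two contributors, no drop-out or drop-in (every observed peak must be explained, and no unobserved allele is allowed), and that the order of the two contributors does not matter. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def two_person_genotype_sets(peaks):
    """List unordered pairs of genotypes that together explain exactly the observed peaks."""
    genotypes = list(combinations_with_replacement(sorted(peaks), 2))
    observed = set(peaks)
    return [
        (g1, g2)
        for g1, g2 in combinations_with_replacement(genotypes, 2)
        if set(g1) | set(g2) == observed
    ]


four_peaks = two_person_genotype_sets("ABCD")   # 3 genotype sets
three_peaks = two_person_genotype_sets("ABC")   # 6 genotype sets
two_peaks = two_person_genotype_sets("AB")      # 4 genotype sets
```

The counts match the cards: 3 sets for 4 peaks, 6 sets for 3 peaks, and 4 sets for 2 peaks.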
accounts for if a single peak below the stochastic threshold results from the homozygous genotype or the heterozygous genotype
2p rule
one or more of the mixture components could comprise low template DNA, as such we need to take into account
allele drop out and drop in
2p rule is used to calculate if
an actual allele dropped out or if the sample is a homozygote
2p rule formula
2pa-pa² < 2pa
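A sketch of the inequality above: the frequency of all genotypes carrying allele a is p² (homozygote) plus 2p(1 - p) (heterozygote with any other allele), which equals 2p - p², so assigning 2p is always slightly larger. The allele frequency used here is an illustrative assumption and the function names are hypothetical.

```python
def carrier_frequency(p):
    """p^2 + 2p(1 - p) = 2p - p^2: any genotype containing allele a."""
    return p * p + 2 * p * (1 - p)


def two_p_rule(p):
    """2p: the value assigned under the 2p rule."""
    return 2 * p


p_a = 0.1                        # hypothetical allele frequency
exact = carrier_frequency(p_a)   # 2p - p^2
assigned = two_p_rule(p_a)       # 2p
```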
probability that the DNA of a randomly chosen person has the same DNA profile as the DNA of the casework sample
RMP or random match probability
sum of the probabilities for all of the genotypes that represent the possible contributors to a DNA mixture under the assumption of a defined number of contributors
RMP calculation or modified RMP
how is modified RMP different from the combined probability of Inclusion (CPI)?
CPI doesn’t require an assumption about the number of contributors; modified RMP does
estimate of the probability that a randomly selected, unrelated individual would be included as a possible contributor to a mixture
combined probability of inclusion or CPI
probability that a randomly selected, unrelated individual would be excluded as a contributor to the mixture
combined probability of exclusion or CPE
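A minimal sketch of CPI and CPE, assuming the usual per-locus form (the sum of the frequencies of all alleles observed in the mixture, squared), multiplied across loci, with CPE = 1 - CPI. The allele frequencies are invented for illustration and the function names are hypothetical.

```python
from math import prod


def locus_inclusion_probability(allele_freqs):
    """(sum of observed allele frequencies)^2 at one locus."""
    return sum(allele_freqs) ** 2


def combined_probability_of_inclusion(loci):
    """Product of the per-locus inclusion probabilities."""
    return prod(locus_inclusion_probability(freqs) for freqs in loci)


# Hypothetical mixture: frequencies of the alleles observed at two loci.
mixture = [[0.1, 0.2, 0.2], [0.1, 0.3]]
cpi = combined_probability_of_inclusion(mixture)  # 0.25 * 0.16
cpe = 1 - cpi
```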
if it is determined that there is allele dropout at a given locus, the locus ______________
will be excluded from the match probability
steps for calculating the likelihood ratio for a 2-person mixture
condition the number of contributors, state the alternative hypothesis, evaluate the probability of the evidence under the defense proposition, evaluate the probability of the casework sample under the prosecution proposition, calculate the likelihood ratio, report the likelihood ratio
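As a hedged, single-locus illustration of these steps, the sketch below uses a deliberately simple scenario that is an assumption, not part of the cards: four peaks A, B, C, D, two contributors, a known contributor typed A,B under both propositions, a person of interest typed C,D, no drop-out or drop-in, and Hardy-Weinberg frequencies for an unrelated unknown. The function name and allele frequencies are hypothetical.

```python
def likelihood_ratio_four_peaks(p_c, p_d):
    """LR for the simple scenario described above.

    Prosecution proposition: known contributor (A,B) + person of interest (C,D).
    Defense proposition:     known contributor (A,B) + one unknown, unrelated person.
    """
    # Under the prosecution proposition the two stated genotypes explain all four peaks.
    prob_given_prosecution = 1.0
    # Under the defense proposition the unknown person must be C,D: frequency 2 * p_c * p_d.
    prob_given_defense = 2 * p_c * p_d
    return prob_given_prosecution / prob_given_defense


# Hypothetical allele frequencies for C and D.
lr = likelihood_ratio_four_peaks(0.1, 0.2)  # 1 / 0.04
```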
refers to the use of biological modeling, statistical theory, computer algorithms, and probability distributions to calculate likelihood ratios and/or infer genotypes for the DNA typing results of forensic samples
probabilistic genotyping
why do we use probabilistic genotyping?
statistically interprets mixture samples
PG continuous models consider _______ as a continuous variable
peak heights
probabilistic genotyping considers _________ in order to deconvolute a DNA profile into a list of genotype sets
observable data, models, calibration data, and unknowables
specific for a set of laboratory hardware and DNA typing kit
calibration data
refers to the specifics of the actual DNA profile being analyzed
unknowables
the unknowables of PG continuous models include
number of contributors, DNA amounts of each contributor, degradation of each contributor, amplification efficiency of each locus, replicate amplification strength, level of peak height variability within the sample
“mass parameters” or the total allelic product within PG continuous models includes
DNA amounts of each contributor, degradation of each contributor, amplification efficiency of each locus, replicate amplification strength
assumes degradation is exponential but allows each contributor to have a different curve
total allelic product modeling
total allelic product modeling tests different mass parameters to form a ____________
probability density
iterative re-sampling process; in each iteration, genotype combinations and biological parameters (mass parameters) are proposed to describe the profile
Markov Chain Monte Carlo
how does the Markov Chain Monte Carlo deconvolution work?
genotype and set of values is proposed for every iteration and compared to observed results to see how well they explain the data
preliminary MCMC run to ensure the post burn-in MCMC begins in an area of high probability space
burn-in
parameters for MCMC burn-in?
8 independent chains must reach 100,000 accepted iterations
occurs after burn-in and uses the same number of chains to achieve ~50,000 accepted iterations
post burn-in
occurs at completion of MCMC and normalizes the number of genotype sets accepted during post burn-in
weight
an MCMC weight of 0 means
observed data cannot be explained by the proposed genotype set
an MCMC weight of 1 means
only genotype set that explains the DNA profile
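A simplified sketch of the normalization idea behind weights: count how often each genotype set was accepted after burn-in and divide by the total. The genotype-set labels and counts are hypothetical, and this is an illustration of the concept, not how any particular software implements it.

```python
from collections import Counter


def genotype_weights(accepted_sets):
    """Turn post burn-in acceptance counts into weights that sum to 1."""
    counts = Counter(accepted_sets)
    total = sum(counts.values())
    return {genotype_set: n / total for genotype_set, n in counts.items()}


# Hypothetical record of accepted genotype sets at one locus.
accepted = ["A,B + C,D", "A,B + C,D", "A,C + B,D", "A,B + C,D"]
weights = genotype_weights(accepted)

# A genotype set that is never accepted has no entry (weight 0);
# if only one set is ever accepted, its weight is 1.
single = genotype_weights(["A,B + C,D"] * 5)
```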
the progression of the MCMC is influenced by a “seed” set by a __________
random number generator
process of calculating the probability density of each peak in the profile, comparing it with the proposed model, measuring its “fit”, and accepting or rejecting the proposed values
Metropolis-Hastings
the Metropolis-Hastings Algorithm operates
within the Markov Chain Monte Carlo framework
when working with the Metropolis-Hastings algorithm, the ________ the probability density the better the fit of the parameter values to the observed profile
higher
within the Metropolis-Hastings algorithm, the proposed values for the genotypes and mass parameters are either accepted or rejected depending on ________
probability density
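The accept/reject idea can be illustrated with a toy sampler. This sketch is an assumption-laden example, not a probabilistic genotyping model: it samples a single hypothetical parameter (for instance a mixture proportion) from a made-up bell-shaped target, uses a symmetric random-walk proposal (the Metropolis special case of Metropolis-Hastings), discards a burn-in, and fixes the random seed so the run is reproducible.

```python
import math
import random


def log_density(x, mean=0.7, sd=0.05):
    """Unnormalized log density of a toy target; higher means a better fit."""
    return -0.5 * ((x - mean) / sd) ** 2


def metropolis_hastings(n_iter=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk sampler: propose a value, then accept or reject it by density ratio."""
    rng = random.Random(seed)  # the seed fixes the path the chain takes
    current = 0.5              # arbitrary starting value
    samples = []
    for i in range(n_iter):
        proposal = current + rng.gauss(0, step)
        log_ratio = log_density(proposal) - log_density(current)
        # Always accept a better fit; accept a worse fit with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        if i >= burn_in:       # keep only post burn-in iterations
            samples.append(current)
    return samples


samples = metropolis_hastings()
sample_mean = sum(samples) / len(samples)  # should sit near the target mean of 0.7
```

Running it twice with the same seed gives the same chain; changing the seed changes the path but should give a similar mean.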
after deconvolution, a likelihood ratio can be assigned to any POI based on ____________
propositions considered
parameters requiring optimization for probabilistic genotyping
analytical threshold, stutter ratios, saturation limit, drop-in parameters, allele/stutter peak height variance, LSAE variance, relevant population parameters
year QIAGEN developed the first DNA purification method in forensics
1998
year QIAGEN launched its first STR kit
2010
QIAGEN workflow steps
collection, pre-treatment, sample preparation, array setup, quantification, STR/NGS analysis
traditional DNA analysis workflow
sample collection, extraction and quantification, PCR, CE & data analysis
why use next generation sequencing over CE?
add more loci targets, not limited by amplicon bp size, can use STRs and SNPs, visible trait estimation
ForenSeq Human Identification workflow?
sample collection, extraction & quantification, library preparation, sequencing & data analysis
why sequence STRs?
smaller amplicons, looks at the whole sequence not length, can target STRs and SNPs
the ForenSeq Signature Plus is the only QIAGEN kit that has
STR analysis, kinship, and externally visible characteristics
SNPs are used over STRs because
need way more for a match
the MainstAY and MainstAY SE kits can identify relatives of the ________ degree
first
the Signature Plus kit can identify relatives to the ________ degree
first or second
Kintelligence can identify relatives to the _________ degree
fourth or fifth
how are libraries prepared?
amplify and tag targets, attach indexes and adapters, purify, dilute sample to make loci all the same concentration
what is the purpose of indexes in QIAGEN NGS?
provide a unique marker specific to that allele and sample
how does the sequencing part of the QIAGEN NGS work?
samples get pulled onto the flowcell, make a U shape on the cell to be read, one nucleotide is added and read during each cycle
a really special feature about sequencing is that it is able to
easily determine number of contributors
steps of PCR
extraction, quantification, amplification, analysis
STRmix is used to
help declutter mixture samples
forensic scientists/biologists can only speak to the _______ level of testimony
source or sub-source
occurs when the conclusion is restated in a manner that bolsters the hypothesis of the prosecutor, typically by transposing the conditional and making the evidence seem more exclusive
prosecutor’s fallacy
error in logic on the part of the defense counsel that bolsters the defense’s hypothesis and favors the defendant, typically by relating the probability to a specific population to make the profile seem more inclusive
defendant’s fallacy
fallacy in which the statistic is bolstered by relating it directly to the profile being compared in relation to the general population
uniqueness fallacy
occurs when the probability statement is taken from one level within the hierarchy of propositions to a higher level
association fallacy