Forensic Bio Unit 3
Fluorescence Labeling
- Fluorescent dyes contain a fluorophore, which is the component of the dye that actually fluoresces
- The fluorescence is caused when a certain wavelength of light hits the fluorophore and excites it
- The fluorophore is excited at one wavelength and emits at a different (longer) wavelength
- The difference in the excitation wavelength and the emission wavelength of a dye is called the STOKES SHIFT
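- Example (a minimal sketch): the Stokes shift is just the emission wavelength minus the excitation wavelength. The function name and the two wavelengths below are illustrative assumptions, not values from these notes.

```python
def stokes_shift(excitation_nm: float, emission_nm: float) -> float:
    """Return the Stokes shift (emission minus excitation) in nanometers."""
    return emission_nm - excitation_nm


# Illustrative wavelengths only (assumed for the example).
shift = stokes_shift(excitation_nm=494, emission_nm=521)
print(f"Stokes shift: {shift} nm")
```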
Dyes Used to Fluorescently Label PCR
- Different dyes can be used in the same analysis since optical filters can be used to separate out different dye colors
- The number of dyes used depends on need (or kit used)
- Generally, the more dyes, the more loci (locations in the DNA that can be analyzed)
- Dyes used generally fall within the 400-650nm range
- Some kits used in forensics use up to 6 dyes
Singleplex vs Multiplex
- In PCR it is possible to use more than one primer set
- As long as the regions of interest are not overlapping, you can amplify more than one area
- The components will be the same; you just need to have a primer set for each region of interest
Multiplex Kits
- Current multiplex kits in forensics can amplify over 20 loci at the same time
- From this single PCR reaction, enough information can be generated to differentiate between every individual in the world (except for identical twins)
- This saves a large amount of time instead of setting up and running over 20 separate reactions for each sample
Summary
- Electrophoresis separates DNA based on size only
- Similarly sized PCR products (amplicons), even if from entirely different locations, cannot be separated by electrophoresis without labeling
- Fluorescently labeled products allow for DNA fragments of the same size to be distinguished from one another
Two Different Types of VNTR Assays
- RFLP-based VNTRs
- PCR-based VNTRs
o Both types use markers that are minisatellites (8-200bp repeats)
RFLP-based VNTRs
- This type of analysis is not based on PCR, so it requires a much larger amount of starting material
- Multiple VNTR loci can be used
o Cannot be multiplexed (combined into one reaction)
- This method is based on restriction fragment length polymorphisms
Variable Number Tandem Repeats
- In this case, the repeat unit is 31 base pairs long
- There are 20 repeats total
- This would be an example of one allele
o Each person has two alleles (one from each parent)
- The number of repeats can vary widely (typically 10-30)
- Differences are based on length (ex. 30 repeats is longer than 28 repeats)
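- Example (a minimal sketch): the length of the repeat region is the repeat unit length times the number of repeats. The 31 bp unit and 20 repeats come from the notes; the 22-repeat allele and the function name are assumptions for comparison, and flanking DNA is ignored.

```python
def repeat_region_length(repeat_unit_bp: int, n_repeats: int) -> int:
    """Length of the tandem-repeat region in bp, ignoring flanking DNA."""
    return repeat_unit_bp * n_repeats


# Allele from the notes: a 31 bp repeat unit repeated 20 times.
allele_a = repeat_region_length(31, 20)
# A second, hypothetical allele with 22 repeats for comparison.
allele_b = repeat_region_length(31, 22)

print(allele_a, allele_b, allele_b - allele_a)
```

- Two extra repeats make the second allele 62 bp longer, which is the kind of length difference electrophoresis can detect.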
Restriction Fragment Length Polymorphism
- RFLP-based VNTR profiling utilizes restriction fragment length polymorphisms (RFLPs)
- RFLPs are based on restriction endonucleases, which are enzymes that cut DNA at a certain place
- Each enzyme has what is known as a recognition sequence: it recognizes a certain nucleotide sequence and cuts at that sequence
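- Example (a minimal sketch): cutting a linear DNA sequence at every copy of a recognition sequence. EcoRI (recognition sequence GAATTC, cut after the G) is used as the enzyme; the DNA sequence and the `digest` helper are made up for illustration, and only the top strand is modeled.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear DNA sequence at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # whatever is left after the last cut
    return fragments


# Hypothetical sequence containing two EcoRI sites.
dna = "ATGCGAATTCTTAGGCCGAATTCAA"
frags = digest(dna, "GAATTC", 1)
print([len(f) for f in frags])
```

- If a VNTR sits between two cut sites, a person with more repeats yields a longer fragment between those sites, which is the length polymorphism RFLP detects.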
Detecting Size Differences for VNTRs
- Since we are detecting differences in length, you can perform electrophoresis
- Once you have the different-sized fragments, you can compare your crime scene sample to any possible suspects
Disadvantages of VNTRs with RFLP
- Does not perform well with degraded samples
- Does not perform well with samples in limited quantity
- Since both these characteristics are common among forensic samples, a different type of method is needed
PCR-Based VNTRs
- VNTR analysis that uses PCR to amplify the regions is called amplified fragment length polymorphism (AFLP)
- If the overall size of the locus is under 1000bp, then AFLPs can be used
- Similar to the RFLP method, AFLPs look at differences in the number of repeats by amplifying with primers in the conserved flanking regions
- Primers are designed to anneal in the conserved region; the number of repeats is what varies
PCR-Based VNTRs
- Primers are designed so the length of the flanking regions is always the same
- This is showing an example of a heterozygous individual
Advantages of the PCR-based VNTRs
- The use of PCR makes it so less initial DNA is required
- By multiplexing multiple loci, it is possible to get a higher discriminatory power (a greater level of individualization)
- Many of the VNTR loci are upwards of 1000bp in length, which is less than ideal for forensic samples
o The need to reduce this size leads to our next topic, STRs, which is how forensic testing is currently performed.
Minisatellites vs Microsatellites
- Minisatellites and microsatellites are both examples of tandem repeats (adjacent regions of repeated units)
- The main difference between the two types is the length of the repeat
o Minisatellites are typically 8-200bp repeat units
o Microsatellites are typically 2-7bp repeat units
Microsatellites
- Shorter repeat units mean smaller overall allele sizes
- Smaller allele sizes are better for forensic applications
- Why?
o Better for degraded DNA
o Easier to multiplex (increases discriminatory power)
Characteristics of STRs
- The microsatellites used in forensic analysis are short tandem repeats (STRs)
- These are based on nuclear DNA, so there are two copies for each locus (location)
- Alleles can be homozygous or heterozygous
- There are a large number of STRs in the human genome (estimated to be over 100,000)
- Each STR is characterized by the core repeat and the flanking region
- The core repeat consists of the tandemly repeated regions
- The flanking region consists of a conserved area on each side (this is where the primers anneal)
Repeat Unit Length
- The repeat unit lengths for STRs are dimeric (2bp), trimeric (3bp), tetrameric (4bp), pentameric (5bp), hexameric (6bp), and heptameric (7bp)
- Dimeric and trimeric repeats have issues with stutter
- Pentameric, hexameric, and heptameric are less abundant
- Tetrameric is the length used in all core loci used in forensics (USA)
o A few pentameric STR loci are utilized internationally
Repeat Unit Length
- The tetrameric unit length is commonly found in the human genome
- The tetrameric length is highly polymorphic – a crucial trait for STR analysis
o Polymorphic means that an allele appears in multiple forms, or that there is a lot of variety for that locus
Repeat Unit Sequences
- Not every repeat is of the same type
- The difference between types is based on the sequence of the repeats
- The core STR loci include each of these types
o Simple
o Compound
o Complex
Simple STR Repeats
- Simple repeats consist of tandem repeats with identical repeating units
- EX: repeat at D5 is (AGAT)n, where n is the number of repeats
- This allele number would be 10; (AGAT)10
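- Example (a minimal sketch): finding the allele number of a simple repeat by counting back-to-back copies of the repeat unit. The flanking sequences and the function name are made up; only the (AGAT)10 repeat comes from the notes.

```python
import re


def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical amplicon: made-up flanks around ten AGAT repeats.
flank_5, flank_3 = "GGTCCA", "TTGACC"
amplicon = flank_5 + "AGAT" * 10 + flank_3

print(count_tandem_repeats(amplicon, "AGAT"))
```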
Compound STR Repeats
- Compound repeats consist of more than one type of repeat
- EX: repeat at D8 is [TCTA]n[TCTG]n[TCTA]n
- This allele number is 14; [TCTA]2[TCTG]1[TCTA]11
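- Example (a minimal sketch): for a compound repeat, the allele number is the sum of the counts of every repeat block. The bracket notation is parsed with a regular expression; writing the single TCTG unit with an explicit count of 1 is an assumption of this sketch, as is the function name.

```python
import re


def allele_number(motif_notation: str) -> int:
    """Sum the repeat counts in notation such as '[TCTA]2[TCTG]1[TCTA]11'."""
    counts = re.findall(r"\[[ACGT]+\](\d+)", motif_notation)
    return sum(int(count) for count in counts)


print(allele_number("[TCTA]2[TCTG]1[TCTA]11"))
```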
Complex STR Repeats
- Complex repeats consist of several clusters of different tandem repeats with intervening sequences
How is the STR Repeat Named?
- The tetranucleotide repeat motif is named based on the top strand
o There are historical exceptions to this naming convention
- This would be referred to as a TCAT repeat
Characteristics of the Core Loci
- In the US, there are currently 20 loci (and a sex marker) that are part of the core CODIS loci
- The increase to 20 loci happened in 2017
o From 1998 until 2017, only the original 13 core CODIS loci were included
- CODIS stands for Combined DNA Index System
Why Would you Expand the Core Loci Set?
- The move to increase the number of core CODIS loci was made primarily to address the following items:
o Decrease chance of adventitious match (a match due to random chance)
o Be more like international databases
o Increase discriminatory power
How are the Core CODIS loci named?
- The core CODIS loci are named based on either 1) the chromosome they are found on or 2) the name of the gene they are a part of
- EX: D5S818 is located on chromosome 5
- Other loci (ex. FGA, CSF1PO) are named after genes, but refer specifically to the introns of those genes (regions not coding for the protein)
What Makes a Good Core CODIS Marker?
- The goal of these markers is to lead to the individualization of a sample
- What are some characteristics they could possess?
o Polymorphic
o Consistency in flanking regions among all individuals (primer locations)
o Smaller sizes are preferred (Degraded DNA)
o Not linked to other loci (generally on separate chromosomes; if on the same chromosome, far enough away not to be linked)
o Few amplification artifacts (ex: stutter)
Forensic STR Analysis
- The current STRs used in forensic investigation are amplified using fluorescently labeled primers
- The different amplicons are separated by capillary electrophoresis
- The different color dyes can be separated by the optical filters
Internal Lane Standard
- The internal lane standard (ILS) is added to each sample
- The ILS consists of DNA fragments of known size, making it possible to size your fragments of unknown size
- This serves the same function as the PCR marker for slab gel electrophoresis
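- Example (a minimal sketch): sizing an unknown peak from the ILS peaks on either side of it. Plain linear interpolation is used here as a simplification; genotyping software typically uses more elaborate sizing methods. The scan positions, fragment sizes, and function name are all hypothetical.

```python
from bisect import bisect_left


def size_fragment(scan: float, ils: list[tuple[float, int]]) -> float:
    """Estimate fragment size (bp) by linear interpolation between the
    two ILS peaks that bracket the unknown peak's scan position.

    `ils` is a list of (scan position, known size in bp), sorted by scan.
    """
    scans = [s for s, _ in ils]
    if not scans[0] <= scan <= scans[-1]:
        raise ValueError("peak lies outside the ILS range")
    i = max(1, bisect_left(scans, scan))
    (s0, bp0), (s1, bp1) = ils[i - 1], ils[i]
    return bp0 + (scan - s0) * (bp1 - bp0) / (s1 - s0)


# Hypothetical ILS peaks: (scan position, known size in bp).
ils_peaks = [(1000, 100), (1500, 150), (2000, 200), (2500, 250)]

print(size_fragment(1750, ils_peaks))
```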
An Electropherogram
- The data collection process generates an electropherogram
- An electropherogram is a display of the peaks representative of the different fragments
- An electropherogram is separated into color channels and shows the relative amount of each fragment as an RFU (relative fluorescence unit) value
- The allele number is the number of repeats; the overall size is the length of the fragment in base pairs
Determining the Allele
- The internal lane standard is used to determine the size of the fragments represented by each peak
- How can we determine what the allele is? An allelic ladder
- An allelic ladder is a sample that is made up of the majority of known alleles at a certain locus
- PCR markers used in slab gel electrophoresis are sometimes called “ladders”
Allelic Ladder
- In this figure, each locus is separated
- The peaks represent the common alleles for that particular locus
Determining Genotype
- The genotype of an individual is the number of repeats for both alleles at that locus
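- Example (a minimal sketch): calling a genotype by matching each sized peak to the nearest allele in an allelic ladder. The ladder sizes, the sample peaks, the 0.5 bp matching window, and the function name are all assumptions made for illustration.

```python
def call_allele(size_bp: float, ladder: dict[str, float], window: float = 0.5):
    """Return the ladder allele whose size is within `window` bp of the peak,
    or None if the peak matches no ladder allele (an off-ladder peak)."""
    best = min(ladder, key=lambda allele: abs(ladder[allele] - size_bp))
    return best if abs(ladder[best] - size_bp) <= window else None


# Hypothetical ladder for one tetrameric locus: allele number -> size (bp).
ladder = {"10": 140.0, "11": 144.0, "12": 148.0, "13": 152.0}

# Hypothetical sized peaks from one sample at that locus.
peaks = [144.2, 151.9]
genotype = [call_allele(p, ladder) for p in peaks]
print(genotype)
```

- Two different calls indicate a heterozygous genotype at this locus; a single peak would indicate a homozygote.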
STR Profile
- An STR profile consists of the genotype for all the loci used in a certain kit
- For example, an STR profile used for uploading to the CODIS database will include the genotype for all 20 core CODIS loci
Example of STR Profile
- This electropherogram shows an STR profile using the Identifiler Kit
- This kit amplifies 15 loci plus a sex marker
- Prior to 2017, this kit amplified all of the core CODIS loci
- Since all 13 original core loci are represented, this was a complete STR profile at that time
Interpretation of STR Profiling Results
- There are three common conclusions at the end of STR analysis
o Inclusion: the genotypes of the two compared STR profiles are identical
o Exclusion: the two genotypes differ, indicating that the profiles originated from different sources
o Inconclusive: indicates that there is not enough info to support a conclusion of either inclusion or exclusion (common for partial profiles)
- In the case of a match, a statistical weight can be assigned to the likelihood of such a match
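- Example (a simplified sketch): the three conclusions written as a comparison of two profiles. Real interpretation also has to account for artifacts and allele dropout; the `min_loci` cutoff for calling a partial profile inconclusive, the function name, and the profiles themselves are assumptions for illustration.

```python
def compare_profiles(questioned: dict, known: dict, min_loci: int = 10) -> str:
    """Classify a comparison as inclusion, exclusion, or inconclusive.

    Each profile maps locus name -> tuple of allele numbers. Only loci
    typed in both profiles are compared. `min_loci` is an arbitrary,
    illustrative cutoff for "enough information".
    """
    shared = [locus for locus in questioned if locus in known]
    if any(sorted(questioned[locus]) != sorted(known[locus]) for locus in shared):
        return "exclusion"
    if len(shared) < min_loci:
        return "inconclusive"
    return "inclusion"


# Hypothetical three-locus profiles (real profiles cover many more loci).
evidence = {"D5S818": (11, 13), "FGA": (21, 24), "CSF1PO": (10, 10)}
suspect_a = {"D5S818": (13, 11), "FGA": (21, 24), "CSF1PO": (10, 10)}
suspect_b = {"D5S818": (11, 13), "FGA": (22, 24), "CSF1PO": (10, 10)}

print(compare_profiles(evidence, suspect_a, min_loci=3))
print(compare_profiles(evidence, suspect_b, min_loci=3))
print(compare_profiles(evidence, suspect_a))
```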
Factors Affecting Genotyping Results
- There are numerous types of factors that alter the interpretation of an STR profile
o Genetic-related factors
o Amplification-related factors
o Electrophoresis-related factors
Mutations
- The loci used to evaluate STRs are selected in part because of low mutation frequencies
- A mutation is just a change in the DNA that is brought about by a rare event
o Can cause the changing of a single base pair
o Can result in duplication/deletion of large section of DNA or entire chromosome
- Despite tests to ensure low mutation rates, some mutations can occur in STRs and alter the interpretation of STR profiles
Chromosomal and Gene Duplications
- In some cases, duplication of one of the two chromosomes can lead to three copies of a chromosome, and therefore three alleles at a locus
o Remember you are diploid – got one set of DNA from your mother, one set from father
- This condition is called trisomy and is associated with many genetic diseases
- At certain loci, three alleles will appear
Tri-Alleles
- When there are tri-alleles, interpretation can be altered
- If only one locus shows the tri-allele, then it is probably from a single source
Point Mutations
- Point mutations involve the change of the nucleotide sequence at a singular point
- This is particularly problematic when the point mutation occurs in the primer binding site
- A change in the nucleotide sequence at the primer binding site can lead to a failure of the amplification of that allele
- When an allele that should be present does not amplify, this is called a null allele
- If there is a mutation that makes it so the primer cannot bind, it is possible that one of the alleles would not amplify
- If there is a mutation in the primer site of the 18 repeat allele, then it will not amplify
Amplification-Related Artifacts
- There are several artifacts that can be introduced during the amplification process
- The two we will discuss in this class are
o Stutter
o Non-template Adenylation
What is stutter?
- During the extension phase of PCR, a portion of the DNA may “slip” forward or backward
- This slip leads to a product that is one repeat shorter (more common) or longer (less common) than the true allele
Why Stutter is Problematic
- Difficult because stutter is located where a true allele would be
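- Example (a minimal sketch): because a stutter peak sits in a true-allele position, one common way to screen for it is to compare its height to the peak one repeat longer. The 15% cutoff, the peak heights, and the function name are assumptions; real thresholds depend on the locus and kit.

```python
def flag_stutter(peaks: dict[int, float], max_ratio: float = 0.15) -> list[int]:
    """Return allele numbers that look like n-1 stutter of a taller peak.

    `peaks` maps allele number (repeat count) -> peak height in RFU.
    `max_ratio` is an assumed cutoff, not a validated threshold.
    """
    flagged = []
    for allele, height in peaks.items():
        parent = peaks.get(allele + 1)  # peak one repeat longer, if any
        if parent is not None and height / parent <= max_ratio:
            flagged.append(allele)
    return flagged


# Hypothetical peaks at one locus: alleles 12 and 15 with small n-1 peaks.
locus_peaks = {11: 90, 12: 1200, 14: 100, 15: 1100}
print(flag_stutter(locus_peaks))
```

- A small peak that passes this screen could still be a true allele from a minor contributor, which is why stutter complicates mixture interpretation.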
Non-Template Adenylation
- During PCR amplification, Taq Polymerase generally adds an extra adenine (the “A” base) to the 3’ end of the amplicon
- This addition is referred to as a non-template addition – the addition of a base that is not determined by the sequence of the template strand
- Most multiplex kits are designed to account for this addition, but occasionally some forms lacking the extra A will be present (typically from too much template DNA)
o The unadded forms are one base (an “A”) shorter
Electrophoretic Based Artifacts
- There are several artifacts that can be introduced during the electrophoresis step of analysis
- These artifacts include:
o Pull-up peaks
o Spikes
Pull-Up Peaks
- A pull-up peak is when a minor peak of one color is “pulled up” from a major peak in another color
- This is the result of the sample being overloaded or a bad spectral calibration
- If the pull-up peak corresponds to the position of an allele in another color channel, then the interpretation of the DNA profile may not be accurate
Spikes
- Spikes are very sharp peaks (narrower than a true allele peak) that are present in all the color panels
- Spikes are caused by either air bubbles or changes in the voltage
- If spikes occur, the sample needs to be re-run
- The spike will be present at approximately the same height in each color channel
Genotyping Challenging Forensic Samples
- Numerous factors unrelated to genetic, amplification, or electrophoretic characteristics can also impact DNA analysis
- Many of these types of factors are a result of the environment the DNA sample is collected from and are unavoidable
- These factors include
o Degraded DNA
o Low copy number DNA testing
o Mixtures
Degraded DNA
- DNA degradation is the breaking down of large DNA molecules into smaller fragments
- This break down is brought about by environmental factors such as high heat and humidity
- The normal size range of STRs is between 100-500bp
- Alleles that are larger (closer to 500bp) are more likely to be degraded than the smaller alleles
- In a degraded sample, larger fragments are less likely to amplify because the template is more likely to be broken between the primer sites
- These alleles will “drop out”
Low Copy Number DNA Testing
- Low copy number (LCN) DNA is a sample with a very low amount of DNA (less than 100 picograms)
- LCN DNA is often found in instances of touch DNA samples
- Samples in which there is a low amount of DNA can be amplified by increasing the cycle numbers
- Increasing the cycle number allows for the amplification of smaller amounts, but it also introduces other artifacts
- The other artifacts introduced include allele dropout, heterozygote peak imbalance, and increased stutter product, which makes interpretation more difficult
- Since the samples are low in DNA, there is often not enough material for re-amplification to confirm the presence of true alleles
Mixtures
- A mixture is a sample that includes DNA from two or more contributors
- In some cases, you know that one of the contributors is the victim
- In other cases, it is unknown how many contributors may be present
Mixture Interpretation
- Mixture interpretation is the interpretation of DNA profiles that contain mixtures
- The field of DNA analysis is still searching for the best approach to interpreting mixtures
- Mixture interpretation can be made more complicated by the different types of artifacts previously discussed
- There are a number of factors that are indicative of the presence of a mixture
o Severe heterozygote imbalance
o Increased stutter
o Presence of three or more alleles per locus
Heterozygote Imbalance
- Heterozygote imbalance is when the two alleles of a heterozygous individual at a certain locus are not approximately equal in height
- It is expected there will be some differences in height, but if the ratio of the shorter peak to the taller peak is less than 60% then it can be an issue
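- Example (a minimal sketch): checking heterozygote balance as the ratio of the shorter peak to the taller peak, using the 60% guideline from the notes. The peak heights and function names are made up for illustration.

```python
def peak_height_ratio(height_a: float, height_b: float) -> float:
    """Ratio of the shorter peak to the taller peak (between 0 and 1)."""
    return min(height_a, height_b) / max(height_a, height_b)


def is_imbalanced(height_a: float, height_b: float, threshold: float = 0.60) -> bool:
    """True if the ratio falls below the threshold (60% in the notes)."""
    return peak_height_ratio(height_a, height_b) < threshold


# Hypothetical peak heights in RFU for two heterozygous loci.
print(peak_height_ratio(800, 1000), is_imbalanced(800, 1000))
print(peak_height_ratio(450, 1000), is_imbalanced(450, 1000))
```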
The Number of Alleles per Locus
- For a single source profile (one contributor), the maximum number of alleles expected is two (heterozygous for that locus)
- It is also possible that there will only be one allele shown (homozygous for that locus)
- If there are more than two alleles at more than one locus, then it is probable that the profile is a mixture involving two or more contributors
o Three alleles at a single locus is likely a tri-allele
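- Example (a minimal sketch): the allele-count rule of thumb above written as a function. The profiles and the function name are hypothetical, and a real assessment would also weigh peak heights and stutter.

```python
def assess_allele_counts(profile: dict[str, list]) -> str:
    """Apply the allele-count rule of thumb to a profile.

    `profile` maps locus name -> list of allele numbers observed.
    """
    loci_over_two = [locus for locus, alleles in profile.items()
                     if len(set(alleles)) > 2]
    if len(loci_over_two) > 1:
        return "probable mixture"
    if len(loci_over_two) == 1:
        return "possible tri-allele (single source)"
    return "consistent with single source"


# Hypothetical three-locus profiles.
single = {"D5S818": [11, 13], "FGA": [21, 24], "TPOX": [8, 8]}
tri = {"D5S818": [11, 13], "FGA": [21, 24], "TPOX": [8, 9, 11]}
mixed = {"D5S818": [10, 11, 13], "FGA": [20, 21, 24], "TPOX": [8, 11]}

for name, prof in [("single", single), ("tri", tri), ("mixed", mixed)]:
    print(name, "->", assess_allele_counts(prof))
```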
A Mixture Profile
- Mixtures are messy
- If you have more than two contributors, it becomes really hard to determine individual contributors
- In two-person mixtures, you can possibly determine individual profiles if at least one of the following is true
o There is a victim profile
o There is a major and a minor contributor
Summary of Factors Impacting STR Genotyping
- Genetic
o Tri-alleles
o Null alleles
- Amplification
o Stutter
o Non-template adenylation
- Electrophoresis
o Spikes
o Pull-up
- Sample quality
o Degraded DNA
o Low copy number DNA
o Mixtures
When Mitochondrial DNA Profiling is Used
- In cases where there are samples that contain little or no nuclear DNA (nDNA), mitochondrial DNA (mtDNA) can be used
o Some samples do not contain any nDNA (ex. Hair shafts)
o In other cases, the nDNA that was present may have been degraded (ex. Mass disaster cases)
The Mitochondrial Genome
- The first mitochondrial genome was sequenced by Fred Sanger’s laboratory in 1981 at Cambridge University
o The Cambridge reference sequence (CRS)
- Due to errors in the original sequence, a revised Cambridge reference sequence (rCRS) was published in 1999
- The rCRS is used as the point of comparison for all mitochondrial DNA forensic samples
- The mitochondrial genome encodes for a total of 37 genes
- There are no introns in the mitochondrial genome
- The control region (aka the D-loop) is hypervariable, and can therefore be utilized for forensic purposes
The Hypervariable Regions
- There are a total of 3 hypervariable (HV) regions:
o HVI
o HVII
o HVIII
- HVI and HVII are used for forensic purposes since they are the most polymorphic
Heteroplasmy
- Heteroplasmy is when an individual carries more than one mtDNA haplotype (think of haplotype as a genotype)
- It is possible that the individual carries one haplotype in one type of tissue, like hair, and another haplotype in skin cells
- Two types of heteroplasmy that exist:
o Sequence heteroplasmy
o Length heteroplasmy
Sequence Heteroplasmy
- Sequence heteroplasmy is defined as the presence of two different nucleotides at a single position
- Represented as overlapping peaks in an electropherogram
Length Heteroplasmy
- Length heteroplasmy is typically due to differences in the length of the “C-stretch” between two mtDNA haplotypes
o A “C-stretch” is just numerous cytosines (the “C” base) in a row
mtDNA Sample Processing
- Many of the same procedures used to extract and quantify nuclear DNA can be used for mtDNA
- PCR steps are similar, with the following exceptions:
o Different primers are used
o A higher number of PCR cycles is generally used
§ Makes the reaction more sensitive, but also increases the likelihood of contamination; use controls to monitor for contamination
DNA Sequencing of mtDNA Samples
- The common DNA sequencing technique for mtDNA samples is the chain termination method
- A sequencing reaction contains the following:
o Template DNA
o Primers
o DNA polymerase
o Cofactors
o dNTPs
o ddNTPs: the same as dNTPs but missing the 3’ hydroxyl (OH) group; these are fluorescently labeled
§ The absence of the OH group prevents the chain from growing any further; it terminates growth
dNTP present: Chain can grow
- Typically, a dNTP is incorporated and the chain can continue growing
- In this example, a third base has been added and more could be as well
- The presence of the OH group allows for more bases to be added
ddNTP present: Chain cannot grow
- When a ddNTP is incorporated, the chain can no longer grow
How ddNTPs are Visualized
- The ratio of dNTPs (which allow chain growth) and ddNTPs (which terminate growth) varies in a reaction
- At the end of a sequencing reaction, you end up with lots of fragments that are different lengths
- The different lengths vary by one nucleotide to include the shortest possible fragment, the longest possible fragment, and everything in between
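- Example (a minimal sketch): why a set of fragments differing by one nucleotide gives the sequence. Each fragment ends in a labeled ddNTP, so ordering the fragments by length and reading the final base of each spells out the new strand. The strand and the function names are made up for illustration.

```python
def termination_fragments(new_strand: str) -> list[tuple[int, str]]:
    """List every possible terminated fragment as (length, final base).

    Each fragment ends where a labeled ddNTP was incorporated, so the
    final base is the one whose dye color is detected.
    """
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments shortest to longest (as electrophoresis does)
    and read off the terminal base of each."""
    return "".join(base for _, base in sorted(fragments))


strand = "ACGTTAGC"  # hypothetical newly synthesized strand
frags = termination_fragments(strand)
print(read_sequence(frags))
```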
Cycle Sequencing
- The chain termination method is carried out by cycle sequencing
- Cycle sequencing uses thermal cycling (just like PCR) to generate a single stranded template for the chain-termination sequencing reactions
- The three steps of thermal cycling are the same as PCR:
o Denaturing (double to single stranded)
o Annealing (primer attached to single stranded template)
o Extension (DNA polymerase adds dNTPs to form new strand; the addition of a ddNTP terminates the growth)
After Cycle Sequencing
- Following cycle sequencing, the different length fragments can be separated using capillary electrophoresis
- The only differences between mtDNA capillary electrophoresis and nDNA capillary electrophoresis (STRs) are:
o POP-6 is used as the matrix
o A longer capillary tube is used
- Modifications to capillary electrophoresis process allow for better resolution to the single base level
- The mitochondrial DNA profile should be sequenced in both directions (forward and reverse)
- If there is enough sample available, samples should be sequenced twice
How Sequences are Reported
- For mitochondrial sequence analysis, every sequence is compared to the rCRS
- The data for sequence reporting consists of the nucleotide position and the base that differs for the mitochondrial DNA profile
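- Example (a minimal sketch): reporting a sample as its differences from a reference sequence. The 12-base sequences and the starting position are made up and are NOT real rCRS sequence or numbering; the sketch also assumes the sequences are already aligned with no insertions or deletions.

```python
def differences_from_reference(reference: str, sample: str, start_position: int = 1):
    """Return (position, reference base, sample base) wherever the two differ.

    Assumes equal-length, already aligned sequences.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(start_position + i, ref_base, sample_base)
            for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
            if ref_base != sample_base]


# Made-up sequences for illustration only.
ref = "ACCTGATTCCGA"
sample = "ACCTGGTTCCAA"

for pos, ref_base, sample_base in differences_from_reference(ref, sample, start_position=101):
    print(f"{pos} {ref_base}->{sample_base}")
```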
How Results are Reported
- Three possibilities: exclusion, cannot exclude, inconclusive
- Exclusion: if the questioned and known sequences are different, samples can be excluded
o At least two differences need to be reported, since mtDNA has a higher mutation rate
- Cannot Exclude: if the questioned and known sequences are the same, they cannot be excluded
- Inconclusive Result: if the questioned and known sequences differ by only a single nucleotide, the result is inconclusive
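- Example (a minimal sketch): the reporting rules above applied to two haplotypes, each written as its differences from the rCRS (position -> observed base). The positions, bases, and function name are hypothetical, and heteroplasmy is ignored.

```python
def interpret_mtdna(questioned: dict[int, str], known: dict[int, str]) -> str:
    """Classify a comparison by the number of positions where the two
    haplotypes differ from each other."""
    positions = set(questioned) | set(known)
    n_differences = sum(questioned.get(p) != known.get(p) for p in positions)
    if n_differences >= 2:
        return "exclusion"
    if n_differences == 1:
        return "inconclusive"
    return "cannot exclude"


# Hypothetical haplotypes (position -> base that differs from the rCRS).
q = {106: "G", 111: "A"}
print(interpret_mtdna(q, {106: "G", 111: "A"}))
print(interpret_mtdna(q, {106: "G"}))
print(interpret_mtdna(q, {150: "T"}))
```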
Y-STRs
Y-Chromosome Inheritance
- The Y-chromosome is only present in males
- It has a mode of inheritance known as patrilineage, where a father passes it on to all his male offspring
- The Y-chromosome contains ~59 million base pairs
- It encodes for 50-60 genes
Why Use Y-STRs
- In many ways, Y-STRs are inferior to STRs
- The same Y-STR profile is shared by all male relatives through patrilineage inheritance
- When calculating the statistical weight of the evidence, because the Y-STR loci are so close to each other (linked on one chromosome), their frequencies can only be added, not multiplied
Why Use Y-STRs
- In the case of sexual assaults with a female victim and a male perpetrator, often there is a mixture of DNA evidence
- This mixture is generally made up of a large amount of female DNA and only a small portion of male DNA
- Y-STRs can be used in such circumstances as the female DNA will not interfere
- Y-STRs are also useful for cases of sexual assault with multiple male perpetrators
The Core Y-STR Loci
- The set of Y-STR loci is constantly expanding
- The initial core Y-STR loci included a total of 9 loci
- This number has increased based on the different kits used and new information obtained
- Current kits can multiplex and amplify over 20 different Y-STR loci
- The same methods used to isolate, amplify, and separate STRs can be used for Y-STRs (just with different primers)
Multiple Y-STR Loci
- Since there is only one Y-chromosome, there is an expectation that only one allele should be present
- For multilocal Y-STR loci, there is a duplication that leads to two alleles being present (there is still only one Y-chromosome)
- If two alleles are present for a Y-STR, it is referred to as bilocal
Interpretation of Results
- There are three possible determinations when comparing an unknown and a known Y-STR genotype
o Exclusion: Y-STRs are different and could not have originated from the same source
o Inconclusive: there is insufficient data to make a determination on the origin of the source (partial profile)
o Failure to Exclude: Y-STRs from the unknown and known profile are the same and therefore could have originated from the same source
Future of Y-STRs: Rapidly Mutating Y-STR
- There is a subset of Y-STR that are referred to as rapidly mutating Y-STRs (RM Y-STRs)
- The average Y-STR mutation rate is 0.0001; for RM Y-STRs it is 0.01
- Currently, the Y-STR loci cannot differentiate between patrilineage relatives
- The RM Y-STRs, since they mutate at a higher rate, open the possibility of differentiating between patrilineage relatives