Forensic Bio Unit 3

Fluorescence Labeling

- A fluorescent dye contains a fluorophore, which is the component of the dye that actually fluoresces

- The fluorescence is caused when a certain wavelength of light hits the fluorophore and excites it

- The fluorophore is excited at one wavelength and emits light at a different, longer wavelength

- The difference between the excitation wavelength and the emission wavelength of a dye is called the STOKES SHIFT

Dyes Used to Fluorescently Label PCR

- Different dyes can be used in the same analysis since optical filters can be used to separate out different dye colors

- The number of dyes used depends on need (or the kit used)

- Generally, the more dyes, the more loci (locations in the DNA) that can be analyzed at once

- Dyes used generally fall within the 400-650nm range

- Some kits used in forensics use up to 6 dyes

Singleplex vs Multiplex

- In PCR it is possible to use more than one primer set

- As long as the regions of interest are not overlapping, you can amplify more than one area

- The reaction components are the same; you just need a primer set for each region of interest

Multiplex Kits

- Current multiplex kits in forensics can amplify over 20 loci at the same time

- From this single PCR reaction, enough information can be generated to differentiate between every individual in the world (except for identical twins)

- This saves a large amount of time compared with setting up and running over 20 separate reactions for each sample

Summary

- Electrophoresis separates DNA based on size only

- Similarly sized PCR products (amplicons), even if from entirely different locations, cannot be separated by electrophoresis without labeling

- Fluorescently labeled products allow for DNA fragments of the same size to be distinguished from one another
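
The design rule implied here can be written as a quick check: loci labeled with the same dye must not have overlapping size ranges, while loci in different dyes may overlap freely. The locus names, dyes, and size ranges below are hypothetical, chosen only to illustrate the check.

```python
from itertools import combinations

# Hypothetical loci: (locus name, dye, smallest allele size in bp, largest allele size in bp)
LOCI = [
    ("LocusA", "blue", 100, 150),
    ("LocusB", "blue", 160, 220),
    ("LocusC", "green", 110, 170),  # overlaps LocusA in size, but uses a different dye
]

def conflicts(loci):
    """Return pairs of loci that share a dye AND have overlapping size ranges.

    Electrophoresis separates by size only, so two loci can share a size
    range only if their amplicons carry different dyes.
    """
    bad = []
    for (n1, d1, lo1, hi1), (n2, d2, lo2, hi2) in combinations(loci, 2):
        same_dye = d1 == d2
        overlap = lo1 <= hi2 and lo2 <= hi1
        if same_dye and overlap:
            bad.append((n1, n2))
    return bad

print(conflicts(LOCI))  # [] : no same-dye overlaps in this layout
# Adding a blue locus spanning 140-180 bp collides with both blue loci
print(conflicts(LOCI + [("LocusD", "blue", 140, 180)]))  # [('LocusA', 'LocusD'), ('LocusB', 'LocusD')]
```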

Two Different Types of VNTR Assays

- RFLP-based VNTRs

- PCR-based VNTRs

o Both types use markers that are minisatellites (8-200bp repeats)

RFLP-based VNTRs

- This type of analysis is not based on PCR, so it requires a much larger amount of starting material for an analysis

- Multiple VNTR loci can be examined

o However, they cannot be multiplexed (combined into one reaction)

- This method is based on restriction fragment length polymorphisms

Variable Number Tandem Repeats

- In this case, the repeat unit is 31 base pairs long

- There are 20 repeats total

- This would be an example of one allele

o Each person has two alleles (one from each parent)

- The number of repeats can vary widely (typically 10-30)

- Differences are based on length (ex. an allele with 30 repeats is longer than one with 28 repeats)
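
The length of a VNTR allele is the repeat unit length multiplied by the number of repeats, plus whatever flanking DNA is included in the fragment. A minimal sketch; the 200 bp of flanking DNA is a hypothetical value.

```python
def repeat_region_length(repeat_unit_bp: int, n_repeats: int) -> int:
    """Length in bp of the tandemly repeated part of one allele."""
    return repeat_unit_bp * n_repeats

def allele_length(repeat_unit_bp: int, n_repeats: int, flank_bp: int) -> int:
    """Repeated region plus a fixed amount of flanking DNA (flank_bp is hypothetical)."""
    return repeat_region_length(repeat_unit_bp, n_repeats) + flank_bp

# The allele described above: a 31 bp unit repeated 20 times
print(repeat_region_length(31, 20))  # 620
# Two alleles at the same locus differ only in their repeat region
print(allele_length(31, 30, 200) - allele_length(31, 28, 200))  # 62, i.e. 2 x 31 bp
```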

Restriction Fragment Length Polymorphism

- RFLP-based VNTR profiling utilizes restriction fragment length polymorphisms (RFLPs)

- RFLPs rely on restriction endonucleases, which are enzymes that cut DNA at specific sequences

- Each enzyme has what is known as a recognition sequence: it recognizes a specific nucleotide sequence and cuts at that sequence
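
The sketch below shows how a restriction digest turns a difference in repeat number into a difference in fragment length. HaeIII (recognition sequence GGCC, cutting between GG and CC) is used as an example enzyme; the toy allele sequences and their flanks are hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    cut_offset is how many bases into the recognition site the enzyme cuts.
    Only one strand is modeled, which is enough for a blunt cutter like HaeIII.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]  # sequence start, every cut, sequence end
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

def toy_allele(n_repeats: int) -> str:
    """Hypothetical allele: a GGCC site on each side of a (TTAC)n repeat."""
    return "AAGGCC" + "TTAC" * n_repeats + "GGCCAA"

print(digest(toy_allele(5), "GGCC", 2))  # [4, 24, 4]
print(digest(toy_allele(7), "GGCC", 2))  # [4, 32, 4]
```

Only the middle fragment, which contains the repeat, changes size: two extra 4 bp repeats make it 8 bp longer.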

Detecting Size Differences for VNTRs

- Since we are detecting differences in length, the fragments can be separated by electrophoresis

- Once you have the different-size fragments, you can compare your crime scene sample to any possible suspects

Disadvantages of VNTRs with RFLP

- Does not perform well with degraded samples

- Does not perform well with samples in limited quantity

- Since both these characteristics are common among forensic samples, a different type of method is needed

PCR-Based VNTRs

- VNTR analysis that uses PCR to amplify the regions is called amplified fragment length polymorphism (AFLP)

- If the overall size of the locus is under 1000 bp, then AFLPs can be used

- Like the RFLP method, AFLPs look at differences in the number of repeats, but here the region is amplified with primers that bind in the conserved flanking regions

- Primers are designed to anneal in the conserved flanking regions; the number of repeats between them is what varies

PCR-Based VNTRs

- Primers are designed so the length of the flanking regions is always the same

- This figure shows an example of a heterozygous individual
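
Because the primers sit at fixed positions in the flanking regions, the flank length is the same for every allele at a locus, and the repeat number can be worked back from the product size. This sketch uses a hypothetical locus with a 16 bp repeat unit and 150 bp of total flanking sequence; in practice sizing relies on a ladder rather than exact arithmetic, since measured sizes are not perfect integers.

```python
def repeats_from_size(amplicon_bp: int, flank_total_bp: int, repeat_unit_bp: int) -> int:
    """Infer the repeat number from a PCR product size.

    flank_total_bp (primers plus both flanking regions) is the same for every
    allele at a locus, so the rest of the product is made of whole repeats.
    """
    repeat_bp = amplicon_bp - flank_total_bp
    if repeat_bp < 0 or repeat_bp % repeat_unit_bp != 0:
        raise ValueError("size does not correspond to a whole number of repeats")
    return repeat_bp // repeat_unit_bp

# Hypothetical heterozygote: two products at one locus
print(sorted(repeats_from_size(size, 150, 16) for size in (566, 470)))  # [20, 26]
```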

Advantages of the PCR-based VNTRs

- Because PCR is used, less starting DNA is required

- By multiplexing multiple loci, it is possible to get a higher discriminatory power (a greater level of individualization)

- However, many VNTR loci are upwards of 1000 bp in length, which is less than ideal for forensic samples

o The need to reduce this size leads to our next topic, STRs, which are how forensic testing is currently performed.

Minisatellites vs Microsatellites

- Minisatellites and microsatellites are both examples of tandem repeats (adjacent copies of a repeated unit)

- The main difference between the two types is the length of the repeat

o Minisatellites are typically 8-200bp repeat units

o Microsatellites are typically 2-7bp repeat units

Microsatellites

- Smaller repeat units mean smaller overall allele sizes

- Smaller allele sizes are better for forensic applications

- Why?

o Better for degraded DNA

o Easier to multiplex (increases discriminatory power)

Characteristics of STRs

- The microsatellites used in forensic analysis are short tandem repeats (STRs)

- STRs are found in nuclear DNA, so there are two copies of each locus (location)

- An individual can be homozygous (two identical alleles) or heterozygous (two different alleles) at a locus

- There are a large number of STRs in the human genome (estimated to be over 100,000)

- Each STR is characterized by the core repeat and the flanking region

- The core repeat consists of the tandemly repeated regions

- The flanking region consists of a conserved area on each side (this is where the primers anneal)

Repeat Unit Length

- STR repeat units can be dimeric (2 bp), trimeric (3 bp), tetrameric (4 bp), pentameric (5 bp), hexameric (6 bp), or heptameric (7 bp)

- Dimeric and trimeric repeats have issues with stutter

- Pentameric, hexameric, and heptameric are less abundant

- Tetrameric repeats are used in almost all of the core loci used in forensics (USA); D22S1045, one of the loci added in 2017, is a trimeric repeat

o A few pentameric STR loci are utilized internationally

Repeat Unit Length

- The tetrameric unit length is commonly found in the human genome

- Tetrameric loci are highly polymorphic, a crucial trait for STR analysis

o Polymorphic means that a locus occurs in many different forms (alleles) in the population, so there is a lot of variety at that locus

Repeat Unit Sequences

- Not all repeats have the same structure

- The types differ based on the sequence of the repeats

- The core STR loci include each of these types

o Simple

o Compound

o Complex

Simple STR Repeats

- Simple repeats consist of tandem repeats with identical repeating units

- EX: repeat at D5 is (AGAT)n, where n is the number of repeats

- This allele number would be 10; (AGAT)10
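
Determining the allele number for a simple repeat is a matter of counting consecutive copies of the motif. A minimal sketch using a regular expression; the flanking bases in the example allele are made up.

```python
import re

def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of uninterrupted, tandem copies of `motif` in `seq`."""
    # (?:AGAT)+ matches one or more back-to-back copies of the motif
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical allele: made-up flanks around ten AGAT repeats
allele = "CCTAGG" + "AGAT" * 10 + "TTGACC"
print(longest_repeat_run(allele, "AGAT"))  # 10, i.e. (AGAT)10
```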

Compound STR Repeats

- Compound repeats consist of more than one type of repeat

- EX: D8 has the structure [TCTA]n [TCTG] [TCTA]n

- This allele number is 14; [TCTA]2 [TCTG] [TCTA]11
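
For compound repeats written in bracket notation, the allele number is the sum of the counts of every repeat block, with an unnumbered block counting once. This sketch handles whole repeats only; it does not handle partial repeats or the intervening sequences found in complex repeats.

```python
import re

def allele_number(notation: str) -> int:
    """Sum repeat counts written in bracket notation, e.g. "[TCTA]2[TCTG][TCTA]11".

    A bracketed unit with no number after it counts as one repeat.
    """
    total = 0
    for _unit, count in re.findall(r"\[([ACGT]+)\](\d*)", notation.replace(" ", "")):
        total += int(count) if count else 1
    return total

print(allele_number("[TCTA]2 [TCTG] [TCTA]11"))  # 14
print(allele_number("[AGAT]10"))                 # 10
```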

Complex STR Repeats

- Complex repeats consist of several clusters of different tandem repeats with intervening sequences

How is the STR Repeat Named?

- The tetranucleotide repeat motif is named based on the top strand

o There are historical exceptions to this naming convention

- This would be referred to as a TCAT repeat
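
The naming convention matters because the same repeat reads differently on the two strands: the bottom strand, read 5′ to 3′, is the reverse complement of the top strand. A short sketch:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Read the opposite strand 5'->3': complement every base, then reverse."""
    return seq.upper().translate(COMPLEMENT)[::-1]

top = "TCAT" * 3
print(reverse_complement(top))  # ATGAATGAATGA: the same repeat read as ATGA on the bottom strand
```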

Characteristics of the Core Loci

- In the US, there are currently 20 loci (and a sex marker) that are part of the core CODIS loci

- The increase to 20 loci happened recently, in 2017

o From 1998 until 2017, only the original 13 CODIS core loci were included

- CODIS stands for Combined DNA Index System

Why Would you Expand the Core Loci Set?

- The move to increase the number of core CODIS loci was made primarily to address the following items:

o Decrease chance of adventitious match (a match due to random chance)

o Be more like international databases

o Increase discriminatory power

How are the Core CODIS loci named?

- The core CODIS loci are named based on either (1) the chromosome on which they are found or (2) the name of the gene they are part of

- EX: D5S818 is located on chromosome 5

- Other loci (ex. FGA, CSF1PO) are named after genes; the STRs lie within introns of those genes (regions that do not code for protein)
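
Names of the D#S# form follow a fixed pattern: D for DNA, the chromosome number (or X/Y), S for a single-copy sequence, and a number assigned to that marker. The chromosome can therefore be read straight from the name, while gene-named loci such as FGA have to be looked up. A minimal sketch:

```python
import re

# D<chromosome>S<number>: "D" = DNA, chromosome (1-22, X or Y),
# "S" = single-copy sequence, then the number assigned to the marker
D_NAME = re.compile(r"^D(\d{1,2}|X|Y)S(\d+)$")

def chromosome_of(locus: str):
    """Return the chromosome encoded in a D#S# locus name, or None for gene-named loci."""
    match = D_NAME.match(locus.upper())
    return match.group(1) if match else None

for name in ["D5S818", "D21S11", "DYS391", "FGA", "CSF1PO"]:
    print(name, chromosome_of(name))
```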

What Makes a Good Core CODIS Marker?

- The goal of these markers is to lead to the individualization of a sample

- What are some characteristics they could possess?

o Polymorphic

o Consistency in flanking regions among all individuals (primer locations)

o Smaller sizes are preferred (Degraded DNA)

o Not linked to other loci (generally on separate chromosomes; if on the same chromosome, far enough away not to be linked)

o Few amplification artifacts (ex: stutter)

Forensic STR Analysis

- The current STRs used in forensic investigation are amplified using fluorescently labeled primers

- The different amplicons are separated by capillary electrophoresis

- The different color dyes can be separated by the optical filters

Internal Lane Standard

- The internal lane standard (ILS) is added to each sample

- The ILS consists of DNA fragments of known size, making it possible to size your fragments of unknown size

- This serves the same function as the PCR marker for slab gel electrophoresis
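
Sizing against the ILS can be thought of as converting each peak's migration time (scan number) into base pairs using the standard peaks on either side of it. The sketch below uses simple linear interpolation between the two bracketing ILS peaks; sizing software commonly uses more elaborate methods (for example, the Local Southern method), and the ILS scan positions and sizes shown are hypothetical.

```python
from bisect import bisect_right

def size_fragment(scan, ils):
    """Estimate a fragment's size in bp from its scan number (migration time).

    `ils` is a list of (scan, size_bp) pairs for the internal lane standard peaks,
    sorted by scan. The size is linearly interpolated between the two ILS peaks
    that bracket the unknown.
    """
    scans = [s for s, _ in ils]
    if not scans[0] <= scan <= scans[-1]:
        raise ValueError("fragment lies outside the range of the size standard")
    i = min(bisect_right(scans, scan), len(ils) - 1)
    (s0, b0), (s1, b1) = ils[i - 1], ils[i]
    return b0 + (scan - s0) * (b1 - b0) / (s1 - s0)

# Hypothetical ILS peaks detected in one run: (scan number, known size in bp)
ILS = [(1000, 100), (1500, 150), (2000, 200), (2600, 250), (3200, 300)]
print(size_fragment(1750, ILS))  # 175.0
print(size_fragment(2900, ILS))  # 275.0
```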

An Electropherogram

- The data collection process generates an electropherogram

- An electropherogram is a display of the peaks representative of the different fragments

- An electropherogram is separated into color channels and shows the relative amount of each fragment as an RFU (relative fluorescence unit) value

- The allele number is the number of repeats; the overall size is how many base pairs long the fragment is

Determining the Allele

- The internal lane standard is used to determine the size of the fragments represented by each peak

- How can we determine which allele a peak represents? By comparing it to an allelic ladder

- An allelic ladder is a sample that is made up of the majority of known alleles at a certain locus

- PCR markers used in slab gel electrophoresis are sometimes called “ladders”

Allelic Ladder

- In this figure, each locus is separated

- The peaks represent the common alleles for that particular locus
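
Once peaks are sized, each peak is called by finding the ladder allele of closest size. This sketch assumes a ±0.5 bp matching window (a common choice, but still an assumption here) and a hypothetical ladder for one tetranucleotide locus; a peak that matches no ladder allele is reported as off-ladder (OL).

```python
def call_allele(size_bp, ladder, window=0.5):
    """Assign an allele by matching a sized peak to the allelic ladder.

    `ladder` maps allele names to the sizes measured for the ladder in the same run.
    A peak is called only if it falls within +/- `window` bp of a ladder peak.
    """
    best = min(ladder, key=lambda allele: abs(ladder[allele] - size_bp))
    return best if abs(ladder[best] - size_bp) <= window else "OL"

# Hypothetical ladder (allele -> measured size in bp)
LADDER = {"10": 140.1, "11": 144.2, "12": 148.1, "13": 152.2, "14": 156.1}
peaks = [144.0, 152.5]
print([call_allele(p, LADDER) for p in peaks])  # ['11', '13']
print(call_allele(150.2, LADDER))                # OL
```

Here the two called alleles, 11 and 13, would make up the genotype at this locus.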

Determining Genotype

- The genotype of an individual is the number of repeats for both alleles at that locus

STR Profile

- An STR profile consists of the genotype for all the loci used in a certain kit

- For example, an STR profile used for uploading to the CODIS database will include the genotype for all 20 core CODIS loci

Example of STR Profile

- This electropherogram shows an STR profile using the Identifiler Kit

- This kit amplifies 15 loci plus a sex marker

- Prior to 2017, this kit could amplify all of the core CODIS loci

- Since all 13 core loci are represented, this was a complete STR profile

Interpretation of STR Profiling Results

- There are three common conclusions at the end of STR analysis

o Inclusion: the genotypes of the two compared STR profiles are identical

o Exclusion: the two genotypes differ, indicating that the profiles originated from different sources

o Inconclusive: indicates that there is not enough info to support a conclusion of either inclusion or exclusion (common for partial profiles)

- In the case of a match, a statistical weight can be calculated to express how likely such a match would be by chance
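
One common way to express that weight is the random match probability: under Hardy-Weinberg assumptions, a heterozygous genotype has frequency 2pq and a homozygous genotype p², and because the core loci are not linked, genotype frequencies can be multiplied across loci (the product rule). The allele frequencies below are hypothetical, and casework calculations include corrections this sketch leaves out.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

def random_match_probability(profile):
    """Multiply genotype frequencies across independent (unlinked) loci: the product rule."""
    return prod(genotype_frequency(*freqs) for freqs in profile.values())

# Hypothetical allele frequencies for a three-locus profile
profile = {
    "Locus1": (0.10, 0.20),  # heterozygote: 2 * 0.10 * 0.20 = 0.04
    "Locus2": (0.25,),       # homozygote: 0.25 ** 2 = 0.0625
    "Locus3": (0.05, 0.30),  # heterozygote: 2 * 0.05 * 0.30 = 0.03
}
rmp = random_match_probability(profile)
print(f"RMP = {rmp:.2e} (about 1 in {1 / rmp:,.0f})")
```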

Factors Affecting Genotyping Results

- There are numerous types of factors that can alter the interpretation of an STR profile

o Genetic-related factors

o Amplification-related factors

o Electrophoresis-related factors

Mutations

- The loci used to evaluate STRs are selected in part because of low mutation frequencies

- A mutation is simply a change in the DNA, brought about by a rare event

o Can change a single base pair

o Can result in duplication/deletion of a large section of DNA or an entire chromosome

- Despite tests to ensure low mutation rates, some mutations can occur in STRs and alter the interpretation of STR profiles

Chromosomal and Gene Duplications

- In some cases, duplication of one of the two copies of a chromosome can lead to three copies of that chromosome

o Remember that you are diploid: you got one set of chromosomes from your mother and one set from your father

- This condition is called trisomy and is associated with many genetic diseases

- At loci on the extra chromosome, three alleles may appear

Tri-Alleles

- When there are tri-alleles, interpretation can be altered

- If only one locus shows three alleles, then the profile is probably from a single source (rather than a mixture)

Point Mutations

- Point mutations involve the change of the nucleotide sequence at a single point

- This is particularly problematic when the point mutation occurs in the primer binding site

- A change in the nucleotide sequence at the primer binding site can lead to a failure of the amplification of that allele

- When an allele that should be present does not amplify, this is called a null allele

- If a mutation prevents the primer from binding, one of the alleles may not amplify

- For example, if there is a mutation in the primer binding site of the 18-repeat allele, then that allele will not amplify

Amplification-Related Artifacts

- There are several artifacts that can be introduced during the amplification process

- The two we will discuss in this class are

o Stutter

o Non-template Adenylation

What is stutter?

- During the extension phase of PCR, the strands may “slip” out of alignment, forward or backward along the template

- This slip leads to a product that is one repeat shorter (more common) or longer (less common) than the true allele

Why Stutter is Problematic

- Stutter is difficult to interpret because a stutter peak appears exactly where a true allele one repeat shorter (or longer) would be
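
Because stutter sits in an allele position, its height is typically compared with the neighboring parent peak. The sketch below computes that stutter ratio and flags a minor peak one repeat below a taller peak when the ratio is under a threshold; the 15% threshold and the peak values are hypothetical, and real stutter filters are set per locus.

```python
def stutter_ratio(stutter_height, parent_height):
    """Stutter peak height as a fraction of the parent (true allele) peak height."""
    return stutter_height / parent_height

def likely_stutter(peak_allele, peak_height, parent_allele, parent_height, threshold=0.15):
    """Flag a minor peak one repeat shorter than a taller peak as probable stutter.

    `threshold` is a hypothetical stutter filter; labs set real filters
    per locus from their own validation data.
    """
    one_repeat_shorter = parent_allele - peak_allele == 1
    return one_repeat_shorter and stutter_ratio(peak_height, parent_height) <= threshold

# Hypothetical peaks (allele number, height in RFU)
print(likely_stutter(14, 180, 15, 2000))  # True: 9% of the parent peak
print(likely_stutter(14, 900, 15, 2000))  # False: too tall to dismiss as stutter
```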

Non-Template Adenylation

- During PCR amplification, Taq Polymerase generally adds an extra adenine (the “A” base) to the 3’ end of the amplicon

- This addition is referred to as a non-template addition – the addition of a base that is not determined by the sequence of the template strand

- Most multiplex kits are designed to account for this addition, but occasionally some unadded forms will be present (typically from too much template DNA)

o The unadded forms are one base (an “A”) shorter
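
A -A peak and a stutter peak both sit just below a true allele, but at different distances: about 1 bp for the unadded form versus about one repeat unit (4 bp for a tetrameric locus) for back stutter. This hypothetical helper uses only that size gap; the tolerance value is an assumption, and real interpretation also weighs peak shape and height.

```python
def classify_minor_peak(minor_bp, main_bp, repeat_unit_bp=4, tol=0.3):
    """Guess what a small peak just below a main allele peak might be (sizes in bp).

    About 1 bp shorter suggests the unadded (-A) form; about one repeat unit
    shorter suggests back stutter. `tol` is a hypothetical tolerance.
    """
    gap = main_bp - minor_bp
    if abs(gap - 1) <= tol:
        return "possible -A (incomplete adenylation)"
    if abs(gap - repeat_unit_bp) <= tol:
        return "possible stutter"
    return "needs review"

print(classify_minor_peak(171.1, 172.0))  # possible -A (incomplete adenylation)
print(classify_minor_peak(168.1, 172.0))  # possible stutter
```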

Electrophoretic Based Artifacts

- There are several artifacts that can be introduced during the electrophoresis step of analysis

- These artifacts include:

o Pull-up peaks

o Spikes

Pull-Up Peaks

- A pull-up peak is when a minor peak of one color is “pulled up” from a major peak in another color

- This results from an overloaded sample or a bad spectral calibration

- If the pull-up peak corresponds to the position of an allele in another color channel, then the interpretation of the DNA profile may not be accurate
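
Pull-up can be screened for by looking for a small peak that sits at almost the same size as a much taller peak in a different color channel. This is a minimal sketch of that idea; the 0.5 bp size tolerance and the 10% height ratio are hypothetical values, not thresholds from any particular kit or software.

```python
def flag_pull_up(peaks, size_tol=0.5, max_ratio=0.1):
    """Flag peaks that may be pull-up from a much taller peak in another dye channel.

    `peaks` is a list of (dye, size_bp, height_rfu). A peak is flagged when another
    channel has a peak at about the same size that is at least 1/max_ratio times taller.
    """
    flagged = []
    for dye, size, height in peaks:
        for other_dye, other_size, other_height in peaks:
            if (other_dye != dye
                    and abs(other_size - size) <= size_tol
                    and height <= max_ratio * other_height):
                flagged.append((dye, size, height))
                break
    return flagged

peaks = [
    ("blue", 150.2, 8000),   # very tall (possibly overloaded) true peak
    ("green", 150.3, 400),   # small peak at the same position in another color
    ("green", 190.0, 1200),  # ordinary peak with nothing tall beside it
]
print(flag_pull_up(peaks))  # [('green', 150.3, 400)]
```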

Spikes

- Spikes are very sharp peaks (narrower than a true allele peak) that are present in all the color panels

- Spikes are caused by either air bubbles or changes in the voltage

- If spikes occur, the sample needs to be re-run

- The spike will be present at approximately the same height in each color channel
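
Because a spike shows up in every color channel at about the same position and height, a simple screen groups peaks by size and checks whether each group covers all dye channels with similar heights. The four dye names, tolerances, and peaks below are hypothetical; this sketch does not model peak width, which is the other clue that a peak is a spike.

```python
def cluster_by_size(peaks, size_tol):
    """Group peaks whose sizes lie within size_tol of the first peak in the group."""
    groups = []
    for peak in sorted(peaks, key=lambda p: p[1]):
        if groups and peak[1] - groups[-1][0][1] <= size_tol:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups

def find_spikes(peaks, dyes, size_tol=0.3, height_tol=0.25):
    """Return sizes where a peak appears in every dye channel at a similar height.

    `peaks` is a list of (dye, size_bp, height_rfu).
    """
    spikes = []
    for group in cluster_by_size(peaks, size_tol):
        heights = [h for _, _, h in group]
        in_every_dye = {d for d, _, _ in group} == set(dyes)
        similar_height = max(heights) - min(heights) <= height_tol * max(heights)
        if in_every_dye and similar_height:
            spikes.append(group[0][1])
    return spikes

DYES = ["blue", "green", "yellow", "red"]
peaks = [
    ("blue", 210.0, 900), ("green", 210.1, 950), ("yellow", 210.0, 920), ("red", 210.1, 880),
    ("blue", 245.0, 1500),  # ordinary allele peak in one channel only
]
print(find_spikes(peaks, DYES))  # [210.0]
```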

Genotyping Challenging Forensic Samples

- Numerous factors unrelated to genetic, amplification, or electrophoretic characteristics can also impact DNA analysis

- Many of these types of factors are a result of the environment the DNA sample is collected from and are unavoidable

- These factors include

o Degraded DNA

o Low copy number DNA testing

o Mixtures

Degraded DNA

- DNA degradation is the breaking down of large DNA molecules into smaller fragments

- This breakdown is brought about by environmental factors such as high heat and humidity

- The normal size range of STRs is between 100 and 500 bp

- Alleles that are larger (closer to 500bp) are more likely to be degraded than the smaller alleles

- In a degraded sample, larger alleles are less likely to be amplified because fewer intact copies of them remain

- These alleles will “drop out”
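
One simple way to see why larger alleles drop out first is to assume breaks occur randomly along the DNA: the chance that a template is unbroken across the whole target then falls exponentially with target length. This Poisson model and the rate of 5 breaks per 1000 bp are assumptions for illustration only.

```python
import math

def fraction_intact(length_bp, breaks_per_kb):
    """Expected fraction of templates with no break across a target of length_bp.

    Assumes breaks occur randomly and independently (a Poisson model) at an
    average of breaks_per_kb per 1000 bp; the rate used below is hypothetical.
    """
    return math.exp(-breaks_per_kb * length_bp / 1000)

for size in (100, 300, 500):
    print(size, round(fraction_intact(size, breaks_per_kb=5), 3))
# 100 0.607
# 300 0.223
# 500 0.082
```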

Low Copy Number DNA Testing

- Low copy number (LCN) DNA is a sample with a very low amount of DNA (less than 100 picograms)

- LCN DNA is often found in touch DNA samples

- Samples with a low amount of DNA can be amplified by increasing the number of PCR cycles

- Increasing the cycle number allows for the amplification of smaller amounts, but it also introduces other artifacts

- The other artifacts introduced include allele dropout, heterozygote peak imbalance, and increased stutter products, which make interpretation more difficult

- Since the samples are low in DNA, there is often not enough left to re-amplify and confirm the presence of true alleles
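
Under ideal conditions each PCR cycle doubles the target, so a few extra cycles multiply the product considerably, which is why raising the cycle number helps with LCN samples. The sketch assumes perfect doubling, roughly 6.6 pg per diploid human genome, and illustrative cycle numbers of 28 versus 34; real efficiency is lower and the reaction eventually plateaus.

```python
def copies_after_pcr(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Target copies after PCR with a constant per-cycle efficiency.

    efficiency=1.0 is perfect doubling; real reactions are less efficient and
    eventually plateau, which this sketch ignores.
    """
    return start_copies * (1 + efficiency) ** cycles

PG_PER_DIPLOID_GENOME = 6.6  # approximate mass of one diploid human genome, in pg
genomes_in_100_pg = 100 / PG_PER_DIPLOID_GENOME
extra_fold = copies_after_pcr(1, 34) / copies_after_pcr(1, 28)  # illustrative cycle numbers
print(round(genomes_in_100_pg, 1))  # 15.2 diploid genomes in 100 pg
print(extra_fold)                   # 64.0, i.e. 2 ** 6 from six extra cycles
```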

Mixtures

- A mixture is a sample that includes DNA from two or more contributors

- In some cases, you know that one of the contributors is the victim

- In other cases, it is unknown how many contributors may be present

Mixture Interpretation

- Mixture interpretation is the process of interpreting DNA profiles that contain DNA from more than one contributor

- The field of DNA analysis is still searching for the best approach to interpreting mixtures

- Mixture interpretation can be made more complicated by the different types of artifacts previously discussed

- There are a number of factors that are indicative of the presence of a mixture

o Severe heterozygote imbalance

o Increased stutter

o Presence of three or more alleles per locus

Heterozygote Imbalance

- Heterozygote imbalance is when the two allele peaks of a heterozygous individual at a certain locus are not approximately equal in height

- Some difference in height is expected, but if the peak height ratio (smaller peak divided by larger peak) is less than 60%, then it can be an issue
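
A minimal sketch of the 60% check, computing the ratio as the smaller peak height divided by the larger one so it always falls between 0 and 1; the peak heights are hypothetical.

```python
def peak_height_ratio(height_a, height_b):
    """Heterozygote peak height ratio: smaller peak divided by larger peak."""
    return min(height_a, height_b) / max(height_a, height_b)

def is_imbalanced(height_a, height_b, threshold=0.60):
    """True if the peak height ratio falls below the 60% guideline."""
    return peak_height_ratio(height_a, height_b) < threshold

print(round(peak_height_ratio(1200, 1000), 2), is_imbalanced(1200, 1000))  # 0.83 False
print(round(peak_height_ratio(1500, 600), 2), is_imbalanced(1500, 600))    # 0.4 True
```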

The Number of Alleles per Locus

- For a single-source profile (one contributor), the maximum number of alleles expected is two (heterozygous for that locus)

- It is also possible that only one allele will be shown (ex. a homozygous individual)

- If there are more than two alleles at more than one locus, then the profile is probably a mixture involving two or more contributors

o Three alleles at a single locus is likely a tri-allele
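
The allele-count rules above can be expressed as a quick screen over a profile. This sketch, with hypothetical loci and allele calls, counts the distinct alleles at each locus and applies the rule of thumb that more than two alleles at several loci suggests a mixture, while a single three-allele locus suggests a tri-allele. It assumes artifacts such as stutter have already been removed.

```python
def assess_profile(profile):
    """Quick mixture screen based on the number of distinct alleles per locus.

    `profile` maps locus name -> list of allele calls (artifacts already removed).
    """
    over_two = [locus for locus, alleles in profile.items() if len(set(alleles)) > 2]
    if len(over_two) > 1:
        note = "likely mixture"
    elif len(over_two) == 1:
        note = "possible tri-allele (single source)"
    else:
        note = "consistent with a single source"
    return over_two, note

profile = {
    "Locus1": ["12", "14"],
    "Locus2": ["8", "9", "11"],
    "Locus3": ["15"],                 # one allele shown: homozygous
    "Locus4": ["20", "22", "23", "24"],
}
print(assess_profile(profile))  # (['Locus2', 'Locus4'], 'likely mixture')
```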

A Mixture Profile

- Mixtures are messy

- If you have more than two contributors, it becomes really hard to determine individual contributors

- In two-person mixtures, you can possibly determine the individual profiles if at least one of the following is true:

o There is a known victim profile

o There is a major and a minor contributor
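
When a victim's profile is known, one starting point for a two-person mixture is to remove the victim's alleles at each locus and look at what remains. The sketch below does exactly that with hypothetical data. It is only a first step: a second contributor who shares alleles with the victim will show no remaining alleles at that locus, and peak heights are ignored.

```python
def foreign_alleles(mixture, victim):
    """Per locus, list alleles in the mixture that the victim's profile cannot explain.

    An empty list does not rule out a second contributor: that person
    may share the victim's alleles at that locus.
    """
    return {
        locus: sorted(set(alleles) - set(victim.get(locus, [])), key=float)
        for locus, alleles in mixture.items()
    }

mixture = {"Locus1": ["12", "13", "15"], "Locus2": ["9", "10"]}
victim = {"Locus1": ["12", "15"], "Locus2": ["9", "10"]}
print(foreign_alleles(mixture, victim))  # {'Locus1': ['13'], 'Locus2': []}
```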

Summary of Factors Impacting STR Genotyping

- Genetic

o Tri-alleles

o Null alleles

- Amplification

o Stutter

o Non-template adenylation

- Electrophoresis

o Spikes

o Pull-up

- Sample quality

o Degraded DNA

o Low copy number DNA

o Mixtures

When Mitochondrial DNA Profiling is Used

- In cases where there are samples that contain little or no nuclear DNA (nDNA), mitochondrial DNA (mtDNA) can be used

o Some samples do not contain any nDNA (ex. Hair shafts)

o In other cases, the nDNA that was present may have been degraded (ex. Mass disaster cases)

The Mitochondrial Genome

- The first human mitochondrial genome was sequenced by Fred Sanger’s laboratory in Cambridge, England, in 1981

o The Cambridge reference sequence (CRS)

- Due to errors in the original sequence, a revised Cambridge reference sequence (rCRS) was published in 1999

- The rCRS is used as the point of comparison for all mitochondrial DNA forensic samples

- The mitochondrial genome encodes a total of 37 genes

- There are no introns in the mitochondrial genome

- The control region (aka the D-loop) contains hypervariable regions and can therefore be utilized for forensic purposes

The Hypervariable Regions

- There are a total of 3 hypervariable (HV) regions:

o HVI

o HVII

o HVIII

- HVI and HVII are used for forensic purposes since they are the most polymorphic

Heteroplasmy

- Heteroplasmy is when an individual carries more than one mtDNA haplotype (think of haplotype as a genotype)

- It is possible for an individual to carry one haplotype in one type of tissue, such as hair, and another haplotype in skin cells

- Two types of heteroplasmy exist:

o Sequence heteroplasmy

o Length heteroplasmy

Sequence Heteroplasmy

- Sequence heteroplasmy is defined as the presence of two different nucleotides at a single position

- It is represented as overlapping peaks at that position in an electropherogram

Length Heteroplasmy

- Length heteroplasmy is typically due to differences in the length of the “C-stretch” between two mtDNA haplotypes

o A “C-stretch” is just numerous cytosines (the “C” base) in a row

mtDNA Sample Processing

- Many of the same procedures used to extract and quantify nuclear DNA can be used for mtDNA

- PCR steps are similar, with the following exceptions:

o Different primers are used

o A higher number of PCR cycles is generally used

§ Makes the reaction more sensitive, but also increases the likelihood of contamination; use controls to monitor for contamination

DNA Sequencing of mtDNA Samples

- The common DNA sequencing technique for mtDNA samples is the chain termination method

- A sequencing reaction contains the following:

o Template DNA

o Primers

o DNA polymerase

o Cofactors

o dNTPs

o ddNTPs: the same as dNTPs but missing the hydroxyl (OH) group on the 3′ carbon; these are fluorescently labeled

§ The absence of the OH group prevents any more nucleotides from being added, which terminates chain growth

dNTP present: Chain can grow

- Typically, a dNTP is incorporated and the chain can continue growing

- In this example, a third base has been added and more could be as well

- The presence of the OH group allows more bases to be added

ddNTP present: Chain cannot grow

- When a ddNTP is incorporated, the chain can no longer grow

How ddNTPs are Visualized

- A reaction contains a mix of dNTPs (which allow chain growth) and ddNTPs (which terminate growth), so growth stops at different points in different molecules

- At the end of a sequencing reaction, you end up with lots of fragments that are different lengths

- The fragment lengths differ by one nucleotide at a time, covering the shortest possible fragment, the longest possible fragment, and everything in between
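
The set of terminated fragments can be simulated directly: build the new strand, stop it at every possible position, sort the fragments by length as capillary electrophoresis does, and read the terminal base of each. This toy sketch uses a hypothetical 10-base template written 3′ to 5′ so that the new strand reads left to right, 5′ to 3′; it assumes every termination point is represented.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_by_termination(template_3to5, primer_len):
    """Simulate reading a chain-termination reaction.

    template_3to5 is the template written 3'->5', so the new strand (its
    complement) is built left to right, 5'->3'. The first primer_len bases of
    the new strand are the primer. Every possible stopping point beyond the
    primer yields one fragment ending in a labeled ddNTP.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5.upper())
    fragments = [
        (length, new_strand[length - 1])  # (fragment length, terminating ddNTP)
        for length in range(primer_len + 1, len(new_strand) + 1)
    ]
    # Capillary electrophoresis orders fragments from shortest to longest;
    # reading the dye color (terminal base) of each in turn gives the sequence.
    fragments.sort()
    return "".join(base for _, base in fragments)

template = "TACGGATCCA"  # hypothetical template, written 3'->5'
print(sequence_by_termination(template, primer_len=3))  # CCTAGGT
```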

Cycle Sequencing

- The chain termination method is carried out by cycle sequencing

- Cycle sequencing uses thermal cycling (just like PCR) to generate a single stranded template for the chain-termination sequencing reactions

- The three steps of thermal cycling are the same as PCR:

o Denaturing (double to single stranded)

o Annealing (primer attached to single stranded template)

o Extension (DNA polymerase adds dNTPs to form new strand; the addition of a ddNTP terminates the growth)

After Cycle Sequencing

- Following cycle sequencing, the different length fragments can be separated using capillary electrophoresis

- The only differences between mtDNA capillary electrophoresis and nDNA capillary electrophoresis (STRs) are:

o POP-6 is used as the matrix

o A longer capillary tube is used

- These modifications to the capillary electrophoresis process allow for better resolution, down to the single-base level

- The mitochondrial DNA profile should be sequenced in both directions (forward and reverse)

- If there is enough sample available, samples should be sequenced twice

How Sequences are Reported

- For mitochondrial sequence analysis, every sequence is compared to the rCRS

- The reported data consist of each nucleotide position where the sample differs from the rCRS and the base observed at that position in the sample
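
Reporting against the rCRS amounts to listing every position where the aligned sample differs from the reference, written as the position followed by the sample's base. A minimal sketch, assuming the sequences are already aligned and free of insertions or deletions; the short sequences are hypothetical and are not the real rCRS.

```python
def differences_from_reference(sample, reference, start_position=1):
    """List differences from the reference as position + sample base (e.g. '104G').

    Assumes the two sequences are already aligned, the same length, and contain
    no insertions or deletions. start_position is the reference coordinate of
    the first base shown.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{start_position + i}{s}"
        for i, (s, r) in enumerate(zip(sample.upper(), reference.upper()))
        if s != r
    ]

ref = "ACGTACGTAC"     # hypothetical stretch of reference sequence
sample = "ACGTGCGTAT"
print(differences_from_reference(sample, ref, start_position=100))  # ['104G', '109T']
```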

How Results are Reported

- Three possibilities: exclusion, cannot exclude, inconclusive

- Exclusion: if the questioned and known sequences are different, samples can be excluded

o At least two differences are needed for an exclusion, since mtDNA has a higher mutation rate

- Cannot Exclude: if the questioned and known sequences are the same, they cannot be excluded

- Inconclusive Result: if the questioned and known sequences differ by only a single nucleotide, the result is inconclusive
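
The reporting rule for mtDNA comparisons depends only on how many positions differ, so it can be written as a short function. This sketch assumes two already-aligned sequences of equal length and ignores heteroplasmy and length variation, which real comparisons have to take into account.

```python
def mtdna_conclusion(questioned, known):
    """Classify a comparison by the number of differing positions (0, 1, or 2+)."""
    if len(questioned) != len(known):
        raise ValueError("sequences must be aligned to the same length")
    n_diff = sum(q != k for q, k in zip(questioned.upper(), known.upper()))
    if n_diff == 0:
        return "cannot exclude"
    if n_diff == 1:
        return "inconclusive"
    return "exclusion"

print(mtdna_conclusion("ACGTACGT", "ACGTACGT"))  # cannot exclude
print(mtdna_conclusion("ACGTACGT", "ACGTACGA"))  # inconclusive
print(mtdna_conclusion("ACGTACGT", "TCGTACGA"))  # exclusion
```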

Y-STRs

Y-Chromosome Inheritance

- The Y-chromosome is only present in males

- It has a mode of inheritance known as patrilineage, where a father passes it on to all his male offspring

- The Y-chromosome contains ~59 million base pairs

- It encodes 50-60 genes

Why Use Y-STRs

- In many ways, Y-STRs are inferior to autosomal STRs

- The same Y-STR profile is shared by all male relatives through patrilineal inheritance

- When calculating the statistical weight of the evidence, the Y-STR loci are all on the same chromosome and inherited together (linked), so their individual frequencies cannot be multiplied; instead, the frequency of the whole Y-STR profile (haplotype) is estimated
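
Because the whole Y-STR profile is inherited as one unit, one common approach is the counting method: count how many profiles in a reference database match the evidence haplotype exactly. This sketch uses a tiny, hypothetical database of three-locus haplotypes; real databases are far larger, and reported frequencies usually include a confidence interval that is not shown here.

```python
def haplotype_frequency(haplotype, database):
    """Counting method: fraction of database haplotypes identical to the evidence.

    The full haplotype is matched as one unit; per-locus frequencies are not
    multiplied because the loci are linked on the Y chromosome.
    """
    matches = sum(1 for entry in database if entry == haplotype)
    return matches / len(database)

# Hypothetical repeat numbers at three Y-STR loci
evidence = (14, 13, 30)
database = [(14, 13, 30), (15, 13, 29), (14, 12, 30), (14, 13, 30)]
print(haplotype_frequency(evidence, database))  # 0.5
```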

Why Use Y-STRs

- In the case of sexual assaults with a female victim and a male perpetrator, often there is a mixture of DNA evidence

- This mixture is generally made up of a large amount of female DNA and only a small portion of male DNA

- Y-STRs can be used in such circumstances because only the male DNA is targeted, so the female DNA will not interfere

- Y-STRs are also useful in sexual assault cases with multiple male perpetrators

The Core Y-STR Loci

- The set of Y-STR loci in use is constantly expanding

- The initial core Y-STR loci included a total of 9 loci

- This number has increased based on the different kits used and new information obtained

- Current kits can multiplex and amplify over 20 different Y-STR loci

- The same methods used to isolate, amplify, and separate STRs can be used for Y-STRs (just with different primers)

Multiple Y-STR Loci

- Since there is only one Y chromosome, the expectation is that only one allele should be present

- For multi-copy Y-STR loci, a duplication leads to two alleles being present (there is still only one Y chromosome)

- If two alleles are present for a Y-STR locus, it is referred to as bilocal

Interpretation of Results

- There are three possible determinations when comparing an unknown and a known Y-STR genotype

o Exclusion: the Y-STR profiles are different and could not have originated from the same source

o Inconclusive: there is insufficient data to make a determination on the origin of the source (partial profile)

o Failure to Exclude: Y-STRs from the unknown and known profile are the same and therefore could have originated from the same source

Future of Y-STRs: Rapidly Mutating Y-STR

- There is a subset of Y-STRs that are referred to as rapidly mutating Y-STRs (RM Y-STRs)

- The average Y-STR mutation rate is 0.0001, while for RM Y-STRs it is 0.01

- Currently, the Y-STR loci cannot differentiate between patrilineage relatives

- The RM Y-STRs, since they mutate at a higher rate, open the possibility of differentiating between patrilineage relatives
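
The benefit of faster mutation can be estimated with a simple model: if each locus mutates independently with probability μ per father-to-son transmission, the chance that at least one mutation separates a father and son across n loci is 1 − (1 − μ)^n. The sketch plugs in the two rates quoted for this section; the 13-locus panel size and the independence assumption are hypothetical simplifications.

```python
def prob_any_difference(mutation_rate: float, n_loci: int, meioses: int = 1) -> float:
    """Probability that at least one mutation separates two patrilineal relatives.

    Assumes every locus has the same mutation rate per transmission and that
    mutations happen independently; `meioses` is the number of father-to-son
    transmissions separating the two men (1 for a father and son).
    """
    return 1 - (1 - mutation_rate) ** (n_loci * meioses)

N_LOCI = 13  # hypothetical panel size
print(round(prob_any_difference(0.0001, N_LOCI), 4))  # about 0.0013 (average rate)
print(round(prob_any_difference(0.01, N_LOCI), 4))    # about 0.1225 (RM Y-STR rate)
```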