SNP and DNA sequencing - Week 5 Lecture
SNPs and DNA Sequencing – Comprehensive Study Notes
SNPs: Basics and Characteristics
Single Nucleotide Polymorphisms (SNPs) are single base sequence changes between individuals.
They change the content but not the length of the DNA sequence.
Most common type of genetic variation in humans.
Approximate density:
Therefore, there are >10 million SNPs in humans.
SNPs are most common in non-coding regions but can occur in genes as well.
SNPs in genes can lead to disease (example mentioned: Cystic Fibrosis).
SNPs are not evenly distributed along chromosomes (regions of high density and regions of low density).
SNP Genotypes and Alleles
SNP alleles are usually biallelic (two variants): the original (reference) allele and a new variant.
In principle a SNP could have up to four alleles (A, C, G, or T), but this is very rare: a mutation typically occurs once at a given site, and a second mutation at the same site is unlikely.
Genotypes can be described as combinations of alleles (examples):
No variant alleles: A/A (homozygous reference)
One variant allele: A/T (heterozygous)
Two variant alleles: T/T (homozygous variant) [illustrative example from class materials]
SNPs are named based on the order of identification and submission to public databases.
Public databases use rs numbers for submission (e.g., rs356085).
Alternate alleles are indicated in square brackets, and flanking sequence is provided to identify location.
Example entry:
rs356085 with sequence context: TCCCTTTTTGGCAGGCATTCCAGGC[A/G]ATTTGAGTGGTTCCCACTTAGTTCG
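The bracket notation above can be unpacked mechanically. The following is a minimal Python sketch, not a database tool: the function name parse_snp_context is made up for illustration, and it assumes the entry uses only the letters A, C, G, T with the alleles in a single pair of square brackets.

```python
import re

def parse_snp_context(entry):
    """Split 'FLANK[A/G]FLANK' into (left flank, list of alleles, right flank)."""
    # Hypothetical helper: accepts only A/C/G/T and one bracketed allele list.
    match = re.fullmatch(r"([ACGT]*)\[([ACGT](?:/[ACGT])+)\]([ACGT]*)", entry)
    if match is None:
        raise ValueError("expected a sequence of the form FLANK[A/G]FLANK")
    left, alleles, right = match.groups()
    return left, alleles.split("/"), right

# Sequence context for rs356085 as given in the notes.
context = "TCCCTTTTTGGCAGGCATTCCAGGC[A/G]ATTTGAGTGGTTCCCACTTAGTTCG"
left, alleles, right = parse_snp_context(context)

# Write out the full sequence carrying each allele.
for allele in alleles:
    print(allele, left + allele + right)
```

The two printed sequences differ at exactly one position, which is the point of the notation: same length, different content.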
Allele Frequency and Population Variation
SNP alleles in a population are not evenly distributed; frequency depends on the time since the SNP arose and population history.
Major allele: most common allele in a population.
Minor allele: least common allele in a population.
Allele frequencies can vary between populations, and a variant is generally only called a SNP (polymorphism) if the minor allele frequency is > 1% in the tested population.
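As a worked example of major and minor alleles, the sketch below counts alleles in a small sample and converts the counts to frequencies. The ten genotypes are made-up data, and the function name allele_frequencies is chosen here for illustration only.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from genotype strings such as 'A/T'."""
    counts = Counter()
    for genotype in genotypes:
        # Each individual contributes two alleles to the count.
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical sample of 10 individuals (made-up data).
sample = ["A/A"] * 6 + ["A/T"] * 3 + ["T/T"] * 1
freqs = allele_frequencies(sample)

major = max(freqs, key=freqs.get)  # most common allele in this sample
minor = min(freqs, key=freqs.get)  # least common allele in this sample
print(freqs, "major:", major, "minor:", minor)
```

Running the same count on a sample from a different population could give different frequencies, or even swap which allele is the major one.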
SNP Inheritance
SNPs are inherited from parents to offspring following Mendelian inheritance rules.
There is no sex bias in inheritance unless the SNP is on the X or Y chromosome.
Example genotype patterns reflecting inheritance: A/A, A/T, T/T across generations can illustrate parental transmission (as shown in class diagrams).
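Mendelian transmission of an autosomal SNP can be sketched as a Punnett square: each parent passes on one of its two alleles with equal chance. The function name offspring_genotypes is an illustrative choice, and the sketch deliberately ignores X- and Y-linked SNPs.

```python
from collections import Counter
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Return the expected proportion of each offspring genotype."""
    alleles1 = parent1.split("/")
    alleles2 = parent2.split("/")
    counts = Counter()
    # Every pairing of one allele from each parent is equally likely.
    for a, b in product(alleles1, alleles2):
        counts["/".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

het_cross = offspring_genotypes("A/T", "A/T")
hom_cross = offspring_genotypes("A/A", "T/T")
print(het_cross)
print(hom_cross)
```

Two heterozygous parents give the familiar 1:2:1 pattern, while an A/A by T/T cross gives only A/T offspring.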
SNP Haplotypes
SNPs that are close together on a chromosome tend to be inherited together; the combination of adjacent alleles forms a SNP haplotype.
Haplotypes can be named and used as a unit for analysis (e.g., Haplotype 1, Haplotype 2, Haplotype 3).
Haplotypes can be population-specific and may define ethnic groups.
A haplotype is defined as the arrangement of adjacent SNPs on a chromosome.
Practical note: There are a finite number of possible haplotypes for a given set of SNPs (e.g., with 10 biallelic SNP markers, there are 2^10 = 1,024 possible haplotypes; in general this can be up to 2^n for n SNPs, but not all combinations occur in reality).
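The 2^n upper bound can be checked by brute force for a small case. In this sketch the three SNPs and their alleles are hypothetical, and each SNP is assumed to be biallelic.

```python
from itertools import product

def count_possible_haplotypes(n_snps, alleles_per_snp=2):
    """Theoretical maximum number of haplotypes for n_snps markers."""
    return alleles_per_snp ** n_snps

# Three hypothetical biallelic SNPs, listed in chromosome order.
snp_alleles = [("A", "G"), ("C", "T"), ("A", "T")]

# Enumerate every combination of one allele per SNP.
haplotypes = ["".join(combo) for combo in product(*snp_alleles)]
print(haplotypes)

print(count_possible_haplotypes(3))   # matches len(haplotypes)
print(count_possible_haplotypes(10))  # theoretical maximum for 10 SNPs
```

The enumeration gives the theoretical maximum only. Because nearby SNPs tend to be inherited together, real populations usually carry far fewer haplotypes than this.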
SNPs in Forensics
SNP haplotypes can be population-specific and useful for inferring ancestry or including/excluding suspects.
Example scenario (historical case): 5 murders and rapes in Louisiana where witnesses claimed a white male left the crime scenes. 1,000 white male DNA samples were tested with no STR match. SNP testing on the crime scene sample yielded an ancestry estimate (85% African-American, 15% American-Indian). Targeted investigation led to an arrest, and STR testing confirmed the ID.
Practical considerations:
SNPs are rarely used as the sole method of identification due to relatively few alleles per SNP; therefore a large panel of SNPs is needed.
SNPs are often added to STR typing to improve statistical power.
Advantages of SNPs: very short PCR amplicons (often < 100 bp), which is helpful for degraded or old samples.
STR markers are typically 100–400 bp in length, whereas SNP markers are shorter (< 100 bp).
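A rough sketch of why a large SNP panel is needed for identification. It assumes Hardy-Weinberg genotype proportions and statistically independent SNPs; both are simplifying assumptions introduced here, not taken from the slides, and the allele frequency of 0.5 is a made-up value.

```python
def match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP."""
    q = 1.0 - p
    # Hardy-Weinberg genotype proportions (assumed): p^2, 2pq, q^2.
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Both people must carry the same genotype: sum of squared frequencies.
    return sum(f * f for f in genotype_freqs)

def panel_match_probability(allele_freqs):
    """Multiply per-SNP match probabilities, assuming independent SNPs."""
    result = 1.0
    for p in allele_freqs:
        result *= match_probability(p)
    return result

single = match_probability(0.5)               # one SNP on its own
panel = panel_match_probability([0.5] * 50)   # hypothetical 50-SNP panel
print(single, panel)
```

With only three possible genotypes, a single SNP matches between unrelated people quite often, so the chance of a coincidental match becomes small only once many SNPs are multiplied together.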
DNA Sequencing: Overview and Role in SNP Discovery
DNA sequencing determines the exact order of nucleotides in a DNA molecule.
It is the most common method for identifying SNPs.
Historical context: First sequencing methods described in the 1970s; early methods were slow and laborious.
Modern sequencing benefits: fluorescence-based systems and capillary electrophoresis speed up sequencing considerably.
Sanger Sequencing: Core Concepts
Based on PCR with several modifications; performed after initial PCR amplification when there is abundant template DNA.
The sequencing workflow involves cycles of heating and cooling to generate fragments of varying lengths that cover the target region.
Sanger sequencing relies on chain-termination chemistry using modified nucleotides.
Steps in Sanger Sequencing
Target DNA is amplified by PCR to obtain sufficient template.
Modifications to the standard PCR protocol create fragments of varying lengths for sequencing.
Key modifications include:
1) Using a single sequencing primer (instead of two primers) to avoid directional confusion when reading sequence.
2) Using modified nucleotides (ddNTPs) that terminate DNA synthesis.
3) Setting up four sequencing reactions, one for each base (A, C, G, T).
Reactions are analyzed by electrophoresis to determine sequence based on fragment size and terminal base.
Sanger Sequencing – Single Primer Approach
After target DNA is amplified, a single sequencing primer binds to the template to initiate synthesis.
Using two primers would generate sequencing information from two directions and can be confusing for interpretation.
Demonstrative sequence excerpt (illustrative):
ACGCTGATCGGGTGCAGCTAGATCGCTAGCTAGCTGATCGATGATAGCTAGATC
TGCGACTAGCCCACGT…CGAAAGTAGCTAGATC
CCACGT GAAACG
Sanger Sequencing – Modified Nucleotides (ddNTPs)
Modified nucleotides stop the extension reaction because they lack a 3' hydroxyl group needed for chain elongation.
Dideoxynucleotides (ddNTPs) are missing the 3' OH group; once incorporated, no further nucleotides can be added.
Sanger Sequencing – Four Reactions and End-Labeling
Historically, four separate reactions were required, each containing a different ddNTP (A, C, G, or T).
Each tube produced fragments of DNA that end with the base corresponding to the ddNTP present in that reaction.
Random incorporation occurs because regular dNTPs outnumber modified ddNTPs; occasionally a ddNTP is incorporated, causing termination at various points.
Result: Sets of fragments that all end with the same base, enabling sequence determination when compared across reactions.
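The four-reaction logic can be sketched as follows. Instead of simulating random ddNTP incorporation, the code lists every fragment length each reaction could produce, which is the set the random process samples from. The strand GATTACA is a made-up example, and lengths are counted in bases added after the primer.

```python
def termination_fragments(new_strand):
    """Map each ddNTP reaction (A, C, G, T) to the fragment lengths it can yield."""
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # A ddNTP of this base can stop synthesis here, giving a fragment
        # whose length equals this position.
        reactions[base].append(position)
    return reactions

new_strand = "GATTACA"  # hypothetical newly synthesized sequence
fragments = termination_fragments(new_strand)
for base, lengths in fragments.items():
    print(f"dd{base}TP reaction: fragment lengths {lengths}")
```

Every length from 1 to 7 appears in exactly one reaction, which is why comparing the four reactions is enough to recover the whole sequence.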
Reading and Analyzing Sanger Sequencing Data
The products of the four reactions are run side by side on an electrophoresis gel (historical view) or analyzed in a capillary electrophoresis system (modern view).
Sequence is read from the smallest fragment to the largest: in the historical gel layout, the lane in which each successive band appears gives the terminal base at that position (for example, T, then A, then G, then C); in modern fluorescent methods, the color of each peak gives the base.
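Reading a four-lane gel amounts to sorting bands by fragment length and noting the lane of each band. A minimal sketch with made-up band positions follows; the function name read_gel and the lane data are illustrative assumptions.

```python
def read_gel(lanes):
    """Reconstruct the sequence from {lane base: [fragment lengths]}."""
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    # Shortest fragments migrate furthest, so read from smallest to largest.
    bands.sort()
    return "".join(base for _, base in bands)

# Hypothetical band positions, one list of fragment lengths per lane.
lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
sequence = read_gel(lanes)
print(sequence)
```

In a capillary system the same ordering happens in time rather than in space: fragments pass the detector from shortest to longest, and the dye color replaces the lane.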
Improvements over time include:
Fluorescent dyes that label each ddNTP with a different color (A, C, G, T).
Transition from four separate reactions to a single-tube, single-lane system in capillary electrophoresis.
Enhanced accuracy, speed, and generation of chromatogram outputs.
Reading Sequencing Chromatograms
Compare chromatograms between individuals to identify differences.
Example differences at a position can be read as:
Individual 1: T/T
Individual 2: C/T
Individual 3: C/C
Heterozygous positions show overlapping signals (both bases present).
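Calling a genotype from a chromatogram position can be sketched as a comparison of peak heights. The signal values and the 30% threshold below are made-up illustrations, not settings from any real base-calling software.

```python
def call_genotype(peak_heights, min_fraction=0.3):
    """Call a genotype at one position from per-base signal heights."""
    tallest = max(peak_heights.values())
    # Keep every base whose peak is at least min_fraction of the tallest peak.
    called = sorted(
        base for base, height in peak_heights.items()
        if tallest > 0 and height >= min_fraction * tallest
    )
    if len(called) == 1:
        return f"{called[0]}/{called[0]}"  # single peak: homozygous
    if len(called) == 2:
        return "/".join(called)            # two overlapping peaks: heterozygous
    return "N/N"                           # no clear call

# Hypothetical signal heights at the same position in three individuals.
individual_1 = call_genotype({"A": 0, "C": 20, "G": 0, "T": 900})
individual_2 = call_genotype({"A": 0, "C": 450, "G": 10, "T": 480})
individual_3 = call_genotype({"A": 5, "C": 870, "G": 0, "T": 15})
print(individual_1, individual_2, individual_3)
```

The three calls reproduce the T/T, C/T, C/C pattern listed above; in practice background noise and uneven dye signal make the choice of threshold a judgment call.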
Primers and Sequencing Design
The sequencing primer binds to the template DNA and provides a starting point for the polymerase.
You do not obtain sequence from the primer itself because the primer only initiates synthesis; sequencing starts from the first base after the primer, extending through the amplified target.
The amplified region includes the reverse primer binding site.
Example primer binding context (illustrative): ACGATTTGCTAGCTTAGCTAGCTAGCCGATAGGG
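The point that the read begins after the primer can be sketched with the example sequence above. As a simplification, the region is written in the same orientation as the primer (that is, as the strand being synthesized), and the 10-base primer is a hypothetical choice made for illustration.

```python
def expected_read(amplicon_strand, primer):
    """Return the part of the new strand that sequencing actually reports."""
    start = amplicon_strand.find(primer)
    if start == -1:
        raise ValueError("primer not found on this strand")
    # Sequence data begins at the first base added after the primer.
    return amplicon_strand[start + len(primer):]

amplicon = "ACGATTTGCTAGCTTAGCTAGCTAGCCGATAGGG"  # example context from the notes
primer = "ACGATTTGCT"                            # hypothetical sequencing primer
read = expected_read(amplicon, primer)
print(read)
```

The primer bases themselves never appear in the read, so a SNP lying under the primer binding site cannot be detected with that primer.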
Exam-Style Preparation: Key Questions to Answer
What is a SNP, how frequent are they, and how do they occur?
How do SNPs differ from VNTRs and STRs?
Why are there normally only two alleles for a given SNP?
What is a SNP haplotype?
How are SNPs used in forensics, and what are their advantages and limitations?
What are the main differences between PCR and sequencing reactions?
Why is a single primer used for sequencing?
How does Sanger sequencing work, at a conceptual level and in practice?
What is the significance of allele frequency (major vs minor) and population variation in interpreting SNP data?
How do SNPs contribute to disease risk and what ethical considerations arise in forensic and population genetics contexts?
Foundational and Real-World Connections
SNPs reflect underlying mechanisms of mutation and DNA damage, and their study ties to basic molecular biology concepts like replication fidelity and DNA repair.
Understanding haplotypes connects to population genetics, ancestry inference, and the design of genetic association studies.
Forensic applications illustrate how genetic variation can be leveraged for identification, inference of ancestry, and statistical reasoning, with important ethical implications regarding privacy, bias, and the potential for misinterpretation.
Sanger sequencing remains a foundational technology for SNP discovery and validation, illustrating the evolution from multi-reaction, gel-based workflows to high-throughput, capillary-based and eventually next-generation sequencing systems.
Mathematical and Technical Highlights
SNP density:
For example, a typical haplotype analysis might involve testing 10 SNP markers, yielding up to 2^10 = 1,024 possible haplotypes in theory, though only a subset is observed in populations.
Amplicon size considerations in forensics: STR markers are typically 100–400 bp, while SNP markers are often < 100 bp, impacting the ability to recover data from degraded samples.
Quick References (From Slides)
SNP discovery approach: sequence many individuals and align reads to identify differences.
SNP nomenclature example: rsXXXXXX with alleles shown in brackets, e.g., [A/G].
Forensics case example references: Louisiana case with STR vs SNP testing and ancestry inference; importance of multiple markers for robust statistics.
Sanger sequencing evolution: from four separate ddNTP reactions to single-tube, capillary-based sequencing with fluorescent dyes and chromatogram outputs.
Notes: The material above faithfully reflects the provided transcript content (Chapter 15, pp. 342–362). Where examples or figures were described textually in the transcript, they are included as illustrative notes to aid study and exam preparation.