SNPs and Haplotypes Notes
Biological Background
Researchers aim to identify and study changes occurring in various diseases.
They seek to explain why treatment responses vary among individuals.
SNPs
SNPs (single nucleotide polymorphisms) help answer both questions: which genetic changes accompany disease, and why treatment responses differ between individuals.
SNPs are involved in different aspects of health.
Genetic Marker
A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species.
It can be a short DNA sequence around a single base-pair change (SNP) or a long sequence like minisatellites.
Polymorphism
Alleles: Alternative DNA sequences at a locus.
Technical Definition: The most common variant (allele) occurs with less than 99% frequency in the population.
Also a general term for variation.
Many types of DNA polymorphisms exist, including RFLPs, VNTRs, and microsatellites.
'Highly polymorphic' means many variants.
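The technical definition above can be sketched as a small check on observed alleles. This is only an illustration: the function names and the sample counts are assumptions made for the example, not part of the original notes.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Return a dict mapping each allele to its relative frequency."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(alleles, threshold=0.99):
    """A locus counts as polymorphic if its most common allele is below the threshold."""
    freqs = allele_frequencies(alleles)
    return max(freqs.values()) < threshold

# Hypothetical samples of 1,000 chromosomes at two loci.
locus_common = ["A"] * 900 + ["G"] * 100   # most common allele at 90%
locus_rare = ["C"] * 995 + ["T"] * 5       # most common allele at 99.5%

print(is_polymorphic(locus_common), is_polymorphic(locus_rare))
```

Under these assumed counts, the first locus meets the definition and the second does not, because its most common allele exceeds 99%.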
Recombination
Inheritance of genetic material without recombination:
Mother and father contribute genetic material directly to offspring.
Inheritance of genetic material with recombination:
Genetic material from mother and father recombines to form offspring's genetic material.
Recombination Shapes Genome Structure
Over many generations, recombination reshuffles the chromosomes of the ancestral population, so each modern chromosome is a mosaic of ancestral chromosomes.
Recombination frequencies are non-uniform across genomes.
Recombination hotspots exist.
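To make the mosaic idea concrete, here is a toy sketch that builds one recombinant chromosome from two homologous chromosomes by switching the copied source at each crossover point. The function name `recombine`, the letters M and P, and the crossover positions are all illustrative assumptions; real crossover positions are not uniform, as noted above.

```python
def recombine(first, second, crossovers):
    """Build one recombinant chromosome, switching source at each crossover index."""
    if len(first) != len(second):
        raise ValueError("chromosomes must have equal length")
    sources = (first, second)
    current = 0                      # start by copying from the first chromosome
    breakpoints = set(crossovers)
    result = []
    for i in range(len(first)):
        if i in breakpoints:
            current = 1 - current    # switch to the other chromosome
        result.append(sources[current][i])
    return "".join(result)

# M and P mark which homologous chromosome each position was copied from.
mosaic = recombine("MMMMMMMMMM", "PPPPPPPPPP", crossovers=[3, 7])
print(mosaic)
```

Repeating this over many generations would yield chromosomes made of ever shorter ancestral segments.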
Mutations
Mutations are a natural process that changes a DNA sequence.
As a cell copies its DNA before dividing, a "typo" occurs every 100,000 or so nucleotides.
"Germline" mutations are inherited by offspring.
Some mutations are benign, others can be deleterious.
Mutations create genetic diversity in the population, leading to genetic polymorphisms.
Mutations Create Genetic Diversity
Over generations, mutations accumulate in the ancestral population, so modern chromosomes are mosaics of ancestral chromosomes that also carry new mutations.
Genetic polymorphism arises.
Types of Polymorphism
Single base mutation (SNP)
Restriction fragment length polymorphism (RFLP):
Creating restriction sites via PCR primer.
Direct sequencing.
Insertion/deletion of a section of DNA:
Minisatellites: repeated base patterns (several hundred base pairs).
Microsatellites: 2-4 nucleotides repeated.
Presence or absence of Alu segments.
The frequency of SNPs is greater than that of any other type of polymorphism.
What is SNP?
A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.
Other Types of Genetic Polymorphisms
Structural variants:
Insertions/deletions, duplications, copy number variations.
Figure panels: (A) concordant, (B) relative insertion, (C) relative deletion, (D) relative inversion.
Minisatellites: repeated base patterns (10-60 base pair fragments repeated 5-50 times).
Facts about SNPs
In human beings, 99.9 percent of bases are the same. The remaining 0.1 percent makes each person unique.
Different attributes/characteristics/traits:
How a person looks.
Diseases he or she develops.
These variations can be:
Harmless (change in phenotype).
Harmful (diabetes, cancer, heart disease, Huntington's disease, and hemophilia).
Latent (variations in coding and regulatory regions that are not harmful on their own; the effect of each change becomes apparent only under certain conditions, e.g., susceptibility to lung cancer).
Facts Continued about SNPs
SNPs are found in coding and (mostly) noncoding regions.
They occur at very high frequency: estimates range from about 1 in every 1,000 bases to 1 in every 100 to 300 bases.
The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
SNPs close to a particular gene act as a marker for that gene.
SNPs in coding regions may alter the protein structure made by that coding region.
Terminology
Allele: different forms of genetic variations at a given gene or genetic locus.
Locus 1 has two alleles, A and T, and Locus 2 has two alleles, C and G.
Genotype: specific allelic make-up of an individual's genome.
Individual 1 has genotype AA at Locus 1 and genotype CG at Locus 2.
Heterozygous/Homozygous
Locus 1 of Individual 1 is homozygous, and Locus 2 is heterozygous.
SNPs are typically bi-allelic.
Micro/minisatellites have many alleles and are very informative because of the high heterozygosity (the chance that a randomly selected person will be heterozygous).
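The terminology above can be illustrated with a short sketch. It classifies the example genotypes of Individual 1 and compares the heterozygosity of a bi-allelic SNP with a multi-allelic marker. The formula 1 - sum(p_i^2) assumes random mating, and the allele frequencies are hypothetical values chosen for the example.

```python
def zygosity(genotype):
    """Classify a two-allele genotype such as 'AA' or 'CG'."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"

def expected_heterozygosity(allele_freqs):
    """Chance that two randomly drawn alleles differ: 1 - sum(p_i^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Individual 1 from the notes: AA at Locus 1, CG at Locus 2.
locus1 = zygosity("AA")
locus2 = zygosity("CG")

# Hypothetical frequencies: a SNP in its best case versus a ten-allele microsatellite.
snp_h = expected_heterozygosity([0.5, 0.5])
microsatellite_h = expected_heterozygosity([0.1] * 10)
```

A bi-allelic marker cannot exceed a heterozygosity of 0.5 under this formula, which is why markers with many alleles are more informative per locus.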
SNP Frequency
More than 5 million common SNPs each with frequency 10-50% account for the bulk of human DNA sequence difference.
About 1 in every 600 base pairs.
It is estimated that ~60,000 SNPs occur within exons; 85% of exons are within 5 kb of the nearest SNP.
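The spacing figure can be checked with simple arithmetic. The genome length used here (about 3 billion base pairs for the haploid human genome) is an assumption brought in for the calculation; it is not stated in the notes.

```python
GENOME_LENGTH_BP = 3_000_000_000  # assumed approximate haploid human genome size
COMMON_SNPS = 5_000_000           # "more than 5 million" common SNPs, from the notes

# Average distance between neighbouring common SNPs, in base pairs.
spacing_bp = GENOME_LENGTH_BP // COMMON_SNPS
print(spacing_bp)
```

With these inputs the average spacing comes out at one common SNP per 600 base pairs, matching the figure quoted above.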
What Can We Learn from Genetic Variation
Population Evolution: The majority of human sequence variation is due to substitutions that have occurred once in the history of mankind at individual base pairs.
There can be big differences between populations!
Markers for pinpointing a disease: certain polymorphisms are linked to disease phenotypes.
Association study: check for differences in SNP patterns between cases and controls.
Forensic analysis: the polymorphisms provide individual and familial signatures.
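One common way to compare SNP patterns between cases and controls in an association study is a chi-square test on allele counts. The sketch below computes the Pearson statistic for a 2x2 table without a continuity correction. The counts are hypothetical, and the notes do not prescribe this particular test.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical allele counts at one SNP.
# Rows: cases, controls. Columns: allele A, allele G.
cases_a, cases_g = 60, 40
controls_a, controls_g = 40, 60

stat = chi_square_2x2(cases_a, cases_g, controls_a, controls_g)
print(stat)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05, so these made-up counts would suggest an association. A real study would also correct for the number of SNPs tested.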
Why SNPs?
The majority of human sequence variation is due to substitutions that have occurred once in the history of mankind at individual base pairs, SNPs (Patil et al. 2001).
Why SNPs? (Continued)
Abundance: high frequency on the genome.
Position: throughout the genome (coding regions, intron regions, promoter sites).
Ease of genotyping (high-throughput genotyping).
SNPs account for around 90% of human genomic variation.
About 40 million or more SNPs exist in human populations.
Most SNPs are outside of the protein coding regions.
Account for most of the genetic diversity among different (normal) individuals, e.g., drug response, disease susceptibility.
However, each SNP locus has only two alleles, so a single SNP is less informative than a microsatellite. (Use haplotypes!)
SNP Maps
Sequence genomes of a large number of people.
Compare the base sequences to discover SNPs.
Generate a single map of the human genome containing all discovered SNPs: a SNP map.
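The comparison step can be sketched as a scan over aligned sequences that reports every position where more than one base is observed. The function name and the four short sequences are made up for illustration. A real pipeline would also handle alignment, sequencing errors, and the 1 percent frequency threshold.

```python
def find_variant_sites(sequences):
    """Return {position: sorted alleles} for every column with more than one base."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variants = {}
    for pos, column in enumerate(zip(*sequences)):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            variants[pos] = alleles
    return variants

# Hypothetical aligned reads from four individuals (positions are 0-based).
aligned = ["TGAGCTA", "TGACCTA", "TGAGCTA", "TGACCTT"]
snp_map = find_variant_sites(aligned)
print(snp_map)
```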
SNP Profiles
The genome of each individual contains a distinct SNP pattern.
People can be grouped based on the SNP profile.
SNP profiles are important for identifying response to drug therapy.
Correlations might emerge between certain SNP profiles and specific responses to treatment.
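Grouping people by SNP profile can be sketched with a dictionary keyed on the profile itself, plus a simple distance for comparing profiles that are not identical. The individual IDs, genotypes, and function names below are hypothetical.

```python
from collections import defaultdict

def group_by_profile(profiles):
    """Group individual IDs that share exactly the same SNP profile."""
    groups = defaultdict(list)
    for person, profile in profiles.items():
        groups[tuple(profile)].append(person)
    return dict(groups)

def hamming(p, q):
    """Number of SNP positions at which two profiles differ."""
    return sum(a != b for a, b in zip(p, q))

# Hypothetical genotypes at three SNPs for three individuals.
profiles = {
    "p1": ["AA", "CG", "TT"],
    "p2": ["AA", "CG", "TT"],
    "p3": ["AT", "CC", "TT"],
}
groups = group_by_profile(profiles)
distance = hamming(profiles["p1"], profiles["p3"])
```

Once groups are formed, treatment outcomes could be tabulated per group to look for the correlations mentioned above.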
Techniques to Detect Known Polymorphisms
Hybridization Techniques
Microarrays
Real-time PCR
Enzyme-based Techniques
Nucleotide extension
Cleavage
Ligation
Reaction product detection and display
Significance of SNPs
In disease diagnosis.
In finding predisposition to diseases.
In drug discovery & development.
In drug responses.
Investigation of migration patterns.
Together, these applications support diagnosis and medication tailored to the individual.
Haplotype
A set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by recombination).
Haplotype Definition
A haplotype is a set of linked SNPs on the same chromosome.
Why Haplotypes?
Haplotypes are more powerful discriminators between cases and controls in disease association studies.
Use of haplotypes in disease association studies reduces the number of tests to be carried out.
With haplotypes, we can conduct evolutionary studies.
Haplotypes are necessary for linkage analysis.
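Standard genotyping reports the two alleles at each SNP but not which chromosome each allele sits on, so several haplotype pairs can be consistent with the same genotype. The sketch below enumerates them. The function name and the example genotype are assumptions made for illustration.

```python
from itertools import product

def compatible_haplotype_pairs(genotypes):
    """List the distinct unordered haplotype pairs consistent with unphased genotypes."""
    pairs = set()
    # At each site, either allele may sit on the first chromosome.
    options = [((g[0], g[1]), (g[1], g[0])) for g in genotypes]
    for choice in product(*options):
        hap1 = "".join(c[0] for c in choice)
        hap2 = "".join(c[1] for c in choice)
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Hypothetical individual: heterozygous at the first two SNPs, homozygous at the third.
pairs = compatible_haplotype_pairs(["AT", "CG", "AA"])
print(pairs)
```

With k heterozygous sites there are 2^(k-1) possible pairs, which is why haplotypes must be inferred or measured directly rather than read off the genotypes.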
Advantages of SNPs
SNPs are the most frequent form of DNA variation.
They are the disease-causing mutations in many genes.
They are abundant and have low mutation rates.
They are easy to score.
They may serve as the next generation of genetic markers.
Haplotype Applications
The first application is the accurate interpretation of personal genomes, particularly in the context of medical genetics. As humans are diploid organisms, haplotype information is essential to each personal genome, for instance, to assess the phase of potentially disease-causing recessive mutations.
Haplotype knowledge is useful in population genetics and human disease studies. For example, the inference of Neanderthal ancestry in non-Africans exploited the availability of a human reference genome that was derived from local segments of African and European ancestry.
Haplotype information can be applied to studies of biological mechanisms. One example of this is the HeLa genome, for which scientists generated a haplotype-resolved genome sequence by fosmid clone dilution pool sequencing. (A fosmid is a cosmid-like cloning vector that uses the F-factor origin of replication to control the vector's copy number. Fosmids are used to build genomic libraries.)
Haplotype information can also facilitate noninvasive fetal genome sequencing. Accurate early inference of allelic inheritance genome-wide makes it possible to determine, in a single test, the risk of thousands of individually rare but collectively common Mendelian disorders.
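The point about the phase of recessive mutations can be illustrated with a small check. Two mutations in the same gene leave one intact copy when they sit on the same chromosome (cis), but no intact copy when they sit on opposite chromosomes (trans). The haplotype strings, positions, and function name are hypothetical.

```python
def variant_phase(hap1, hap2, pos_a, pos_b, mutant_a, mutant_b):
    """Report whether two mutant alleles lie in cis, in trans, or are not both present."""
    on_hap1 = (hap1[pos_a] == mutant_a, hap1[pos_b] == mutant_b)
    on_hap2 = (hap2[pos_a] == mutant_a, hap2[pos_b] == mutant_b)
    if on_hap1 == (True, True) or on_hap2 == (True, True):
        return "cis"      # both mutations on one chromosome
    if on_hap1 in ((True, False), (False, True)) and on_hap2 == tuple(not x for x in on_hap1):
        return "trans"    # one mutation on each chromosome
    return "not both present"

# Hypothetical phased haplotypes; the mutant alleles are T at position 0 and G at position 2.
trans_case = variant_phase("TCA", "ACG", 0, 2, "T", "G")
cis_case = variant_phase("TCG", "ACA", 0, 2, "T", "G")
```

Both example individuals would show the same unphased genotypes at the two sites, so only the haplotype information distinguishes them.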
Some Important SNP Database Resources
dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)
LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi)
TSC (http://snp.cshl.org/)
SNPper (http://snpper.chip.org/bio/)
JSNP (http://snp.ims.u-tokyo.ac.jp/search.html)
GeneSNPs (http://www.genome.utah.edu/genesnps/)
HGVbase (http://hgvbase.cgb.ki.se/)
PolyPhen (http://dove.embl-heidelberg.de/PolyPhen/)
OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
Human SNP database (http://www-genome.wi.mit.edu/snp/human/)
Data Flow in dbSNP
Figure: the four stages of data flow in dbSNP.
a) Submission: research labs, sequencing centers, and databases submit variant records such as --TGA[G/C]CTA--.
b) Database build: carried out at NCBI.
c) Retrieval: genotypes, allele frequencies, location, heterozygosity, literature (PubMed, OMIM), gene view, map view.
d) Applications: pharmacogenomics, functional genomics, GWAS, validation, evolutionary studies.
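The bracket notation used in the submission example, --TGA[G/C]CTA--, gives the flanking sequence with the two alleles in brackets. A minimal sketch of how such a string might be parsed follows. The regular expression and function name are assumptions for illustration, not a dbSNP specification.

```python
import re

# Flank, [allele1/allele2], flank. Only single-base alleles are handled here.
SNP_PATTERN = re.compile(r"^([ACGT]*)\[([ACGT])/([ACGT])\]([ACGT]*)$")

def expand_snp(notation):
    """Split 'TGA[G/C]CTA' into flanks and alleles and return both full sequences."""
    match = SNP_PATTERN.match(notation.strip("-"))
    if match is None:
        raise ValueError(f"unrecognised SNP notation: {notation!r}")
    left, allele1, allele2, right = match.groups()
    return [left + allele1 + right, left + allele2 + right]

sequences = expand_snp("--TGA[G/C]CTA--")
print(sequences)
```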
Viewing SNP Associated with a Gene
Search by gene name:
Search the Gene database with the gene name. If you know the symbol and species, enter them as follows: tpo[sym] AND human[orgn]
Click on the desired gene.
In the list of Links on the right, click "GeneView in dbSNP". If the link is not present, no SNPs are currently linked to this gene.
For human genes, another option is to go to the Variation section (click Variation in the table of contents in the upper right) and follow the links to Variation Viewer for either the GRCh37/hg19 or GRCh38/hg38 assembly, to the 1000 Genomes Browser, ClinVar, and more.
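The steps above describe the web interface. The same query string could also be sent through NCBI's E-utilities. The sketch below only builds the search URL for the esearch endpoint and does not contact the server. Using E-utilities here is an assumption that goes beyond the notes, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gene_search_url(symbol, organism):
    """Build an esearch URL for a gene symbol restricted to one organism."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    return EUTILS_ESEARCH + "?" + urlencode({"db": "gene", "term": term})

url = gene_search_url("tpo", "human")
print(url)
```

Opening the resulting URL would return matching Gene IDs, from which the linked SNP records could then be followed as in the web interface.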
TCF7L2 Details
Details on the TCF7L2 gene, including its function as a transcription factor and its implication in blood glucose homeostasis and type 2 diabetes.
Variation Viewer
Links to the Variation Viewer for exploring SNPs and other variations in the human genome, with options to view data from different assemblies (GRCh37/hg19 or GRCh38/hg38) and related resources.
SNP Details
Information on specific SNPs, including their alleles, genomic position, and clinical significance, along with links to publications and genomic viewers for further exploration.
Information about a specific SNP (rs1420003331), including its alleles (G>T), position on chromosome 10, and potential consequences on the TCF7L2 gene.
Allele Details
Details about the alleles of a specific SNP and their potential effects on protein sequences.
Genomic Context
Visual representation of genomic data, including gene annotations, clinical significance of variations, and links to external resources such as HPA RNA-seq data and Ensembl.
Variation Viewer
Navigating the Variation Viewer to explore SNPs and other variations in the human genome, with options to filter and display data based on different criteria.