Identify disease genes
Identifying genes that contribute to disease risk is one of the main objectives of molecular research.
Such findings have contributed to improvements in diagnosis, prognosis, and therapy.
With the successful identification of disease genes for many single-gene disorders, the focus has shifted to diseases with a complex, multifactorial etiology.
Genome-wide association studies (GWAS)
Identify associations between genetic variations (loci) and traits (including diseases).
Test for differences in the frequency of genetic variants between individuals who are ancestrally similar but differ phenotypically.
Genetic variations:
Single nucleotide polymorphism (SNP) → most common.
Copy number variants.
Large sequence variations.
Traits:
Disease (e.g., cancer, Alzheimer’s).
Phenotypes (e.g., height, eye color).
Gene expression (eQTL).
Etc.
Single nucleotide polymorphism (SNP)
Most common type of genetic variation:
Single base pair.
Responsible for about 90% of all human genetic variation.
SNPs are scattered all around the genome:
Within genes (coding SNPs): located within the coding region of a gene, may change the amino acid sequence of the gene’s protein product.
Outside of genes (non-coding, the majority): may change the timing, location, or gene expression level.
What is a SNP:
A SNP occurs when one nucleotide is replaced by another at a specific base pair position in the DNA sequence.
Strictly, a SNP is a single-base substitution; single-base insertions and deletions are classed as indels, although dbSNP catalogs both.
SNP database
NCBI Short Genetic Variation database (dbSNP) catalogs short variations in nucleotide sequences for humans.
Major and minor allele
Major allele is the most common variant found in a population, while the minor allele is the less common or rarer variant at that same position.
Minor allele frequency (MAF).
MAF > 1% → common SNP.
MAF < 1% → rare SNP.
Major and minor alleles are population-specific.
Focus often on the minor allele because minor alleles are crucial for identifying disease risks and studying genetic selection.
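As a rough illustration of the MAF definition above, the sketch below counts alleles in a small, made-up sample of diploid genotypes and applies the 1% cutoff. The function names and the sample data are hypothetical, and the SNP is assumed to be biallelic.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as 'AG'."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Return (minor_allele, MAF); assumes a biallelic SNP."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) == 1:
        return None, 0.0  # monomorphic in this sample
    minor = min(freqs, key=freqs.get)
    return minor, freqs[minor]


# Hypothetical sample of 10 individuals (20 alleles): 16 A and 4 G
sample = ["AA"] * 7 + ["AG"] * 2 + ["GG"]
minor, maf = minor_allele_frequency(sample)
label = "common" if maf > 0.01 else "rare"
print(minor, maf, label)
```

Because the frequencies are computed from whichever sample is passed in, the same SNP can have a different minor allele in a different population.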
Synonymous vs. nonsynonymous SNPs
Synonymous SNPs:
Do not change the amino acid sequence of a protein.
Silent change.
Nonsynonymous SNPs:
Potentially alter protein structure and function.
This distinction arises because the genetic code is redundant, with multiple codons sometimes coding for the same amino acid.
Missense vs. nonsense SNPs
Missense and nonsense SNPs are both nonsynonymous mutations.
Change the amino acid sequence.
Missense mutation:
Results in a different amino acid being incorporated into the protein.
It may alter the protein’s structure and function.
Nonsense mutation:
Changes a sense codon into a premature stop codon.
This leads to a truncated and often non-functional protein.
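The categories above can be sketched as a small classifier that translates the reference and altered codons with the standard genetic code and compares the results. The function name and the "stop-loss" label for the reverse case are assumptions added for illustration; codons are written as DNA (T rather than U).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon change by its effect on the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # silent change
    if alt_aa == "*":
        return "nonsense"     # sense codon becomes a premature stop
    if ref_aa == "*":
        return "stop-loss"    # not covered in the notes; labeled for completeness
    return "missense"         # a different amino acid is incorporated


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```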
Linkage disequilibrium (LD) blocks
Sets of nearby SNPs on the same chromosome are inherited together in blocks.
Haplotype/LD block:
A group of alleles that are co-inherited as a single block.
LDlink tools: LDhap, LDmatrix, etc.
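One common way to quantify how strongly two SNPs are co-inherited is the r² statistic, computed from allele and haplotype frequencies as r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The sketch below assumes phased haplotypes are already available as strings of 0/1 alleles; the function name and the toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes, i, j):
    """r^2 between SNP i and SNP j from phased haplotypes ('0'/'1' strings)."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


# Toy data: 8 haplotypes over 3 SNPs; SNP 0 and SNP 1 always travel together
haps = ["110", "110", "110", "001", "001", "000", "111", "001"]
print(ld_r2(haps, 0, 1))  # same block: r^2 = 1.0
print(ld_r2(haps, 0, 2))  # weaker correlation: r^2 = 0.25
```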
Tag SNPs
A few SNPs are enough to identify the haplotypes in a block uniquely.
Reduce the number of SNPs required to examine the entire genome for association with a phenotype.
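A minimal sketch of the tag SNP idea, assuming a pairwise r² matrix is already available: a greedy loop repeatedly picks the SNP that covers the most still-untagged SNPs at r² ≥ 0.8. This is a simple heuristic for illustration, not necessarily the procedure used to build real SNP panels; the function name, the threshold, and the matrix values are all hypothetical.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the untagged SNP that covers the most untagged SNPs
        # (ties go to the lowest index because candidates are sorted).
        best = max(
            sorted(untagged),
            key=lambda s: sum(r2[s][t] >= threshold for t in untagged),
        )
        tags.append(best)
        covered = {t for t in untagged if r2[best][t] >= threshold}
        untagged -= covered | {best}
    return tags


# Hypothetical r^2 matrix: SNPs 0-2 form one block, SNPs 3-4 another
r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.95, 0.10, 0.10],
    [0.85, 0.95, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.90],
    [0.00, 0.10, 0.10, 0.90, 1.00],
]
print(greedy_tag_snps(r2))  # two tags cover all five SNPs
```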
Methods to measure SNPs
SNPs can be measured through genotyping and DNA sequencing.
Genotyping examines a specific set of targeted sites within the genome using methods like SNP arrays.
A targeted approach, focusing only on pre-selected, known SNPs at specific positions in the genome.
DNA sequencing reads the entire DNA sequence, allowing for the discovery of known and unknown SNPs.
SNPs are captured through variant calling after sequence alignment → VCF files.
Allow for the discovery of novel or unexpected SNPs that are not known beforehand.
Read the entire DNA segment → more data and more expensive than genotyping.
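To make the VCF output concrete, the sketch below reads one tab-separated VCF data line and counts the non-reference alleles in each sample's GT field. It is a hand-rolled illustration rather than a full parser (real pipelines would normally use a dedicated VCF library), and the function name, variant ID, and genotypes are made up. Note that it counts ALT alleles, which are not necessarily the minor alleles.

```python
def alt_allele_dosages(vcf_line):
    """Parse one VCF data line and count ALT alleles per sample (None = missing)."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    gt_index = fields[8].split(":").index("GT")   # position of GT in FORMAT
    dosages = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")  # '|' phased, '/' unphased
        if "." in alleles:
            dosages.append(None)                   # missing call
        else:
            dosages.append(sum(a != "0" for a in alleles))
    return {"chrom": chrom, "pos": int(pos), "id": variant_id,
            "ref": ref, "alt": alt, "dosages": dosages}


# Hypothetical data line: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT samples...
example_line = "\t".join(
    ["1", "12345", "rs_example", "A", "G", ".", "PASS", ".", "GT",
     "0/0", "0/1", "1|1", "./."]
)
result = alt_allele_dosages(example_line)
print(result["dosages"])
```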
SNP selection (genotyping)
Identify DNA regions of interest.
Identify patterns of SNPs that are inherited together on a chromosome.
HapMap project: tested the association of millions of SNPs across multiple populations → produced the SNP panels that are used today.
Select tag SNPs that can represent a block of associated SNPs.
SNP genotype
SNP genotypes can be coded differently based on genetic models.
Additive model (ADD) → commonly used.
Dominant model (DOM).
Recessive model (REC).
SNP genotypes are commonly coded as 0, 1, or 2 to represent the number of copies of the minor allele.
0 → homozygous for the major allele (e.g., AA).
1 → heterozygous (e.g., Aa).
2 → homozygous for the minor allele (e.g., aa).
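The three genetic models differ only in how the count of minor-allele copies is turned into a number. A minimal sketch, with a hypothetical function name and with "a" standing in for the minor allele:

```python
def encode_genotype(genotype, minor_allele, model="ADD"):
    """Code a diploid genotype under an additive, dominant, or recessive model."""
    copies = genotype.count(minor_allele)  # 0, 1, or 2 minor-allele copies
    if model == "ADD":
        return copies                      # effect scales with copy number
    if model == "DOM":
        return int(copies >= 1)            # one copy is enough
    if model == "REC":
        return int(copies == 2)            # both copies are needed
    raise ValueError(f"unknown model: {model}")


for model in ("ADD", "DOM", "REC"):
    codes = [encode_genotype(g, "a", model) for g in ("AA", "Aa", "aa")]
    print(model, codes)
```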
Genotype/haplotype phasing
Standard genotyping and short-read sequencing give genotypes but not haplotypes.
Phasing is the process of determining which alleles are located on the same chromosome, i.e., the haplotype.
Phasing tools: SHAPEIT5, BEAGLE5.5.
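The reason phasing is needed can be shown by enumerating every haplotype pair consistent with an unphased genotype. The sketch below is only an illustration of the ambiguity, not how statistical phasing tools work; the function name and the genotypes are hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """List the unordered haplotype pairs consistent with unphased genotypes.

    genotypes: one two-letter genotype per SNP, e.g. ["AG", "CT"].
    """
    pairs = set()
    # At each SNP, either allele may sit on the first chromosome copy.
    for choice in product(*[(g, g[::-1]) for g in genotypes]):
        hap1 = "".join(site[0] for site in choice)
        hap2 = "".join(site[1] for site in choice)
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Two heterozygous sites: the genotype alone cannot tell these apart
print(compatible_haplotype_pairs(["AG", "CT"]))
# One heterozygous site: no ambiguity
print(compatible_haplotype_pairs(["AA", "CT"]))
```

With k heterozygous sites there are 2^(k−1) candidate pairs, which is why phasing relies on population patterns such as LD to choose among them.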
GWAS association testing
A simple statistical test, such as chi-square.
Case control analysis.
Without confounding factors.
Linear regression models.
If the phenotype is continuous (such as height, blood pressure, or body mass index).
Logistic regression models.
If the phenotype is binary (such as the presence or absence of disease).
Linear mixed models.
Can account for genetic relatedness among individuals.
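Two of the tests listed above can be sketched with the standard library alone: a chi-square test on a 2×2 table of allele counts (cases vs. controls), and a simple linear regression of a continuous phenotype on additive genotype codes. Both are bare-bones illustrations with made-up counts and phenotypes; real analyses would add covariates, standard errors, and quality control, and would normally use dedicated software.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


def simple_linear_regression(x, y):
    """Least-squares slope and intercept of y on x (no covariates)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical allele counts: 200 case alleles and 200 control alleles
chi2, p_value = allelic_chi_square(60, 140, 40, 160)
print(round(chi2, 3), round(p_value, 4))

# Hypothetical additive codes and heights (cm) for six individuals
dosage = [0, 0, 1, 1, 2, 2]
height = [170, 172, 173, 175, 176, 178]
slope, intercept = simple_linear_regression(dosage, height)
print(slope, intercept)  # slope = change in height per minor-allele copy
```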
GWAS example
Odds ratio
Measure of effect size.
OR = 1, no disease association.
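OR > 1, the tested allele (allele C in the example) is associated with increased risk of disease.
OR < 1, the tested allele (allele C in the example) is associated with decreased risk of disease.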
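As a worked illustration, the odds ratio for an allele can be computed from a 2×2 table of allele counts as OR = (a·d) / (b·c). The sketch below also adds an approximate 95% confidence interval using the usual log-scale standard error; the function name and all counts are hypothetical.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio with an approximate 95% CI from a 2x2 table of allele counts."""
    estimate = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(OR); assumes no cell count is zero
    se_log = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts of the tested allele vs. the other allele
estimate, lower, upper = odds_ratio(60, 140, 40, 160)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 points in the same direction as a significant association test on the same table.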
PLINK
Mostly shared.
GWAS results visualized
Manhattan plot:
Bonferroni testing threshold of p < 5 × 10⁻⁸.
Multiple test correction.
Genome-wide significance threshold = red line on the image.
The peaks are groups of SNPs that are close to each other and co-inherited (in LD).
Significance threshold varies depending on:
Number of SNPs examined.
Number of subjects included.
Minor allele frequencies.
Etc.
LocusZoom: a combination of a Manhattan plot and a genome browser.
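The 5 × 10⁻⁸ threshold is conventionally explained as a Bonferroni correction of α = 0.05 for roughly one million independent tests. The sketch below reproduces that arithmetic and the height of the threshold line on the −log10(p) axis of a Manhattan plot; the one-million figure is the usual assumption, and the SNP names and p-values are made up.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: about one million independent tests across the genome
threshold = bonferroni_threshold(0.05, 1_000_000)
line_height = -math.log10(threshold)  # y-position of the threshold line
print(threshold, round(line_height, 2))

# Hypothetical p-values for three SNPs
p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 4.9e-8}
significant = sorted(name for name, p in p_values.items() if p < threshold)
print(significant)
```

Testing fewer SNPs gives a less stringent threshold under the same rule, which is one reason the threshold varies between studies.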
Genotype datasets for large-scale GWAS
Biobanks and large population-based studies with genetic and phenotype data available for research.
For USA → ‘All of Us’ initiative or 23andMe.
Genotype data are typically restricted due to re-identification risk.
Application needed.
Databases for GWAS summary statistics
GWAS Catalog.
GWAS Atlas.
Allow easy access to summary statistics for thousands of traits.
Many downstream analyses are built on the summary statistics, rather than the raw genotype itself.
Post-GWAS analysis
Functional mapping:
Where are the associated variants located in the genome?
Pathways enriched by GWAS findings.
How may they influence molecular functions leading to disease?
Identify the tissues or cell types where these variants are likely to act.
Causal variants identification:
GWAS findings come up as clusters → highly correlated.
Which one is likely causal?