SNP, GWAS, and post-GWAS

21 Terms

1

Identify disease genes

  • Identifying genes that contribute to disease risk is one of the main objectives of molecular research.

  • Such findings have contributed to improvements in diagnosis, prognosis, and therapy.

  • With the successful identification of disease genes for many single-gene disorders, the focus has shifted to diseases with a complex, multifactorial etiology.

2

Genome-wide association studies (GWAS)

  • Identify associations between genetic variations (loci) and traits (including diseases).

    • Test for differences in the frequency of genetic variants between individuals who are ancestrally similar but differ phenotypically.

  • Genetic variations:

    • Single nucleotide polymorphism (SNP) → most common.

    • Copy number variants.

    • Large sequence variations.

  • Traits:

    • Disease (e.g., cancer, Alzheimer’s).

    • Phenotypes (e.g., height, eye color).

    • Gene expression (eQTL).

    • Etc.

3

Single nucleotide polymorphism (SNP)

  • Most common type of genetic variation:

    • Single base pair.

    • Accounts for about 90% of all human genetic variation.

  • SNPs are scattered all around the genome:

    • Within genes (coding SNPs): located within the coding region of a gene, may change the amino acid sequence of the gene’s protein product.

    • Outside of genes (non-coding, the majority): may affect the timing, location, or level of gene expression.

  • What is a SNP:

    • A SNP occurs when one nucleotide is replaced by another at a specific base-pair position in the DNA sequence.

      • Strictly, a SNP is a single-base substitution; small insertions or deletions (indels) are sometimes grouped with SNPs, e.g., in dbSNP.

4

SNP database

  • The NCBI Short Genetic Variation database (dbSNP) catalogs short human nucleotide sequence variations (SNPs, small indels), each identified by an rs number.

5

Major and minor allele

  • The major allele is the most common variant at a position in a population; the minor allele is the less common (rarer) variant at that same position.

    • Minor allele frequency (MAF): the frequency of the minor allele in the population.

    • MAF > 1% → common SNP.

    • MAF < 1% → rare SNP.

  • Major and minor alleles are population-specific.

  • Analyses often focus on the minor allele because minor alleles are key to identifying disease risk and studying genetic selection.
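A minimal sketch of how the MAF and the common/rare cut-off above can be computed from diploid genotypes. It assumes genotypes are given as two-letter strings (e.g. "AG") for a single biallelic SNP; the function names are hypothetical. The boundary case (exactly 1%) follows the card's strict "> 1%" wording, which is a convention rather than a fixed rule.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, maf) for one biallelic SNP.

    genotypes: diploid genotype strings such as "AA", "AG", "GG".
    """
    # Count every allele copy (two per individual)
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles (biallelic SNP)")
    (major, n_major), (minor, n_minor) = counts.most_common()
    return minor, n_minor / (n_major + n_minor)


def classify_snp(maf, threshold=0.01):
    """Label a SNP as common (MAF > 1%) or rare, following the card's cut-off."""
    return "common" if maf > threshold else "rare"


# Toy sample of 10 individuals (hypothetical data)
sample = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AA", "AG", "AA"]
minor, maf = minor_allele_frequency(sample)
print(minor, maf, classify_snp(maf))
```

Because the counts come from one sample, running the same function on samples from different populations can return a different minor allele, which is why major and minor alleles are population-specific.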

6

Synonymous vs. nonsynonymous SNPs

  • Synonymous SNPs:

    • Do not change the amino acid sequence of a protein.

    • Silent change.

  • Nonsynonymous SNPs:

    • Potentially alter protein structure and function.

  • This distinction arises because the genetic code is redundant (degenerate): several codons can code for the same amino acid.

7

Missense vs. nonsense SNPs

  • Missense and nonsense SNPs are both nonsynonymous mutations.

    • Change the amino acid sequence.

  • Missense mutation:

    • Results in a different amino acid being incorporated into the protein.

    • It may alter the protein’s structure and function, depending on how different the new amino acid is.

  • Nonsense mutation:

    • Changes a sense (amino acid-coding) codon into a premature stop codon.

    • This leads to a truncated and often non-functional protein.
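The synonymous, missense, and nonsense categories from the two cards above can be illustrated by translating a codon before and after a single-base change. This is a sketch assuming the standard genetic code and a SNP that falls inside one known codon (position 0 to 2); the function name is hypothetical, and the extra "stop-loss" label is added only so stop-to-sense changes are not mislabeled as missense.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop codon)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base substitution at `position` (0-2) of `codon`."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"   # redundant code: same amino acid
    if ref_aa == "*":
        return "stop-loss"    # stop codon turned into a sense codon
    if alt_aa == "*":
        return "nonsense"     # premature stop codon -> truncated protein
    return "missense"         # different amino acid incorporated


print(classify_coding_snp("GAA", 2, "G"))  # Glu -> Glu
print(classify_coding_snp("GAG", 1, "T"))  # Glu -> Val
print(classify_coding_snp("CAG", 0, "T"))  # Gln -> stop
```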

8

Linkage disequilibrium (LD) blocks

  • Nearby SNPs on the same chromosome tend to be inherited together in blocks.

  • Haplotype/LD block:

    • A group of alleles that are co-inherited as a single block.

    • LD tools: LDhap, LDmatrix (LDlink suite), etc.
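To make "co-inherited" concrete, the two standard pairwise LD measures, D′ and r², can be computed from phased two-SNP haplotypes. This is a minimal sketch assuming haplotypes are given as (allele at SNP 1, allele at SNP 2) tuples and both SNPs are polymorphic; the function name and toy data are hypothetical.

```python
from collections import Counter


def ld_stats(haplotypes, allele_a, allele_b):
    """Return (D, D', r^2) between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples.
    allele_a / allele_b: the allele tracked at SNP 1 / SNP 2.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts[(allele_a, allele_b)] / n
    p_a = sum(c for (a, _), c in counts.items() if a == allele_a) / n
    p_b = sum(c for (_, b), c in counts.items() if b == allele_b) / n
    d = p_ab - p_a * p_b  # deviation from independent inheritance
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Toy haplotypes: A mostly travels with B, a mostly with b
haps = [("A", "B")] * 45 + [("A", "b")] * 5 + [("a", "B")] * 5 + [("a", "b")] * 45
print(ld_stats(haps, "A", "B"))
```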

9

Tag SNPs

  • A few SNPs are enough to uniquely identify the haplotypes in a block.

  • Tag SNPs reduce the number of SNPs that must be genotyped to scan the entire genome for association with a phenotype.
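One simple way to pick tag SNPs is a greedy search over pairwise r²: repeatedly choose the SNP that covers the most still-untagged SNPs at r² ≥ 0.8. This is a simplified sketch for illustration, not the exact algorithm of any particular tool; the 0.8 threshold, the function name, and the r² values are assumptions.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    snps: list of SNP ids; r2: dict mapping frozenset({s1, s2}) -> r^2.
    Missing pairs are treated as r^2 = 0.
    """
    def r2_between(s, t):
        return 1.0 if s == t else r2.get(frozenset((s, t)), 0.0)

    untagged = set(snps)
    tags = []
    while untagged:
        # An untagged SNP always covers itself, so each round makes progress
        best = max(snps, key=lambda s: sum(r2_between(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2_between(best, t) >= threshold}
    return tags


snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]  # hypothetical ids
r2 = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs1", "rs3")): 0.85,
    frozenset(("rs2", "rs3")): 0.95,
    frozenset(("rs4", "rs5")): 0.82,
}
print(greedy_tag_snps(snps, r2))
```

Here two tags stand in for five SNPs, which is the saving the card describes.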

10

Methods to measure SNPs

  • SNPs can be measured through genotyping and DNA sequencing.

  • Genotyping examines a specific set of targeted sites within the genome using methods like SNP arrays.

    • A targeted approach, focusing only on pre-selected, known SNPs at specific positions in the genome.

  • DNA sequencing reads the entire DNA sequence, allowing for the discovery of known and unknown SNPs.

    • SNPs are identified by variant calling after read alignment → stored in VCF files.

    • Allows the discovery of novel or unexpected SNPs that were not known beforehand.

    • Reads the entire DNA segment → produces more data and is more expensive than genotyping.
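As a rough illustration of what a VCF holds, the sketch below reads the first five tab-separated columns (CHROM, POS, ID, REF, ALT) and keeps only biallelic single-base substitutions, skipping indels and multi-allelic sites. It is a toy parser under those assumptions (the function name and records are hypothetical); in practice, libraries such as pysam or cyvcf2 are the usual choice.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def parse_vcf_snps(lines):
    """Yield (chrom, pos, id, ref, alt) for biallelic SNPs in VCF text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information (##) and header (#CHROM) lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        alts = alt.split(",")
        # Keep single-base substitutions only; indels and multi-allelic sites are skipped
        if len(alts) == 1 and ref.upper() in NUCLEOTIDES and alts[0].upper() in NUCLEOTIDES:
            yield chrom, int(pos), vid, ref, alts[0]


# Toy VCF records (hypothetical positions and ids)
SAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tsnp_a\tA\tG\t50\tPASS\t.",     # SNP
    "1\t1050\tindel_b\tA\tAC\t50\tPASS\t.",  # insertion
    "2\t2000\tsnp_c\tC\tT,G\t50\tPASS\t.",   # multi-allelic site
]
for record in parse_vcf_snps(SAMPLE_VCF):
    print(record)
```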

11

SNP selection (genotyping)

  • Identify DNA regions of interest.

  • Identify patterns of SNPs that are inherited together on a chromosome.

    • HapMap project: characterized the correlation (LD) patterns among millions of SNPs across multiple populations → informed the SNP panels that are used today.

  • Select tag SNPs that can represent a block of associated SNPs.

12

SNP genotype

  • SNP genotypes can be coded differently based on genetic models.

    • Additive model (ADD) → commonly used.

    • Dominant model (DOM).

    • Recessive model (REC).

  • SNP genotypes are commonly coded as 0, 1, or 2 to represent the number of copies of the minor allele.

    • 0 → homozygous for the major allele (e.g., AA).

    • 1 → heterozygous (e.g., Aa).

    • 2 → homozygous for the minor allele (e.g., aa).
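A sketch of how one genotype maps to the three genetic models, counting copies of the minor allele. It assumes DOM and REC are defined with respect to the minor allele (consistent with the 0/1/2 coding above); the function name is hypothetical.

```python
def encode_genotype(genotype, minor_allele, model="ADD"):
    """Encode a diploid genotype string (e.g. "AG") under a genetic model."""
    n_minor = sum(allele == minor_allele for allele in genotype)  # 0, 1, or 2
    if model == "ADD":
        return n_minor             # 0 / 1 / 2: each copy adds the same effect
    if model == "DOM":
        return int(n_minor >= 1)   # 0 / 1 / 1: one copy is enough
    if model == "REC":
        return int(n_minor == 2)   # 0 / 0 / 1: two copies required
    raise ValueError(f"unknown model: {model}")


for model in ("ADD", "DOM", "REC"):
    print(model, [encode_genotype(g, "G", model) for g in ("AA", "AG", "GG")])
```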

13

Genotype/haplotype phasing

  • Standard genotyping and short-read sequencing give unphased genotypes, not haplotypes.

  • Phasing is the process of determining which alleles are located on the same chromosome copy, i.e., the haplotype.

  • Phasing tools: SHAPEIT5, Beagle 5.5.
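Why phasing is needed: an unphased genotype with k heterozygous sites is consistent with 2^(k-1) different haplotype pairs. The sketch below only enumerates that ambiguity; tools such as SHAPEIT5 and Beagle resolve it statistically using LD patterns across many individuals or reference panels, which this code does not attempt. The function name and input format (one two-letter string per SNP) are assumptions.

```python
from itertools import product


def phase_configurations(genotypes):
    """List the distinct haplotype pairs consistent with unphased genotypes.

    genotypes: one 2-character string per SNP, e.g. ["AG", "CC", "TA"].
    """
    het_sites = [i for i, g in enumerate(genotypes) if g[0] != g[1]]
    configs = set()
    for flips in product((False, True), repeat=len(het_sites)):
        hap1 = [g[0] for g in genotypes]
        hap2 = [g[1] for g in genotypes]
        for site, flip in zip(het_sites, flips):
            if flip:
                hap1[site], hap2[site] = hap2[site], hap1[site]
        # (hap1, hap2) and (hap2, hap1) are the same pair, so store it unordered
        configs.add(frozenset(("".join(hap1), "".join(hap2))))
    return configs


for pair in phase_configurations(["AG", "CC", "TA"]):
    print(sorted(pair))
```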

14

GWAS association testing

  • A simple statistical test, such as chi-square.

    • Case-control analysis.

    • Only appropriate when there are no confounding factors to adjust for.

  • Linear regression models.

    • If the phenotype is continuous (such as height, blood pressure, or body mass index).

  • Logistic regression models.

    • If the phenotype is binary (such as the presence or absence of disease).

  • Linear mixed models.

    • Can account for genetic relatedness among individuals.
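The simplest option listed above, a chi-square test in a case-control design, can be sketched as a 1-degree-of-freedom allelic test on a 2×2 table of allele counts. This assumes allele counts (two per person) and applies no continuity correction; the function name and counts are hypothetical. For 1 df, the upper-tail probability equals erfc(√(χ²/2)), so only the standard library is needed.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical: 500 cases (1000 alleles) and 500 controls (1000 alleles)
print(allelic_chi_square(240, 760, 180, 820))
```

In a real GWAS this test is repeated for every SNP, and regression models replace it when covariates or relatedness must be handled.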

15

GWAS example

16

Odds ratio

  • Measure of effect size.

  • OR = 1 → no association with disease.

  • OR > 1 → the tested allele (allele C in the example) is associated with increased odds of disease.

  • OR < 1 → the tested allele (allele C) is associated with decreased odds of disease.
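A minimal sketch of the allelic odds ratio from a 2×2 table of allele counts, with a 95% confidence interval computed on the log scale (Woolf's method). The counts and function name are hypothetical; the method assumes no zero cells.

```python
import math


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio for the tested allele with a Woolf (log-scale) confidence interval."""
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = math.sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: tested allele in cases (240 of 1000) vs controls (180 of 1000)
print(allelic_odds_ratio(240, 760, 180, 820))
```

With these counts OR ≈ 1.44 and the interval stays above 1, i.e., the tested allele is associated with higher odds of disease.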

17

PLINK

  • Widely used open-source toolset for genotype data management and GWAS association analysis.

  • Mostly shared.

18

GWAS results visualized

  • Manhattan plot:

    • Bonferroni testing threshold of p < 5 × 10⁻⁸.

      • Multiple-testing correction: 0.05 divided by about 1 million independent tests.

    • Genome-wide significance threshold = red line on the plot.

      • Each peak is a cluster of nearby SNPs that are co-inherited (in LD), so they show similar association signals.

    • Significance threshold varies depending on:

      • Number of SNPs examined.

      • Number of subjects included.

      • Minor allele frequencies.

      • Etc.

  • LocusZoom: combines a regional (zoomed-in) Manhattan plot with a genome-browser view of the surrounding genes.
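The 5 × 10⁻⁸ threshold is the Bonferroni correction of α = 0.05 for roughly one million independent tests. The sketch below computes it and turns association results into the −log10(p) values plotted on a Manhattan plot's y-axis, flagging SNPs above the red line. The result tuples and function names are hypothetical; drawing the plot itself (e.g. with a plotting library) is left out.

```python
import math


def genome_wide_threshold(alpha=0.05, n_tests=1_000_000):
    """Bonferroni-corrected significance threshold: alpha / number of tests."""
    return alpha / n_tests


def manhattan_points(results, threshold):
    """Return (snp, chrom, pos, -log10 p, significant) tuples for plotting."""
    return [
        (snp, chrom, pos, -math.log10(p), p < threshold)
        for snp, chrom, pos, p in results
    ]


threshold = genome_wide_threshold()
results = [("snp1", "1", 100, 1e-9), ("snp2", "1", 200, 1e-3)]  # toy results
for point in manhattan_points(results, threshold):
    print(point)
```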

19

Genotype datasets for large scale GWAS

  • Biobanks and large population-based studies with genetic and phenotype data available for research.

    • In the USA → the ‘All of Us’ Research Program or 23andMe.

  • Genotype data are typically restricted because of re-identification risk.

    • Access requires an application.

20

Databases for GWAS summary statistics

  • GWAS Catalog.

  • GWAS Atlas.

  • They provide easy access to summary statistics for thousands of traits.

  • Many downstream analyses are built on the summary statistics rather than on the raw genotype data.
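Summary statistics usually include, per SNP, an effect size (beta) and its standard error, from which a z-score and two-sided p-value follow: z = beta / SE and p = erfc(|z| / √2). This sketch reads a small tab-separated table and recomputes both; the column names (snp, beta, se) are assumptions, since real files from the GWAS Catalog or GWAS Atlas use their own headers.

```python
import csv
import io
import math


def add_z_and_p(tsv_text):
    """Read GWAS summary statistics and recompute z-scores and two-sided p-values.

    Assumes hypothetical columns: snp, beta, se (tab-separated).
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        z = float(row["beta"]) / float(row["se"])
        row["z"] = z
        row["p_recomputed"] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return rows


SAMPLE = "snp\tbeta\tse\nsnp_a\t0.10\t0.02\nsnp_b\t-0.01\t0.02\n"  # toy data
for row in add_z_and_p(SAMPLE):
    print(row["snp"], row["z"], row["p_recomputed"])
```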

21

Post-GWAS analysis

  • Functional mapping:

    • Where are the associated variants located in the genome (genes, regulatory regions)?

    • Which pathways are enriched among the GWAS findings?

    • How may they influence molecular functions leading to disease?

    • Identify the tissues or cell types where these variants are likely to act.

  • Causal variants identification:

    • GWAS findings come up as clusters of highly correlated SNPs (LD).

    • Which one is likely causal? (fine-mapping)
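One common fine-mapping approach, shown here only as a sketch, assumes a single causal variant per region and converts each SNP's z-score and standard error into Wakefield's approximate Bayes factor (ABF). Normalizing the ABFs gives posterior inclusion probabilities (PIPs), and the smallest set of SNPs reaching 95% cumulative PIP forms the credible set. The prior standard deviation of 0.2, equal priors across SNPs, the function name, and the toy inputs are all assumptions.

```python
import math


def abf_fine_mapping(z_scores, standard_errors, prior_sd=0.2, coverage=0.95):
    """Posterior inclusion probabilities and a credible set (single causal variant).

    z_scores, standard_errors: dicts mapping SNP id -> value.
    """
    w = prior_sd ** 2  # prior variance of the true effect size
    log_abf = {}
    for snp, z in z_scores.items():
        v = standard_errors[snp] ** 2
        r = w / (v + w)
        log_abf[snp] = 0.5 * math.log(1 - r) + 0.5 * r * z * z  # Wakefield ABF (log scale)
    # Normalize on the log scale to avoid overflow
    max_log = max(log_abf.values())
    weights = {snp: math.exp(value - max_log) for snp, value in log_abf.items()}
    total = sum(weights.values())
    pip = {snp: weight / total for snp, weight in weights.items()}
    # Add SNPs in order of decreasing PIP until the target coverage is reached
    credible_set, cumulative = [], 0.0
    for snp in sorted(pip, key=pip.get, reverse=True):
        credible_set.append(snp)
        cumulative += pip[snp]
        if cumulative >= coverage:
            break
    return pip, credible_set


z = {"snp_a": 6.0, "snp_b": 5.5, "snp_c": 2.0}      # toy cluster of correlated hits
se = {"snp_a": 0.02, "snp_b": 0.02, "snp_c": 0.02}
pip, credible_set = abf_fine_mapping(z, se)
print(pip, credible_set)
```

A SNP with a high PIP is a stronger candidate for the causal variant than its correlated neighbours, although functional follow-up is still needed.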