SNP, GWAS, and post-GWAS

39 Terms

1
New cards

Genome-wide association studies (GWAS)

  • Identify associations between genetic variations (loci) and traits (including diseases).

    • Test for differences in the frequency of genetic variants between individuals who are ancestrally similar but differ phenotypically.

  • Genetic variations:

    • Single nucleotide polymorphism (SNP) → most common.

    • Copy number variants.

    • Large sequence variations.

2
New cards

Single nucleotide polymorphism (SNP)

  • Most common genetic variation (~90% of human variation).

  • Single base pair change, typically a substitution; single-base insertions and deletions are usually classed as indels, although dbSNP catalogs them as well.

  • Location:

    • Coding: may change protein sequence.

    • Non-coding (majority): may affect gene expression, timing, or location.

3
New cards

SNP database

NCBI Short Genetic Variation database (dbSNP) catalogs short variations in nucleotide sequences for humans.

4
New cards

Major and minor allele

  • Major allele is the most common variant found in a population, while the minor allele is the less common or rarer variant at that same position.

    • Minor allele frequency (MAF).

    • MAF > 1% → common SNP.

    • MAF < 1% → rare SNP.

  • Major and minor alleles are population-specific.

  • Focus often on the minor allele because minor alleles are crucial for identifying disease risks and studying genetic selection.
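
As a minimal sketch of the MAF definition above, the snippet below counts alleles across diploid genotypes and applies the 1% cutoff from this card. The genotype list and the function names are hypothetical and only for illustration.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as "AG"."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def minor_allele_frequency(genotypes):
    """MAF = frequency of the less common allele at a biallelic site."""
    freqs = allele_frequencies(genotypes)
    # A site with only one observed allele has no minor allele.
    return min(freqs.values()) if len(freqs) > 1 else 0.0

# Hypothetical genotypes for 10 individuals at one SNP (20 alleles in total).
sample = ["AA", "AA", "AG", "AA", "AG", "AA", "AA", "GG", "AA", "AA"]
maf = minor_allele_frequency(sample)
label = "common" if maf > 0.01 else "rare"
print(f"MAF = {maf:.2f} ({label})")
```

Because major and minor alleles are population-specific, the same function run on genotypes from another population can return a different minor allele and a different MAF.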

5
New cards

Synonymous vs. nonsynonymous SNPs

  • Synonymous SNPs:

    • Do not change the amino acid sequence of a protein.

    • Silent change.

  • Nonsynonymous SNPs:

    • Potentially alter protein structure and function.

  • This distinction arises because the genetic code is redundant, with multiple codons sometimes coding for the same amino acid.

6
New cards

Missense vs. nonsense SNPs

  • Missense and nonsense SNPs are both nonsynonymous mutations.

    • Change the amino acid sequence.

  • Missense mutation:

    • Results in a different amino acid being incorporated into the protein.

    • It may alter the protein’s structure and function, depending on the substitution.

  • Nonsense mutation:

    • This changes from a sense codon to a premature stop codon.

    • This leads to a truncated and often non-functional protein.

7
New cards

Linkage disequilibrium (LD) blocks

  • LD = non-random association of alleles at different loci.

    • There is a pattern.

  • Sets of nearby SNPs on the same chromosome are inherited together in blocks.

  • Haplotype/LD block:

    • A group of alleles that are co-inherited as a single block.

    • LDlink tools: LDhap, LDmatrix, etc.
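
One common way to quantify LD between two SNPs is the squared correlation r², computed from phased haplotypes. The sketch below is an illustration under simple assumptions (biallelic SNPs, alleles coded 0/1, made-up haplotypes); it is not how any particular LD tool is implemented.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    Each haplotype is a tuple (a, b): 1 = minor allele, 0 = major allele
    at SNP A and SNP B on the same chromosome copy.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # D: deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")

# Hypothetical haplotypes: alleles always travel together vs. independently.
linked = [(1, 1)] * 4 + [(0, 0)] * 6
independent = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2

print(ld_r2(linked), ld_r2(independent))
```

An r² near 1 means the two SNPs are almost always co-inherited, so one can stand in for the other; an r² near 0 means knowing one allele says little about the other.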

8
New cards

Tag SNPs

  • A few SNPs are enough to identify the haplotypes in a block uniquely.

    • Haplotype: a group of alleles/variants on the same chromosome that are inherited together.

  • Reduce the number of SNPs required to examine the entire genome for association with a phenotype.

9
New cards

Methods to measure SNPs

  • Genotyping: targets known SNPs at specific sites (e.g., SNP arrays), cheaper, limited to pre-selected variants.

  • DNA sequencing: reads entire DNA, finds known & novel SNPs (via variant calling → VCF), more data, more expensive.

10
New cards

How to do SNP selection (genotyping)

  • Identify DNA regions of interest.

  • Identify patterns of SNPs that are inherited together on a chromosome.

    • HapMap project: tested the association of millions of SNPs across multiple populations → produced the SNP panels that are used today.

  • Select tag SNPs that can represent a block of associated SNPs.

11
New cards

SNP genotype models

  • SNP genotypes can be coded differently based on genetic models.

    • Additive model (ADD) → commonly used.

    • Dominant model (DOM).

    • Recessive model (REC).

  • SNP genotypes are commonly coded as 0, 1, or 2 to represent the number of copies of the minor allele.

    • 0 → homozygous for the major allele (e.g., AA).

    • 1 → heterozygous (e.g., Aa).

    • 2 → homozygous for the minor allele (e.g., aa).
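
The three models can be sketched as different encodings of the same genotype. This example assumes the usual conventions: ADD counts minor-allele copies, DOM flags carriers of at least one copy, and REC flags carriers of two copies. The function name and the "A"/"a" alleles are hypothetical.

```python
def code_genotype(genotype, minor_allele, model="ADD"):
    """Encode a diploid genotype string (e.g. "Aa") under a genetic model."""
    copies = genotype.count(minor_allele)  # 0, 1, or 2 minor-allele copies
    if model == "ADD":
        return copies            # additive: each copy adds the same effect
    if model == "DOM":
        return int(copies >= 1)  # dominant: one copy is enough
    if model == "REC":
        return int(copies == 2)  # recessive: both copies are needed
    raise ValueError(f"unknown model: {model}")

for g in ("AA", "Aa", "aa"):
    print(g, [code_genotype(g, "a", m) for m in ("ADD", "DOM", "REC")])
```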

12
New cards

Genotype/haplotype phasing

  • Genotyping arrays and short-read sequencing give genotypes but not haplotypes.

  • Phasing: the process of determining which alleles are located on the same chromosome, i.e., reconstructing the haplotypes.

  • Phasing tools: SHAPEIT5, BEAGLE5.5.

13
New cards

GWAS association tests

  • Chi-square / case-control: simple test, no confounders.

    • No confounders = assumes no other factors (e.g., age, ancestry) affect the association between a variant and a trait.

  • Linear regression: continuous traits (height, BMI, blood pressure).

  • Logistic regression: binary traits (disease yes/no).

  • Linear mixed models: account for relatedness among individuals.
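
As a rough illustration of the simplest test in this list, the sketch below runs an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The counts are made up, and the p-value uses the 1-degree-of-freedom identity p = erfc(sqrt(chi2 / 2)) so that only the standard library is needed. No confounders are adjusted for, which is exactly the limitation noted above.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction).

    Table layout (allele counts):
                 minor  major
        cases      a      b
        controls   c      d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 df
    return chi2, p_value

# Hypothetical allele counts: 100 cases and 100 controls (200 alleles each).
chi2, p_value = chi_square_2x2(60, 140, 40, 160)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A real GWAS repeats a test like this for every SNP, which is why a multiple-testing threshold is needed before calling any single result significant.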

14
New cards

GWAS example

[Figure: GWAS example]
15
New cards

Odds ratio

  • Measure of effect size.

  • OR = 1, no disease association.

  • OR > 1, allele C increases risk of disease.

  • OR < 1, allele C decreases the risk of disease.
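
A minimal sketch of how an odds ratio can be computed from a 2x2 table of allele counts, with an approximate 95% confidence interval from the standard log (Woolf) method. The counts and the function name are hypothetical.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """OR with an approximate 95% CI for the table:
        cases:    a with the allele, b without
        controls: c with the allele, d without
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Hypothetical counts: the allele is more frequent in cases than in controls.
ratio, lower, upper = odds_ratio(60, 140, 40, 160)
print(f"OR = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

If the interval excludes 1, the data are consistent with the allele raising (OR > 1) or lowering (OR < 1) risk; an interval that spans 1 is compatible with no association.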

16
New cards

PLINK

  • A software tool for analyzing genetic data, especially for GWAS, including association testing, quality control, and data management.

  • Mostly shared.

17
New cards

How do we visualize GWAS results

  • Manhattan plot:

    • Bonferroni-corrected significance threshold of p < 5 × 10⁻⁸ (0.05 divided by roughly one million independent tests).

      • Multiple test correction.

    • Genome-wide significance threshold = red line on the image.

      • The peaks are SNPs that are close to each other = co-inherited.

    • Significance threshold varies depending on:

      • Number of SNPs examined.

      • Number of subjects included.

      • Minor allele frequencies.

      • Etc.

  • LocusZoom: a combination of a Manhattan plot and a genome browser.
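
The genome-wide threshold can be reproduced with simple arithmetic: a Bonferroni correction divides the desired error rate by the number of tests. The sketch below assumes one million independent tests and uses made-up p-values; the -log10(p) values are what a Manhattan plot puts on its y-axis.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical per-SNP p-values from an association scan.
results = {"snp_a": 3e-9, "snp_b": 4e-6, "snp_c": 0.2}
hits = [snp for snp, p in results.items() if p < threshold]

# Heights on a Manhattan plot: larger means more significant.
heights = {snp: -math.log10(p) for snp, p in results.items()}
line_height = -math.log10(threshold)

print(f"threshold = {threshold:.1e}, hits = {hits}")
```

Changing `n_tests` shows why the threshold depends on the number of SNPs examined: fewer independent tests give a less stringent cutoff.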

18
New cards

Genotype datasets for large scale GWAS

  • Biobanks and large population-based studies with genetic and phenotype data available for research.

    • For USA → ‘All of Us’ initiative or 23andMe.

  • Genotype data are typically restricted due to re-identification risk.

    • Application needed.

19
New cards

Databases for GWAS summary statistics

  • GWAS Catalog.

  • GWAS Atlas.

  • Allow easy access to summary statistics for thousands of traits.

  • Many downstream analyses are built on the summary statistics, rather than the raw genotype itself.

20
New cards

Post-GWAS analysis

  • Functional mapping:

    • Where are they located in the DNA?

    • Pathways enriched by GWAS findings.

    • How may they influence molecular functions leading to disease?

    • Identify the tissues or cell types where these variants are likely to act.

  • Causal variants identification:

    • GWAS findings come up as clusters → highly correlated.

    • Which one is likely causal?

  • Polygenic risk score.

21
New cards

Functional mapping of GWAS findings - Where are they located in the DNA?

  • Chromosome & base pair position.

  • Coding regions: synonymous / nonsynonymous.

  • Non-coding regions (majority): introns, promoters, enhancers, histone marks, DHS sites.

  • Pathway analysis: link nearest genes to biological pathways; test overlap significance with hypergeometric test (e.g., via EnrichR).
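
The hypergeometric overlap test mentioned above can be sketched with the standard library alone. It asks how likely it is to see at least k pathway genes among the GWAS-nearest genes if genes were drawn at random. All numbers and names below are hypothetical, and this is an illustration of the test itself rather than of how EnrichR is implemented.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable.

    population: total number of genes considered
    successes:  genes annotated to the pathway
    draws:      genes nearest to GWAS hits
    k:          observed overlap between the two gene sets
    """
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)

# Hypothetical example: 20,000 genes, 100 in the pathway,
# 50 GWAS-nearest genes, 5 of which fall in the pathway.
p_enrich = hypergeom_sf(5, 20_000, 100, 50)
print(f"enrichment p-value = {p_enrich:.2e}")
```

In practice many pathways are tested at once, so these p-values also need a multiple-testing correction.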

22
New cards

Functional mapping of GWAS findings - How may they influence molecular functions leading to disease?

  • eQTL: SNP affecting gene expression.

    • Other QTLs: pQTL, methylation QTL, etc.

  • Link DNA → gene expression → molecular/phenotypic traits → disease risk.

23
New cards

Functional mapping of GWAS findings - Which tissues or cell types are these variants likely to act on?

  • GWAS SNPs: eQTLs are tissue- and/or cell-specific.

  • Nearest genes of GWAS SNPs.

24
New cards

GTEx project

  • Studies how genetic variation affects gene expression in normal human tissues.

  • Focuses on nearby (cis) eQTLs.

  • eGene: gene whose expression is significantly influenced by ≥1 nearby eQTL.

25
New cards

Functional mapping of GWAS findings - Causal variants identification

  • Fine-mapping is a statistical analysis that identifies the causal variant(s) within a GWAS locus for a disease.

  • CausalDB.

26
New cards

Fine mapping steps

  1. Define locus: pick region of interest from GWAS.

  2. Expand region: include variants in LD with lead SNP.

  3. Gather data: association stats (z-scores/SE), LD info.

  4. Identify causal variants: heuristic LD, penalized regression, Bayesian models.

27
New cards

Heuristic LD approach

  • Pick the lead SNP + group nearby correlated SNPs based on LD thresholds.

  • High LD = inherited together.

28
New cards

Penalized linear regression approach

  • Penalize the regression to avoid overfitting when SNPs are numerous and correlated.

  • Only a few SNPs within a region are causal.

29
New cards

Bayesian models

  • Posterior inclusion probability (PIP): probability a SNP is causal given data/model.

  • Credible set: variants whose cumulative PIP reaches confidence threshold (e.g., 95%).

  • Goal: make credible set as small as possible.

  • Tools: CAVIAR, FINEMAP, SuSiE, PAINTOR.
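
To make the credible-set idea concrete, the sketch below ranks variants by PIP and adds them until the cumulative PIP reaches the coverage level. It assumes a single causal variant in the locus, so the PIPs sum to 1; the variant names and PIP values are hypothetical and not output from any of the tools listed.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variants whose cumulative PIP reaches `coverage`."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical PIPs for five variants in one locus.
pips = {"snp_a": 0.60, "snp_b": 0.25, "snp_c": 0.11, "snp_d": 0.03, "snp_e": 0.01}
cs95 = credible_set(pips)
print(cs95)
```

When one variant carries most of the posterior mass the set shrinks toward a single SNP, which is the goal stated above; high LD between candidates spreads the mass out and enlarges the set.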

30
New cards

Heritability

  • Measures how much of the variation in a trait is due to genetic differences.

  • A heritability of 0.7 means that genetic factors explain 70% of the variation in that trait.

  • Can either be:

    • Twin-based:

      • Gold standard for estimating the broad-sense heritability.

    • GWAS-based.

    • SNP-based.

31
New cards

Twin-based heritability

  • Compare similarity of traits in relatives.

  • Closer relatives share more genes: MZ (monozygotic) twins ~100%, DZ (dizygotic) twins ~50%.

  • If MZ similarity ≫ DZ similarity → trait largely genetic.
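
One standard way to turn the MZ versus DZ comparison into numbers is Falconer's approximation, which is not part of these notes but follows the same logic: the extra similarity of MZ twins is attributed to the extra genetic sharing. The sketch assumes the twin correlations are already estimated; the input values are made up.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer-style variance components from twin correlations.

    A (additive genetic)   = 2 * (r_mz - r_dz)
    C (shared environment) = 2 * r_dz - r_mz
    E (unique environment) = 1 - r_mz
    """
    a = 2 * (r_mz - r_dz)
    c = 2 * r_dz - r_mz
    e = 1 - r_mz
    return a, c, e

# Hypothetical trait correlations in MZ and DZ twin pairs.
a, c, e = falconer_ace(r_mz=0.8, r_dz=0.5)
print(f"A = {a:.2f}, C = {c:.2f}, E = {e:.2f}")
```

The three components sum to 1 by construction, so a large gap between the MZ and DZ correlations translates directly into a large A.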

32
New cards

ACE model for twin-based heritability

[Figure: ACE model]
33
New cards

GWAS-based heritability

  • Definition: proportion of trait variance explained by genotyped SNPs.

  • From genotype data: use GRM (Genetic Relationship Matrix) → tool: GCTA.

    • GRM is based on pairwise genetic similarities between individuals.

  • From GWAS summary stats: regress SNP test stats (χ²) on LD scores → tool: LDSC.
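
The regression behind the summary-statistics approach can be sketched as follows. Under the LD score regression model, the expected chi-square of a SNP grows linearly with its LD score, with slope N × h² / M (N = sample size, M = number of SNPs), so h² can be read off the slope. This toy version uses plain unweighted least squares on noise-free, made-up data; the actual LDSC software uses weighting and resampling-based standard errors.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least squares of chi-square on LD score.

    Returns (h2_estimate, intercept), where h2 = slope * M / N.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept

# Hypothetical, noise-free data generated with h2 = 0.4 and intercept 1.
N, M, TRUE_H2 = 50_000, 1_000_000, 0.4
ld = [5.0, 20.0, 60.0, 110.0, 200.0]
chi2_stats = [1 + N * TRUE_H2 / M * l for l in ld]

h2_hat, intercept = ldsc_h2(chi2_stats, ld, N, M)
print(f"h2 = {h2_hat:.2f}, intercept = {intercept:.2f}")
```

An intercept well above 1 in real data suggests confounding such as population stratification rather than polygenic signal.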

34
New cards

Missing heritability

  • Definition: portion of trait variance not captured by GWAS.

  • Due to: rare variants (Mendelian variants), structural variants, small-effect common SNPs (polygenic risk scores).

35
New cards

Genome-wide Complex Trait Analysis (GCTA)

  • Estimates heritability explained by all common SNPs simultaneously.

  • Finds that common SNPs collectively explain a substantial portion of trait heritability.

  • Explains more than GWAS hits but less than twin-based estimates.

36
New cards

Polygenic risk score

  • Definition: summarizes overall genetic risk for a trait/disease in a single value.

  • Combines effects of many variants; each contributes a small amount.

  • Metrics: association (p-value), variance explained (R²), effect size (β/OR), discrimination (AUC).

  • Tools: LDPred, PRSice.
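
At its simplest, a polygenic risk score is a weighted sum: each variant's effect size from the base GWAS multiplied by the number of effect alleles a person carries. The sketch below shows only that sum, with made-up variant names and weights; tools such as those listed above additionally handle LD and shrink inflated effect sizes.

```python
def polygenic_score(dosages, weights):
    """Sum of effect size x effect-allele count over all scored variants.

    dosages: {variant: 0, 1, or 2 copies of the effect allele}
    weights: {variant: effect size (beta or log odds ratio) from the base GWAS}
    Variants missing from `dosages` contribute 0 in this sketch.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())

# Hypothetical effect sizes and one person's genotype dosages.
weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(person, weights)
print(f"PRS = {score:.2f}")
```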

37
New cards

Fundamental steps of a PRS analysis

  • Base & target QC: check GWAS heritability, different populations, allele alignment, sample size, genome build.

  • PRS calculation: control for LD, adjust for inflated GWAS effect sizes.

  • PRS testing & validation.

38
New cards

Interpreting PRS

  • Can convert PRS to z-scores or percentiles for comparison.

  • Top decile → higher disease risk vs. bottom decile.

  • Probabilistic: high PRS ≠ guaranteed disease, low PRS ≠ protection.
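
Converting raw scores to z-scores or percentiles, as described above, can be sketched with the standard library. The scores are hypothetical, and the reference group here is simply the list itself; in practice the reference should match the individual's ancestry.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw PRS values to z-scores within the reference group."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

def percentile_rank(scores, value):
    """Percentage of the reference group scoring at or below `value`."""
    return 100 * sum(s <= value for s in scores) / len(scores)

# Hypothetical raw scores for a small reference group.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
z_scores = standardize(raw_scores)
rank = percentile_rank(raw_scores, 0.3)

print([round(z, 2) for z in z_scores], rank)
```

A high percentile places a person in a higher-risk group relative to others, but it remains a probability statement, not a diagnosis.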

39
New cards

PRS applications

  • Applications: patient stratification (grouping patients based on risk), prevention, trials.

  • Limits: partial heritability, ancestry bias, ignores interactions (gene-gene interactions), GWAS-dependent (power depends on GWAS sample size and quality).
