SNP, GWAS, and post-GWAS

Last updated 7:11 PM on 12/4/25

Genome wide association studies (GWAS)

Identify associations between genetic variations (loci) and traits (including diseases).
- Test for differences in the frequency of genetic variants between individuals who are ancestrally similar but differ phenotypically.
Genetic variations:
- Single nucleotide polymorphism (SNP) → most common.
- Copy number variants.
- Large sequence variations.

Single nucleotide polymorphism (SNP)

Most common genetic variation (~90% of human variation).
Single base pair change (strictly a substitution; single-base insertions and deletions are often catalogued alongside SNPs).
Location:
- Coding: may change protein sequence.
- Non-coding (majority): may affect gene expression, timing, or location.

SNP database

NCBI Short Genetic Variation database (dbSNP) catalogs short variations in nucleotide sequences for humans.

Major and minor allele

Major allele is the most common variant found in a population, while the minor allele is the less common or rarer variant at that same position.
- Minor allele frequency (MAF).
- MAF > 1% → common SNP.
- MAF < 1% → rare SNP.
Major and minor alleles are population-specific.
Analyses often focus on the minor allele because minor alleles are crucial for identifying disease risks and studying genetic selection.
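
A minimal sketch of how MAF could be computed from genotype counts and checked against the 1% cutoff above. The function names and the counts are hypothetical, and treating a MAF of exactly 1% as rare is an assumption, since the card only defines > 1% and < 1%.

```python
def minor_allele_frequency(n_hom_first, n_het, n_hom_second):
    """Return the frequency of the less common allele at a biallelic SNP."""
    n_individuals = n_hom_first + n_het + n_hom_second
    if n_individuals == 0:
        raise ValueError("need at least one genotyped individual")
    total_alleles = 2 * n_individuals  # each diploid individual carries two alleles
    freq_second = (2 * n_hom_second + n_het) / total_alleles
    # Whichever allele is rarer in this sample is the minor allele.
    return min(freq_second, 1 - freq_second)


def classify_snp(maf, threshold=0.01):
    """Label a SNP as common or rare (a MAF equal to the threshold counts as rare here)."""
    return "common" if maf > threshold else "rare"


# Hypothetical genotype counts for one SNP in one population.
maf = minor_allele_frequency(n_hom_first=810, n_het=180, n_hom_second=10)
print(maf, classify_snp(maf))
```

Because major and minor alleles are population-specific, the same SNP can give a different MAF (or even swap its minor allele) when the counts come from another population.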

Synonymous vs. nonsynonymous SNPs

Synonymous SNPs:
- Do not change the amino acid sequence of a protein.
- Silent change.
Nonsynonymous SNPs:
- Potentially alter protein structure and function.
This distinction arises because the genetic code is redundant, with multiple codons sometimes coding for the same amino acid.

Missense vs. nonsense SNPs

Missense and nonsense SNPs are both nonsynonymous mutations.
- Change the amino acid sequence.
Missense mutation:
- Results in a different amino acid being incorporated into the protein.
- It may alter the protein’s structure and function.
Nonsense mutation:
- Changes a sense codon into a premature stop codon.
- This leads to a truncated and often non-functional protein.
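
As an illustration of the synonymous / missense / nonsense distinction, here is a small sketch that classifies a single-codon change using the standard genetic code. The function name is hypothetical, codons are assumed to be written as DNA (T rather than U), and the extra "stop-lost" label for a stop codon changing into an amino acid is an addition not covered on the card.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a codon change as synonymous, missense, nonsense, or stop-lost."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid: silent change
    if alt_aa == "*":
        return "nonsense"     # sense codon becomes a premature stop
    if ref_aa == "*":
        return "stop-lost"    # not on the card; included for completeness
    return "missense"         # different amino acid


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```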

Linkage disequilibrium (LD) blocks

LD = non-random association of alleles at different loci.
- There is a pattern.
Sets of nearby SNPs on the same chromosome are inherited together in blocks.
Haplotype/LD block:
- A group of alleles that are co-inherited as a single block.
- Tools: LDlink (LDhap, LDmatrix, etc.).
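
LD between two SNPs is commonly summarized with D and r². The sketch below computes both from phased haplotypes; the function name and the toy data are hypothetical, and each haplotype is assumed to be a pair of 0/1 values marking whether it carries the allele of interest at SNP 1 and SNP 2.

```python
def ld_stats(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs from phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n    # allele frequency at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n    # allele frequency at SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                       # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denom


# Toy data: the two alleles always travel together (perfect LD).
linked = [(1, 1)] * 3 + [(0, 0)] * 7
# Toy data: every allele combination equally frequent (no LD).
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_stats(linked))
print(ld_stats(unlinked))
```

An r² near 1 means one SNP almost perfectly predicts the other, which is the property tag SNPs rely on.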

Tag SNPs

A few SNPs are enough to identify the haplotypes in a block uniquely.
- Haplotype: a group of alleles/variants on the same chromosome that are inherited together.
Reduce the number of SNPs required to examine the entire genome for association with a phenotype.

Methods to measure SNPs

Genotyping: targets known SNPs at specific sites (e.g., SNP arrays), cheaper, limited to pre-selected variants.
DNA sequencing: reads entire DNA, finds known & novel SNPs (via variant calling → VCF), more data, more expensive.

How to do SNP selection (genotyping)

Identify DNA regions of interest.
Identify patterns of SNPs that are inherited together on a chromosome.
- HapMap project: characterized the patterns of association (LD) among millions of SNPs across multiple populations → informed the SNP panels that are used today.
Select tag SNPs that can represent a block of associated SNPs.

SNP genotype models

SNP genotypes can be coded differently based on genetic models.
- Additive model (ADD) → commonly used.
- Dominant model (DOM).
- Recessive model (REC).
SNP genotypes are commonly coded as 0, 1, or 2 to represent the number of copies of the minor allele (here a is the minor allele).
- 0 → homozygous for the major allele (e.g., AA).
- 1 → heterozygous (e.g., Aa).
- 2 → homozygous for the minor allele (e.g., aa).
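
A short sketch of how the same genotype could be coded under the three models. The function name is hypothetical, and genotypes are assumed to be two-character strings such as "Aa" with the minor allele passed in explicitly.

```python
def encode_genotype(genotype, minor_allele, model="ADD"):
    """Encode a diploid genotype under the additive, dominant, or recessive model."""
    copies = genotype.count(minor_allele)  # 0, 1, or 2 copies of the minor allele
    if model == "ADD":
        return copies            # each extra copy adds the same effect
    if model == "DOM":
        return int(copies >= 1)  # one copy is enough to show the effect
    if model == "REC":
        return int(copies == 2)  # both copies are needed
    raise ValueError(f"unknown model: {model}")


for genotype in ("AA", "Aa", "aa"):
    codes = [encode_genotype(genotype, "a", m) for m in ("ADD", "DOM", "REC")]
    print(genotype, codes)
```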

Genotype/haplotype phasing

Most current genotyping and sequencing technologies give unphased genotypes, not haplotypes.
Phasing: the process of determining which alleles are located on the same chromosome, i.e., the haplotype.
Phasing tools: SHAPEIT5, BEAGLE5.5.

GWAS association tests

Chi-square / case-control: simple test, no confounders.
- No confounders = assumes no other factors (e.g., age, ancestry) affect the association between a variant and a trait.
Linear regression: continuous traits (height, BMI, blood pressure).
Logistic regression: binary traits (disease yes/no).
Linear mixed models: account for relatedness among individuals.
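
To make the simplest of these tests concrete, here is a sketch of a chi-square test on a 2 × 2 table of allele counts (cases vs. controls, minor vs. major allele). The function name and counts are hypothetical. With 1 degree of freedom the p-value can be computed with the standard library alone, because the chi-square tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Return (chi2, p_value) for a 2x2 allele-count table (1 degree of freedom)."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency is the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df = 1
    return chi2, p_value


# Hypothetical allele counts: minor allele enriched in cases.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2, p)
```

This test ignores covariates entirely, which is why regression models are preferred when confounders such as age or ancestry need to be adjusted for.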

GWAS example

Odds ratio

Measure of effect size.
OR = 1, no disease association.
OR > 1, allele C increases risk of disease.
OR < 1, allele C decreases the risk of disease.
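
A minimal sketch of the odds ratio calculation for allele C from a 2 × 2 table of allele counts, with an approximate 95% confidence interval on the log scale. The function name and counts are hypothetical, and all four cells are assumed to be non-zero.

```python
import math


def odds_ratio(case_c, case_other, control_c, control_other):
    """Return (OR, (ci_low, ci_high)) for allele C in cases vs. controls."""
    # Odds of carrying C among cases divided by the odds among controls.
    or_value = (case_c * control_other) / (case_other * control_c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_c + 1 / case_other + 1 / control_c + 1 / control_other
    )
    ci_low = math.exp(math.log(or_value) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, (ci_low, ci_high)


# Hypothetical counts: allele C seen 60 times in cases and 40 times in controls.
or_value, (ci_low, ci_high) = odds_ratio(60, 40, 40, 60)
print(or_value, ci_low, ci_high)
```

If the confidence interval excludes 1, the data are inconsistent with "no disease association" at roughly the 5% level.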

PLINK

A software tool for analyzing genetic data, especially for GWAS, including association testing, quality control, and data management.
Mostly shared.

How do we visualize GWAS results

Manhattan plot:
- Bonferroni-corrected significance threshold of p < 5 × 10^-8 (0.05 divided by roughly 1 million independent tests).
  - Multiple test correction.
- Genome-wide significance threshold = red line on the image.
  - The peaks are SNPs that are close to each other = co-inherited.
- Significance threshold varies depending on:
  - Number of SNPs examined.
  - Number of subjects included.
  - Minor allele frequencies.
  - Etc.
LocusZoom: a combination of a Manhattan plot and a genome browser.
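
The conventional genome-wide threshold can be reproduced as a Bonferroni correction, assuming about one million independent tests. The sketch below also converts p-values to the −log10 scale used on the y-axis of a Manhattan plot. Function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def neg_log10(p_value):
    """Height of a SNP on a Manhattan plot."""
    return -math.log10(p_value)


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """True if the SNP would sit above the significance line."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold()
print(threshold, neg_log10(threshold))
print(is_genome_wide_significant(3e-9), is_genome_wide_significant(1e-6))
```

Changing n_tests shows why the threshold depends on the number of SNPs examined: fewer independent tests give a less stringent line.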

Genotype datasets for large scale GWAS

Biobanks and large population-based studies with genetic and phenotype data available for research.
- For USA → ‘All of Us’ initiative or 23andMe.
Genotype data are typically restricted due to re-identification risk.
- Application needed.

Databases for GWAS summary statistics

GWAS Catalog.
GWAS Atlas.
Allow easy access to summary statistics for thousands of traits.
Many downstream analyses are built on the summary statistics, rather than the raw genotype itself.

Post-GWAS analysis

Functional mapping:
- Where are they located in the DNA?
- Pathways enriched by GWAS findings.
- How may they influence molecular functions leading to disease?
- Identify the tissues or cell types where these variants are likely to act.
Causal variants identification:
- GWAS findings come up as clusters → highly correlated.
- Which one is likely causal?
Polygenic risk score.

Functional mapping of GWAS findings - Where are they located in the DNA?

Chromosome & base pair position.
Coding regions: synonymous / nonsynonymous.
Non-coding regions (majority): introns, promoters, enhancers, histone marks, DHS sites.
Pathway analysis: link nearest genes to biological pathways; test overlap significance with hypergeometric test (e.g., via EnrichR).
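
The hypergeometric test mentioned above can be sketched directly: given a background of genes, a pathway, and a list of GWAS-nearest genes, it asks how likely an overlap at least this large is by chance. The function and parameter names are hypothetical, and this is not a description of how EnrichR itself is implemented.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_hits, n_overlap):
    """P(overlap >= n_overlap) when n_hits genes are drawn at random
    from n_background genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# 50 GWAS-nearest genes, 5 of which fall in the pathway.
print(hypergeom_enrichment_p(20_000, 100, 50, 5))
```

In practice many pathways are tested at once, so the resulting p-values would themselves need multiple-testing correction.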

Functional mapping of GWAS findings - How may they influence molecular functions leading to disease?

eQTL: SNP affecting gene expression.
- Other QTLs: pQTL, methylation QTL, etc.
Link DNA → gene expression → molecular/phenotypic traits → disease risk.

Functional mapping of GWAS findings - Which tissues or cell types are these variants likely to act on?

GWAS SNPs: eQTLs are tissue- and/or cell-specific.
Nearest genes of GWAS SNPs.

GTEx project

Studies how genetic variation affects gene expression in normal human tissues.
Focuses on nearby (cis) eQTLs.
eGene: gene whose expression is significantly influenced by ≥1 nearby eQTL.

Functional mapping of GWAS findings - Causal variants identification

Fine-mapping is a statistical analysis that aims to identify the likely causal variant(s) within a GWAS locus for a disease.
CausalDB.

Fine mapping steps

Define locus: pick region of interest from GWAS.
Expand region: include variants in LD with lead SNP.
Gather data: association stats (z-scores/SE), LD info.
Identify causal variants: heuristic LD, penalized regression, Bayesian models.

Heuristic LD approach

Pick the lead SNP + group nearby correlated SNPs based on LD thresholds.
High LD = inherited together.

Penalized linear regression approach

Penalize the regression to avoid overfitting when SNPs are numerous and correlated.
Only a few SNPs within a region are causal.

Bayesian models

Posterior inclusion probability (PIP): probability a SNP is causal given data/model.

Credible set: variants whose cumulative PIP reaches confidence threshold (e.g., 95%).
Goal: make credible set as small as possible.
Tools: CAVIAR, FINEMAP, SuSiE, PAINTOR.
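
Given PIPs from any of these tools, a credible set can be sketched as follows: rank variants by PIP and add them until the cumulative PIP reaches the chosen coverage. The function name and the variant labels are hypothetical, and the sketch assumes a single causal variant in the locus so that the PIPs sum to about 1.

```python
def credible_set(pips, coverage=0.95):
    """Return (variants, cumulative_pip) for the smallest top-ranked set
    whose cumulative PIP reaches the requested coverage."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen, cumulative


# Hypothetical PIPs for four variants in one locus.
pips = {"snp_a": 0.60, "snp_b": 0.30, "snp_c": 0.06, "snp_d": 0.04}
variants, total = credible_set(pips)
print(variants, total)
```

A sharply peaked PIP distribution gives a small credible set; when many variants are in tight LD the PIP is spread thinly and the set grows.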

Heritability

Measures how much of the variation in a trait is due to genetic differences.
A heritability of 0.7 means that genetic factors explain 70% of the variation in that trait.
Can either be:
- Twin-based:
  - Gold standard for estimating the broad-sense heritability.
- GWAS-based.
- SNP-based.

Twin-based heritability

Compare similarity of traits in relatives.
Closer relatives share more of their segregating DNA: MZ (monozygotic) twins ~100%, DZ (dizygotic) twins ~50% on average.
If MZ similarity ≫ DZ similarity → trait largely genetic.
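
One classical way to turn this comparison into numbers is Falconer's formula, which splits trait variance into additive genetic (A), shared environment (C), and unique environment (E) components from the MZ and DZ twin correlations. This is a sketch under that simple model, not something stated on the card; the function name and the correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Estimate A, C, E variance components from twin correlations."""
    a = 2 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c = 2 * r_dz - r_mz    # shared (common) environment
    e = 1 - r_mz           # unique environment plus measurement error
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations for some trait.
print(falconer_ace(r_mz=0.8, r_dz=0.5))
```

The three components sum to 1 by construction. Estimates outside the 0 to 1 range signal that the simple model does not fit the data.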

ACE model for twin based heritability

GWAS-based heritability

Definition: proportion of trait variance explained by genotyped SNPs.
From genotype data: use GRM (Genetic Relationship Matrix) → tool: GCTA.
- GRM is based on pairwise genetic similarities between individuals.
From GWAS summary stats: regress SNP test stats (χ²) on LD scores → tool: LDSC.
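
The idea behind the summary-statistics approach can be sketched with an ordinary least-squares fit. Under the LD score regression model, the expected χ² of SNP j is roughly N·h²·ℓ_j / M + intercept, where N is the GWAS sample size, M the number of SNPs, and ℓ_j the LD score, so h² can be recovered from the slope. The function name and toy values are hypothetical, and the real LDSC tool uses a weighted regression with jackknife standard errors rather than this plain fit.

```python
def ldsc_h2(chi2_stats, ld_scores, n_samples):
    """Return (h2_estimate, intercept) from an unweighted regression
    of per-SNP chi-square statistics on LD scores."""
    m = len(chi2_stats)
    mean_l = sum(ld_scores) / m
    mean_c = sum(chi2_stats) / m
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2_stats))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    # slope = N * h2 / M, so rearrange to recover h2.
    return slope * m / n_samples, intercept


# Toy values built so that h2 = 0.5 and the intercept is 1 (N = 1000, M = 4).
h2, intercept = ldsc_h2([126, 251, 376, 501], [1, 2, 3, 4], n_samples=1000)
print(h2, intercept)
```

An intercept well above 1 is usually read as a sign of confounding (for example population stratification) rather than polygenic signal.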

Missing heritability

Definition: portion of trait variance not captured by GWAS.
Due to: rare variants (mendelian variants), structural variants, small-effect common SNPs (polygenic risk scores).

Genome-wide Complex Trait Analysis (GCTA)

Estimates heritability explained by all common SNPs simultaneously.
Finds that common SNPs collectively explain a substantial portion of trait heritability.
Explains more than GWAS hits but less than twin-based estimates.

Polygenic risk score

Definition: summarizes overall genetic risk for a trait/disease in a single value.
Combines effects of many variants; each contributes a small amount.
Metrics: association (p-value), variance explained (R²), effect size (β/OR), discrimination (AUC).
Tools: LDPred, PRSice.
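
At its simplest, a PRS is a weighted sum: for each variant, the number of effect alleles an individual carries is multiplied by that variant's GWAS effect size (β or log OR), and the products are added up. The sketch below shows only that core step. The function name, variant labels, and weights are hypothetical, and scoring a missing genotype as 0 is an assumption. LD control and effect-size shrinkage, which tools such as LDPred and PRSice handle, are not included.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual."""
    # Variants missing from the individual's data contribute 0 (an assumption).
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Hypothetical GWAS effect sizes (per copy of the effect allele).
weights = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}
# Hypothetical effect-allele counts (0, 1, or 2) for one individual.
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

print(polygenic_risk_score(dosages, weights))
```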

Fundamental steps of a PRS analysis

Base & target QC: check GWAS heritability, different populations, allele alignment, sample size, genome build.
PRS calculation: control for LD, adjust for inflated GWAS effect sizes.
PRS testing & validation.

Interpreting PRS

Can convert PRS to z-scores or percentiles for comparison.
Top decile → higher disease risk vs. bottom decile.
Probabilistic: high PRS ≠ guaranteed disease, low PRS ≠ protection.
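
A small sketch of the conversions mentioned above: raw scores are standardized against a reference sample, and an individual's score is placed as a percentile within it. The function names and scores are hypothetical, and the reference sample is assumed to have non-zero variance.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw PRS values to z-scores within the same sample."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def percentile_rank(score, reference_scores):
    """Percentage of the reference sample with a score at or below this one."""
    at_or_below = sum(1 for r in reference_scores if r <= score)
    return 100 * at_or_below / len(reference_scores)


# Hypothetical raw PRS values for a small reference sample.
reference = [0.10, 0.25, 0.30, 0.45, 0.50, 0.62, 0.70, 0.81, 0.90, 1.05]
print(standardize(reference))
print(percentile_rank(0.90, reference))
```

A percentile only ranks an individual against the reference sample, so it is meaningful only when that sample matches the individual's ancestry and the way the score was built.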

PRS applications

Applications: patient stratification (grouping patients based on risk), prevention, trials.
Limits: partial heritability, ancestry bias, ignores interactions (gene-gene interactions), GWAS-dependent (power depends on GWAS sample size and quality).