Association Analysis

Learning Objectives

  • Define the term “Genetic Association”

  • Describe how a genetic association study is conducted

  • Describe an ideal genetic marker and how SNPs fit this description

  • Describe the principles of a genome-wide association study (GWAS)

  • Describe and interpret a Manhattan plot

  • Describe and interpret a regional association plot

  • Describe meta-analysis and how it is used in genetic studies

  • Describe the known problems with GWAS

  • Give an example of how GWAS has identified susceptibility variants in genes in a common disease

  • Describe the relationship between genetic association and linkage disequilibrium

Genetic Association

Definition

Genetic association refers to the presence of a particular allele or genetic variant at a higher frequency in individuals with a specific trait or disease (cases) compared to individuals without the trait or disease (controls).


Conducting a Genetic Association Study

  1. Define the Study Population:

    • Identify cases (disease or trait of interest).

    • Select controls matched to the cases on ancestry, age, and sex.

  2. Genotype Individuals:

    • Use technologies like SNP arrays to genotype individuals across the genome.

  3. Statistical Analysis:

    • Compare the frequency of each allele in cases versus controls using a statistical test such as the chi-squared test.

  4. Replication:

    • Replicate findings in independent cohorts to ensure robustness.
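The statistical step above can be sketched for a single SNP as follows. This is a minimal illustration, not a full analysis pipeline: the function name `allelic_chi2` and the allele counts are hypothetical, and the test shown is a plain Pearson chi-squared test on a 2x2 table of allele counts (1 degree of freedom).

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Hypothetical data: 1,000 cases and 1,000 controls, so 2,000 alleles per group.
chi2, p_value, odds_ratio = allelic_chi2(
    case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500
)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

In a real study the same test would be repeated for every genotyped SNP, which is why a correction for multiple testing is needed.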


Ideal Genetic Marker

An ideal genetic marker should:

  • Be polymorphic (have multiple alleles).

  • Be frequent in the genome and population.

  • Be stably inherited, with a low mutation rate.

  • Be easy to genotype.

Single Nucleotide Polymorphisms (SNPs) fit these criteria:

  • Occur frequently across the genome (commonly quoted as roughly one every 300 to 1,000 base pairs on average).

  • Common in populations.

  • Easy to assay using modern technologies like microarrays.


Principles of Genome-Wide Association Studies (GWAS)

GWAS identifies genetic variants associated with diseases or traits by examining SNPs across the genome.

  1. High-Throughput Genotyping:

    • Use SNP arrays to genotype hundreds of thousands to millions of SNPs per individual.

  2. Case-Control Design:

    • Compare allele frequencies between cases (with the disease) and controls (without the disease).

  3. Statistical Analysis:

    • Test each SNP for association using statistical models.

    • Apply a stringent threshold for genome-wide significance (p < 5 × 10⁻⁸).
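The genome-wide significance threshold is usually explained as a Bonferroni correction. The sketch below assumes a conventional 0.05 significance level and roughly one million independent tests across the genome; both numbers and the helper name `is_genome_wide_significant` are assumptions made for illustration.

```python
import math

ALPHA = 0.05                     # conventional single-test significance level
N_INDEPENDENT_TESTS = 1_000_000  # assumed number of independent common-variant tests

# Bonferroni correction: divide alpha by the number of tests.
threshold = ALPHA / N_INDEPENDENT_TESTS

def is_genome_wide_significant(p_value, cutoff=threshold):
    """Return True if a SNP's p-value passes the corrected cutoff."""
    return p_value < cutoff

print(f"threshold = {threshold:.1e}")
print(f"-log10(threshold) = {-math.log10(threshold):.2f}")
print(is_genome_wide_significant(3e-9), is_genome_wide_significant(1e-5))
```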


Manhattan Plot

Description
  • X-Axis: Genomic position (chromosomes).

  • Y-Axis: -log10(p-value) of association tests.

Interpretation
  • Each point represents an SNP.

  • Peaks: SNPs strongly associated with the disease/trait.

  • Significance Threshold: A horizontal line (often drawn in red) marks genome-wide significance (e.g., p = 5 × 10⁻⁸, which is about 7.3 on the -log10 scale).
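The plot coordinates can be derived from association results as sketched below. The SNP results and chromosome lengths are hypothetical, and the function `manhattan_points` is an illustrative helper; the resulting (x, y) pairs could be passed to any plotting library.

```python
import math

# Hypothetical association results: (chromosome, base-pair position, p-value).
results = [
    (1, 1_500_000, 0.20),
    (1, 9_000_000, 3e-9),
    (2, 2_000_000, 0.04),
    (2, 7_500_000, 1e-5),
]
# Hypothetical chromosome lengths, used only to lay chromosomes end to end.
chrom_lengths = {1: 10_000_000, 2: 8_000_000}

GENOME_WIDE_P = 5e-8

def manhattan_points(results, chrom_lengths):
    """Convert per-SNP results into (x, y, chromosome) plot coordinates."""
    offsets, running = {}, 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running          # where this chromosome starts on the x-axis
        running += chrom_lengths[chrom]
    points = []
    for chrom, pos, p in results:
        x = offsets[chrom] + pos          # cumulative genomic position
        y = -math.log10(p)                # smaller p-values plot higher
        points.append((x, y, chrom))
    return points

points = manhattan_points(results, chrom_lengths)
threshold_y = -math.log10(GENOME_WIDE_P)

# SNPs rising above the threshold line are the genome-wide significant hits.
hits = [(x, y) for x, y, _ in points if y > threshold_y]
print(hits)
```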


Regional Association Plot

Description
  • Zoomed-in view of a region with a significant SNP from a Manhattan plot.

  • Shows all SNPs in the region, typically colored by their linkage disequilibrium (LD) with the lead SNP.

Interpretation
  • Lead SNP (most significant) is highlighted.

  • Nearby SNPs (correlated due to LD) may also show association.

  • Helps narrow down candidate causal genes or variants within the region.


Meta-Analysis in Genetic Studies

Definition

A meta-analysis combines data from multiple studies to increase statistical power and reliability.

Process
  1. Pool summary statistics from individual studies.

  2. Apply statistical models to combine results.

  3. Generate an overall estimate of association.

Use in Genetics
  • Combines GWAS results from different cohorts.

  • Increases sample size, improving power to detect associations.
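One common way to combine results is a fixed-effect, inverse-variance weighted meta-analysis, sketched below for a single SNP. This is only one of several possible models, and the effect sizes (log odds ratios) and standard errors are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP."""
    # More precise studies (smaller standard error) get more weight.
    weights = [1 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_value

# Hypothetical per-cohort summary statistics for the same SNP.
betas = [0.10, 0.14, 0.12]
ses = [0.05, 0.07, 0.04]

pooled_beta, pooled_se, z, p_value = fixed_effect_meta(betas, ses)
print(f"beta={pooled_beta:.3f}, se={pooled_se:.3f}, z={z:.2f}, p={p_value:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is the sense in which meta-analysis increases power.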


Problems with GWAS

  1. Missing Heritability:

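    • For many complex traits, the variants identified by GWAS together explain only a fraction of the heritability estimated from family and twin studies (sometimes just a few percent).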

  2. Common vs. Rare Variants:

    • GWAS focuses on common SNPs; rare variants may have larger effects but are poorly captured by standard SNP arrays.

  3. Functional Interpretation:

    • Significant SNPs often lie in non-coding regions; linking them to causal genes is challenging.

  4. Population Stratification:

    • Differences in allele frequencies due to ancestry can confound results.

  5. Replication Issues:

    • Findings must be validated in independent cohorts to ensure reproducibility.
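The population stratification problem can be shown with a small numerical sketch. In the hypothetical data below, the allele has no association with disease inside either subpopulation, yet pooling the two groups produces a strong spurious signal because both allele frequency and case/control sampling differ by ancestry. All counts and the helper `chi2_2x2` are invented for illustration.

```python
def chi2_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared statistic for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    return sum(
        (table[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
        for i in range(2)
        for j in range(2)
    )

# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Population A: allele frequency 0.5 in both cases and controls; mostly cases sampled.
pop_a = (800, 800, 200, 200)
# Population B: allele frequency 0.1 in both cases and controls; mostly controls sampled.
pop_b = (40, 360, 160, 1440)

# Pool the two populations as if ancestry were ignored.
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

chi2_a = chi2_2x2(*pop_a)
chi2_b = chi2_2x2(*pop_b)
chi2_pooled = chi2_2x2(*pooled)
print(chi2_a, chi2_b, chi2_pooled)
```

Analyzing each ancestry group separately, or adjusting for ancestry in the statistical model, removes the artifact in this toy example.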


Example of GWAS Success

Obesity and the FTO Gene
  • FTO SNPs identified as strongly associated with Body Mass Index (BMI).

  • Individuals with risk alleles have higher susceptibility to obesity.

  • FTO was the first locus at which GWAS identified common variants robustly associated with obesity.


Genetic Association and Linkage Disequilibrium (LD)

Relationship
  • LD: Non-random association of alleles at different loci, typically nearby loci whose alleles tend to be inherited together.

  • GWAS leverages LD to detect associations:

    • A significant SNP may tag a causal variant nearby.
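The strength of LD between two biallelic SNPs is often summarized by D and r². The sketch below computes both from haplotype and allele frequencies; the function name `ld_stats` and the frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D measures how far the haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    # r2 rescales D squared to lie between 0 (independent) and 1 (perfect LD).
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical tag SNP and causal variant that are often inherited together.
d_linked, r2_linked = ld_stats(p_ab=0.40, p_a=0.50, p_b=0.50)

# Hypothetical pair of independent loci: haplotype frequency equals the product.
d_indep, r2_indep = ld_stats(p_ab=0.25, p_a=0.50, p_b=0.50)

print(f"linked: D={d_linked:.3f}, r2={r2_linked:.3f}")
print(f"independent: D={d_indep:.3f}, r2={r2_indep:.3f}")
```

A genotyped SNP with high r² to an ungenotyped causal variant will show a similar association signal, which is how a tag SNP can stand in for the causal variant.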

Implication
  • Identifying associated SNPs helps narrow down genomic regions.

  • Further fine-mapping and functional studies are needed to pinpoint causal variants.

