Hardy-Weinberg Equilibrium: Study Notes
Hardy-Weinberg Equilibrium: Concept and Implications
- Core idea: In a large population with random mating and no evolutionary forces, allele frequencies stay constant across generations.
- If there are two alleles at a locus, say A and a, and their frequencies are p and q respectively, then under random mating the genotype frequencies in the next generation follow:
- P(AA) = p^2\,,\quad P(Aa) = 2pq\,,\quad P(aa) = q^2
- The allele frequencies sum to 1: p + q = 1\,, so q = 1 - p\,.
- If you mix alleles at random (random union of gametes), you would expect those genotype proportions, and that scenario is described by Hardy-Weinberg equilibrium.
- When observed genotype frequencies match these expected frequencies, the population is said to be in Hardy-Weinberg equilibrium; if they differ, evolution or other factors may be at play.
- The transcript’s framing: the idea of "no selection" or no evolution leads to random sampling of alleles and the resulting genotype distribution following the p^2, 2pq, q^2 pattern.
Assumptions of the Hardy-Weinberg Model
- Very large population size (to avoid genetic drift).
- Random mating (no mating bias or sexual selection).
- No mutation introducing new alleles.
- No migration (no gene flow between populations).
- No natural selection (genotypes have equal viability and fertility).
- Usually non-overlapping generations (for simple modeling), though the core results can apply more broadly.
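To see why the large-population assumption matters, here is a minimal sketch of random sampling of gametes in a finite population. This is a Wright-Fisher-style simulation, which is an assumption of this example and not something the notes themselves describe; the function name `simulate_drift`, the population sizes, and the seed are all hypothetical choices for illustration.

```python
import random


def simulate_drift(n_individuals, p0, generations, seed=0):
    """Track the frequency of allele A when each generation is formed by
    drawing 2N gametes at random from the previous generation's gene pool.

    No selection, mutation, or migration is modeled: any change in p
    comes from sampling alone (genetic drift).
    """
    rng = random.Random(seed)
    n_gametes = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gamete carries A with probability p (binomial sampling).
        copies_of_a_allele = sum(1 for _ in range(n_gametes) if rng.random() < p)
        p = copies_of_a_allele / n_gametes
        trajectory.append(p)
    return trajectory


small = simulate_drift(n_individuals=10, p0=0.5, generations=10)
large = simulate_drift(n_individuals=10_000, p0=0.5, generations=10)
print("N = 10:    ", [round(x, 3) for x in small])
print("N = 10000: ", [round(x, 3) for x in large])
```

In the small population the frequency should wander noticeably from generation to generation, while in the large one it should stay close to the starting value, which is the behavior the Hardy-Weinberg model assumes.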
Allele and Genotype Frequencies
- Define:
- p = \text{frequency of the A allele}
- q = \text{frequency of the a allele}
- Relationship:
- p + q = 1\,, hence q = 1 - p\,.
- Under random mating, genotype frequencies are:
- P(AA) = p^2\,
- P(Aa) = 2pq\,
- P(aa) = q^2\,
- Example with numbers: if p = 0.6, q = 0.4 then
- P(AA) = 0.36\,
- P(Aa) = 0.48\,
- P(aa) = 0.16\,
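The numeric example above can be reproduced with a short sketch. The function name `hw_genotype_frequencies` is a hypothetical helper introduced here for illustration.

```python
def hw_genotype_frequencies(p):
    """Return expected genotype frequencies under Hardy-Weinberg proportions,
    given p, the frequency of allele A (so q = 1 - p is the frequency of a)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


freqs = hw_genotype_frequencies(0.6)
for genotype, freq in freqs.items():
    print(f"P({genotype}) = {freq:.2f}")
```

The three frequencies always sum to 1 because they are the terms of (p + q)^2.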
Derivation and Intuition
- From the allele frequencies in gametes, the formation of zygotes is like drawing two alleles independently:
- Probability of AA = p\times p = p^2
- Probability of Aa = p\times q + q\times p = 2pq
- Probability of aa = q\times q = q^2
- This is equivalent to expanding the binomial model of two alleles: (p + q)^2 = p^2 + 2pq + q^2 , and using p + q = 1.
- The key takeaway: genotype frequencies in equilibrium are determined solely by current allele frequencies; they do not depend on initial genotype frequencies, provided assumptions hold.
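The takeaway that the starting genotype frequencies do not matter can be checked with a small deterministic sketch. The starting frequencies below are hypothetical and deliberately not in Hardy-Weinberg proportions; the function name `next_generation` is also an assumption of this example.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Apply one round of random union of gametes to a set of genotype
    frequencies and return the offspring genotype frequencies."""
    if abs(f_AA + f_Aa + f_aa - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    # Each AA parent contributes only A gametes; each Aa parent contributes
    # A in half of its gametes.
    p = f_AA + 0.5 * f_Aa
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


start = (0.5, 0.2, 0.3)          # hypothetical, not in HW proportions
gen1 = next_generation(*start)   # offspring of the starting population
gen2 = next_generation(*gen1)    # offspring of gen1

print("start:", start)
print("gen 1:", tuple(round(x, 4) for x in gen1))
print("gen 2:", tuple(round(x, 4) for x in gen2))
```

Here p = 0.5 + 0.1 = 0.6, so the first generation already has proportions 0.36, 0.48, 0.16, and the second generation is unchanged.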
Testing for Hardy-Weinberg Equilibrium
- You can test whether a population is in HWE by comparing observed genotype counts to expected counts under HWE.
- Steps:
- Collect observed counts: O_{AA}, O_{Aa}, O_{aa} for a sample size N = O_{AA} + O_{Aa} + O_{aa}.
- Estimate allele frequencies from data (from observed counts):
- \hat{p} = \frac{2\,O_{AA} + O_{Aa}}{2N}
- \hat{q} = 1 - \hat{p}
- Compute expected counts under HWE:
- E_{AA} = N \hat{p}^2\,
- E_{Aa} = N \cdot 2\hat{p}\hat{q}\,
- E_{aa} = N \hat{q}^2\,
- Use a chi-squared test to compare observed vs expected:
- \chi^2 = \sum_{i \in \{AA,Aa,aa\}} \frac{(O_i - E_i)^2}{E_i}
- Degrees of freedom (df): typically 1 for a biallelic locus when p is estimated from data; df = 2 if p and q are fixed a priori.
- Decision: compare \chi^2 to the critical value at your chosen significance level (e.g., 0.05). If \chi^2 is larger, reject HWE; if not, fail to reject HWE.
- Important caveats:
- Deviations can indicate selection, drift, mutation, migration, nonrandom mating, or sampling/genotyping errors.
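- Small sample sizes make the chi-squared approximation unreliable, and population structure (Wahlund effect) can produce a deviation from HWE in a pooled sample even when each subpopulation is itself in equilibrium.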
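The steps above can be sketched in a few lines of standard-library Python. The function name `hwe_chi_square` and the genotype counts are hypothetical. The p-value uses the identity that, for a chi-squared variable with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)), so this sketch applies only to the df = 1 case where p is estimated from the data.

```python
import math


def hwe_chi_square(o_AA, o_Aa, o_aa):
    """Chi-squared goodness-of-fit test for HWE at a biallelic locus.

    Returns (chi2, p_value, p_hat), with p_hat estimated from the counts
    and the p-value computed for df = 1.
    """
    n = o_AA + o_Aa + o_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p_hat = (2 * o_AA + o_Aa) / (2 * n)
    q_hat = 1.0 - p_hat
    expected = (n * p_hat ** 2, n * 2 * p_hat * q_hat, n * q_hat ** 2)
    if min(expected) == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (o_AA, o_Aa, o_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, p_hat


# Hypothetical counts with far fewer heterozygotes than HWE predicts.
chi2, p_value, p_hat = hwe_chi_square(50, 20, 30)
print(f"p_hat = {p_hat:.2f}, chi2 = {chi2:.3f}, p-value = {p_value:.2e}")
print("reject HWE at 0.05" if p_value < 0.05 else "fail to reject HWE at 0.05")
```

Comparing the p-value with 0.05 is equivalent to comparing \chi^2 with the df = 1 critical value of about 3.84.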
Worked Example and Interpretation (Link to Transcript)
- Concept: compare observed vs expected frequencies to determine if evolutionary forces are acting.
- If observed frequencies match the HWE expectations, the data are consistent with Hardy-Weinberg equilibrium at that locus (no evolutionary force is detected at that locus under the model, which is not proof that none is acting).
- The transcript describes this as: “these two numbers are the same, so this is Hardy-Weinberg.”
- Example scaffold (numbers can vary):
- Suppose observed counts: O_{AA} = 36,\; O_{Aa} = 48,\; O_{aa} = 16 in a sample of N = 100 individuals.
- Estimate allele frequencies: \hat{p} = \frac{2\cdot 36 + 48}{200} = \frac{120}{200} = 0.6\,, which implies \hat{q} = 0.4\,.
- Expected counts: E_{AA} = 100 \cdot 0.6^2 = 36,\; E_{Aa} = 100 \cdot 2 \cdot 0.6 \cdot 0.4 = 48,\; E_{aa} = 100 \cdot 0.4^2 = 16.
- Compare with observed: the expected counts equal the observed counts, so \chi^2 = 0 and we fail to reject HWE.
- Practical note: In many classroom or real-world datasets, you first estimate p from the data, then compute expected counts, then apply the chi-square test as above.
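For contrast, here is the same procedure worked through on a sample that does not fit. The counts are hypothetical and were chosen only so that the arithmetic is easy to follow.

```latex
\begin{align*}
O_{AA} &= 45, \quad O_{Aa} = 30, \quad O_{aa} = 25, \quad N = 100 \\
\hat{p} &= \frac{2 \cdot 45 + 30}{2 \cdot 100} = \frac{120}{200} = 0.6, \quad \hat{q} = 1 - 0.6 = 0.4 \\
E_{AA} &= 100 \cdot 0.6^2 = 36, \quad E_{Aa} = 100 \cdot 2 \cdot 0.6 \cdot 0.4 = 48, \quad E_{aa} = 100 \cdot 0.4^2 = 16 \\
\chi^2 &= \frac{(45 - 36)^2}{36} + \frac{(30 - 48)^2}{48} + \frac{(25 - 16)^2}{16} = 2.25 + 6.75 + 5.0625 = 14.0625
\end{align*}
```

Since 14.0625 exceeds the df = 1 critical value of about 3.84 at the 0.05 level, HWE would be rejected for this hypothetical sample. The heterozygote count (30) falls well below the expected 48, which is the kind of deficit associated with inbreeding or population substructure.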
Connections to Foundations and Real-World Relevance
- Relationship to Mendelian genetics: HWE provides a bridge from allele-level frequencies to genotype-level expectations in populations.
- Foundational principle: Under no-evolutionary-forces assumptions, allele frequencies do not change across generations, and genotype frequencies stabilize as a function of p and q.
- Real-world relevance:
- Used in population genetics, evolutionary biology, and forensic genetics.
- In human genetics, HWE checks are a standard quality-control step in genotyping data; deviations can flag errors or population structure.
- Helps infer whether part of the population is experiencing selection, inbreeding, or substructure.
Implications and Practical Considerations
- If deviations from HWE are detected, consider possible causes:
- Natural selection: some genotypes confer higher viability/fertility.
- Nonrandom mating: assortative mating or inbreeding increases homozygosity.
- Population structure/genetic subdivision: Wahlund effect can reduce heterozygosity.
- Mutation or migration introducing new alleles.
- Genotyping errors or sampling biases.
- Ethical/philosophical/real-world angle:
- In human studies, deviations can reflect social or historical population structures; careful interpretation is needed to avoid misattributing social factors to biology.
- The model assumes idealized conditions; real populations often violate one or more assumptions, so HWE serves as a diagnostic baseline rather than a universal law.
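The Wahlund effect listed among the causes above can be illustrated numerically. This is a minimal sketch with two hypothetical subpopulations of equal size, each assumed to be in HWE on its own; the function name `wahlund_demo` and the allele frequencies are assumptions of this example.

```python
def wahlund_demo(allele_freqs, weights):
    """Compare heterozygosity in a pooled sample with the HWE expectation.

    allele_freqs: frequency of allele A in each subpopulation
    weights: fraction of the pooled sample drawn from each subpopulation
    Each subpopulation is assumed to be in HWE internally.
    """
    if len(allele_freqs) != len(weights):
        raise ValueError("need one weight per subpopulation")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Heterozygotes actually present in the pooled sample.
    observed_het = sum(w * 2.0 * p * (1.0 - p)
                       for p, w in zip(allele_freqs, weights))
    # Heterozygotes expected if the pooled sample were one random-mating unit.
    pooled_p = sum(w * p for p, w in zip(allele_freqs, weights))
    expected_het = 2.0 * pooled_p * (1.0 - pooled_p)
    return observed_het, expected_het


obs_het, exp_het = wahlund_demo([0.2, 0.8], [0.5, 0.5])
print(f"observed heterozygosity in pooled sample: {obs_het:.2f}")
print(f"HWE expectation from pooled allele freq:  {exp_het:.2f}")
```

With these inputs each subpopulation has 2pq = 0.32, while the pooled allele frequency of 0.5 predicts 0.50, so the pooled sample shows a heterozygote deficit even though no subpopulation departs from HWE.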
- Allele frequencies and sum rule:
- p + q = 1\, , \quad q = 1 - p\,
- Genotype frequencies under random mating:
- P(AA) = p^2\, , \quad P(Aa) = 2pq\, , \quad P(aa) = q^2\,
- Observed and estimated allele frequency from data:
- \hat{p} = \dfrac{2\,O_{AA} + O_{Aa}}{2N}\,, \quad \hat{q} = 1 - \hat{p}\,
- Expected genotype counts under HWE:
- E_{AA} = N \hat{p}^2\, , \quad E_{Aa} = N \cdot 2\hat{p}\hat{q}\, , \quad E_{aa} = N \hat{q}^2\,
- Chi-squared test statistic:
- \chi^2 = \dfrac{(O_{AA} - E_{AA})^2}{E_{AA}} + \dfrac{(O_{Aa} - E_{Aa})^2}{E_{Aa}} + \dfrac{(O_{aa} - E_{aa})^2}{E_{aa}}\,
- Degrees of freedom: typically df = 1 for a single biallelic locus with allele frequencies estimated from data.
Summary Takeaway
- Hardy-Weinberg equilibrium provides a baseline expectation for genotype frequencies given allele frequencies, assuming no evolutionary forces.
- The key test involves comparing observed genotype counts to HW-predicted counts using the chi-square test and interpreting deviations in the context of possible evolutionary forces or data issues.
- The fundamental formulas are p+q=1\, ,\quad P(AA)=p^2,\; P(Aa)=2pq,\; P(aa)=q^2\,, with p estimated from observed data and a chi-square test used to assess equilibrium.