Week 3 Lecture 1 - Mendelian Inheritance Continued: A Goodness-of-Fit Test for Numerical Results and Human Pedigree Analysis
Introduction to Statistics and Learning
The lecturer shares a personal anecdote about failing their Maths AS level, but states that they will teach basic statistics. The anecdote makes the point that statistics can be learned by anyone and is essential for interpreting genetic data objectively.
Understanding foundational statistics in genetics is important for analyzing experimental results, testing hypotheses, and drawing valid conclusions about inheritance patterns, moving beyond mere observation to quantitative analysis.
Learning Outcomes
The session is divided into segments to enable learning:
Chi-squared to test a hypothesis (Instructor-led): 20 Minutes
Chi-squared to test a hypothesis (You doing): 40 Minutes
Break: 10 Minutes
Mendelian inheritance in humans: 10 Minutes
Pedigree analysis of human disease: 20 Minutes
Questions, questions: 15 Minutes
Mendel’s Data from Monohybrid Crosses
Mendel's F2 results from seven monohybrid crosses show the consistent ratios that led to his laws of inheritance. Each experiment involved carefully controlled breeding of pea plants and followed a single trait at a time.
Tall × dwarf stem:
F1 Generation: All tall (indicating that tallness is dominant)
F2 Generation: 787 tall, 277 dwarf (Ratio: 2.84:1, closely approximating a 3:1 dominant to recessive ratio)
Round × wrinkled seeds:
F2 Generation: 5,474 round, 1,850 wrinkled (Ratio: 2.96:1)
Yellow × green seeds:
F2 Generation: 6,022 yellow, 2,001 green (Ratio: 3.01:1)
Purple × white flowers:
F2 Generation: 705 purple, 224 white (Ratio: 3.15:1)
Axial × terminal flowers:
F2 Generation: 651 axial, 207 terminal (Ratio: 3.14:1)
Smooth × constricted pods:
F2 Generation: 882 smooth, 299 constricted (Ratio: 2.95:1)
Green × yellow pods:
F2 Generation: 428 green, 152 yellow (Ratio: 2.82:1)
The consistent approximate 3:1 ratio observed in the F2 generation across these varied monohybrid crosses was a cornerstone for Mendel's theory of particulate inheritance, demonstrating the existence of dominant and recessive alleles.
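As a quick check on the ratios listed above, the short Python sketch below recomputes each dominant:recessive ratio from the raw F2 counts. The names `MENDEL_F2` and `dominant_ratio` are illustrative and not from the lecture; the constricted-pod count is taken as 299, the figure that reproduces the quoted 2.95:1 ratio.

```python
# Mendel's F2 counts from seven monohybrid crosses, stored as (dominant, recessive).
# The constricted-pod count is taken as 299, which reproduces the quoted 2.95:1 ratio.
MENDEL_F2 = {
    "tall x dwarf stem": (787, 277),
    "round x wrinkled seeds": (5474, 1850),
    "yellow x green seeds": (6022, 2001),
    "purple x white flowers": (705, 224),
    "axial x terminal flowers": (651, 207),
    "smooth x constricted pods": (882, 299),
    "green x yellow pods": (428, 152),
}


def dominant_ratio(dominant: int, recessive: int) -> float:
    """Return the dominant:recessive ratio expressed as x:1."""
    return dominant / recessive


if __name__ == "__main__":
    for cross, (dom, rec) in MENDEL_F2.items():
        print(f"{cross}: {dom}:{rec} = {dominant_ratio(dom, rec):.2f}:1")
```

Every ratio falls between about 2.8 and 3.2. The chi-square test below is what turns "close to 3:1" into a formal statistical statement.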
The Chi-Square Test
Definition: A statistical method used to determine the goodness of fit between observed categorical data and the predictions of a hypothesis. In genetic terms, it assesses how well experimental results align with expected Mendelian ratios (e.g., 3:1 or 9:3:3:1).
Key Note: The chi-square test evaluates if there is a significant difference between the observed data and the expected data by testing a null hypothesis (e.g., 'there is no significant difference between observed and expected ratios'). It does not prove that a hypothesis is correct, but rather indicates whether it should be rejected or if the observed deviation from the expectation could simply be due to random chance.
Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where:
$O$ = Observed data in each category
$E$ = Expected data in each category based on the hypothesis
$\sum$ = Summation for all categories
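The formula translates directly into code. Below is a minimal Python sketch; the function name `chi_square` is an illustrative choice, and it assumes the observed and expected counts are supplied in the same category order.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Goodness-of-fit statistic: the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same categories")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    # Square each deviation, scale it by its expected count, then add the terms up.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, `chi_square([55, 45], [50, 50])` gives $25/50 + 25/50 = 1.0$. Dividing by $E$ means that a deviation of 5 counts for more in a small class than in a large one.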
Example Case: Drosophila melanogaster
This organism, commonly known as the fruit fly, is a widely used model organism in genetics because of its short generation time, large number of offspring, and easily observed phenotypic variation.
Characters studied: two genes, one affecting wing shape and one affecting body color.
Wild-type allele indicated by a +; recessive mutant alleles are lowercase (e.g., body color: gray (+) vs. ebony (e); wing shape: straight (+) vs. curved (c)).
Dihybrid Cross Example in Drosophila melanogaster
F2 Generation Observed Data: These data come from crossing two F1 individuals, each heterozygous for both genes, which were themselves produced by a P cross between two true-breeding parents differing in both traits.
193 straight wings, gray bodies
69 straight wings, ebony bodies
64 curved wings, gray bodies
26 curved wings, ebony bodies
Total: 352 flies
Is this a 9:3:3:1 ratio?
To analyze, the proposed hypothesis (the null hypothesis) is that the traits assort independently according to Mendel's Law of Independent Assortment.
Under this hypothesis, the expected number of progeny in each phenotypic class is obtained by applying the 9:3:3:1 ratio to the total number of observed offspring ($352$). This ratio arises when two unlinked genes segregate their alleles into gametes independently of each other.
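To see where 9:3:3:1 comes from, the sketch below enumerates the four equally likely gametes of a doubly heterozygous F1 fly and counts F2 phenotypes over all 16 gamete pairings. It assumes complete dominance of each `+` allele and fully independent assortment; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# F1 genotype is heterozygous at both genes: "+" is the dominant wild-type allele,
# "c" (curved wings) and "e" (ebony body) are the recessive mutant alleles.
WING_ALLELES = ("+", "c")
BODY_ALLELES = ("+", "e")


def gametes() -> list[tuple[str, str]]:
    """Independent assortment: each wing allele pairs with each body allele equally often."""
    return list(product(WING_ALLELES, BODY_ALLELES))


def phenotype(wing_pair: tuple[str, str], body_pair: tuple[str, str]) -> str:
    """One "+" allele is enough to give the dominant phenotype."""
    wing = "straight" if "+" in wing_pair else "curved"
    body = "gray" if "+" in body_pair else "ebony"
    return f"{wing} wings, {body} body"


def dihybrid_f2_counts() -> Counter:
    """Count phenotypes over the 16 equally likely egg x sperm combinations."""
    counts = Counter()
    for (w1, b1), (w2, b2) in product(gametes(), repeat=2):
        counts[phenotype((w1, w2), (b1, b2))] += 1
    return counts


if __name__ == "__main__":
    for pheno, n in dihybrid_f2_counts().most_common():
        print(f"{pheno}: {n}/16")
```

The counts come out as 9, 3, 3 and 1 out of 16. If the genes were linked, the four gamete types would not be equally frequent and this enumeration would no longer apply.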
Expected Values Calculation
Expected Numbers Based on Hypothesis for Each Phenotype: These values are derived by applying the theoretical Mendelian 9:3:3:1 ratio to the total observed count.
Straight wings, gray bodies (doubly dominant phenotype): $\frac{9}{16} \times 352 = 198$
Straight wings, ebony bodies (dominant for wings, recessive for body): $\frac{3}{16} \times 352 = 66$
Curved wings, gray bodies (recessive for wings, dominant for body): $\frac{3}{16} \times 352 = 66$
Curved wings, ebony bodies (doubly recessive phenotype): $\frac{1}{16} \times 352 = 22$
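The same expected values can be produced programmatically. This sketch uses exact fractions so that ratios which do not divide the total evenly are not rounded prematurely; `RATIO`, `TOTAL` and `expected_counts` are illustrative names.

```python
from fractions import Fraction

# Hypothesised 9:3:3:1 ratio for the four F2 phenotypic classes.
RATIO = {
    "straight wings, gray body": 9,
    "straight wings, ebony body": 3,
    "curved wings, gray body": 3,
    "curved wings, ebony body": 1,
}
TOTAL = 352  # total number of F2 flies observed


def expected_counts(ratio: dict[str, int], total: int) -> dict[str, Fraction]:
    """Share out the observed total according to the hypothesised ratio."""
    parts = sum(ratio.values())  # 16 for a 9:3:3:1 ratio
    return {cls: Fraction(share, parts) * total for cls, share in ratio.items()}


if __name__ == "__main__":
    for cls, e in expected_counts(RATIO, TOTAL).items():
        print(f"{cls}: {float(e):g}")
```

Note that the expected values always sum to the observed total, because the test compares how the flies are distributed across classes, not how many there are.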
Application of the Chi-Square Formula
To derive the chi-square statistic for the observed against expected values, each phenotypic category's deviation is squared, divided by its expected value, and then summed up:
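Calculation:
$\chi^2 = \frac{(193 - 198)^2}{198} + \frac{(69 - 66)^2}{66} + \frac{(64 - 66)^2}{66} + \frac{(26 - 22)^2}{22} = 0.13 + 0.14 + 0.06 + 0.73 = 1.06$
(Each term is rounded to two decimal places before summing; adding the unrounded terms gives $1.05$, which does not change the conclusion.)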
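The arithmetic can be checked term by term with a few lines of Python. The category order (straight/gray, straight/ebony, curved/gray, curved/ebony) is an assumption of this sketch and must match between the two lists.

```python
# Observed F2 counts and 9:3:3:1 expectations for 352 flies, in the same category order:
# straight/gray, straight/ebony, curved/gray, curved/ebony.
OBSERVED = [193, 69, 64, 26]
EXPECTED = [198, 66, 66, 22]

# One (O - E)^2 / E contribution per phenotypic class.
terms = [(o - e) ** 2 / e for o, e in zip(OBSERVED, EXPECTED)]
chi_sq = sum(terms)

for o, e, t in zip(OBSERVED, EXPECTED, terms):
    print(f"O={o:3d}  E={e:3d}  (O-E)^2/E = {t:.3f}")
print(f"chi-square = {chi_sq:.3f}")
```

The curved/ebony class contributes most of the total: its deviation of 4 flies is small in absolute terms but large relative to an expected count of only 22.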
Interpreting Chi-Square Value
Probabilities and Degrees of Freedom:
Degrees of Freedom (df): $df = n - 1$, where $n$ is the total number of categories; in this example, $df = 4 - 1 = 3$. The degrees of freedom are the number of categories whose counts can vary independently once the total is fixed: with 352 flies in total, knowing any three class counts determines the fourth.
The chi-square value $1.06$ is compared against a chi-square probability table to assess significance. The table gives the probability (P-value) of obtaining a deviation at least as large as the one observed purely by random chance, assuming the null hypothesis is true.
With $df = 3$, the value $1.06$ lies between the table entries $1.005$ ($P = 0.80$) and $1.424$ ($P = 0.70$).
Given $\chi^2 = 1.06$ and $df = 3$, $0.70 < P < 0.80$, close to $0.80$ (an exact calculation gives $P \approx 0.79$). This means that if the 9:3:3:1 hypothesis were true, deviations at least this large would arise by chance alone in roughly 80% of experiments of this size, so the observed deviation is entirely consistent with random sampling variation.
A common significance level ($\alpha$) is $0.05$: the null hypothesis is rejected when $P < 0.05$. Since the P-value here (about $0.8$) is much greater than $0.05$, we fail to reject the null hypothesis. There is no statistically significant evidence that the observed data deviate from the expected 9:3:3:1 ratio, so the data are consistent with independent assortment of these two genes.
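In place of a printed table, the P-value for integer degrees of freedom can be computed from a closed-form series for the chi-square upper tail. The sketch below is a substitute technique for the table lookup described above, not the lecture's method; `chi2_sf` is a hypothetical name, and only the Python standard library (`math.erfc`, `math.exp`) is assumed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df.

    even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    odd df:  erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/(3*5) + ...)
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i  # builds (x/2)^i / i!
            total += term
        return math.exp(-half) * total
    tail = math.erfc(math.sqrt(half))  # on its own, this is the answer for df = 1
    if df == 1:
        return tail
    term = total = 1.0
    for j in range(1, (df - 1) // 2):
        term *= x / (2 * j + 1)  # builds x^j / (3 * 5 * ... * (2j + 1))
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


if __name__ == "__main__":
    print(f"P = {chi2_sf(1.06, 3):.3f}")
```

Running it for $\chi^2 = 1.06$ and $df = 3$ gives a value of about 0.79, between the table entries for 0.70 and 0.80. Where SciPy is available, `scipy.stats.chi2.sf(1.06, 3)` should give the same result.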
Analyzing Chi-Square Values for Other Crosses
Table 2.1 presents chi-square critical values for various degrees of freedom and P-values. For example, with $df = 3$ the critical value at $P = 0.05$ is $7.815$. If the calculated chi-square value exceeds the critical value for the chosen significance level, then P is smaller than that level and the null hypothesis is rejected.
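The decision rule can be written as a lookup against critical values. The sketch below hard-codes the standard 0.05-level critical values for $df = 1$ to $5$; the dictionary and function names are illustrative, and other significance levels would need their own column from the table.

```python
# Standard chi-square critical values at the 0.05 significance level, df = 1..5.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def reject_null(chi_sq: float, df: int) -> bool:
    """Reject the null hypothesis when chi-square exceeds the 0.05 critical value."""
    return chi_sq > CRITICAL_05[df]


if __name__ == "__main__":
    print(reject_null(1.06, 3))  # Drosophila dihybrid cross
```

For the Drosophila cross, $1.06 < 7.815$, so the 9:3:3:1 hypothesis is not rejected. A chi-square of, say, $8.0$ with three degrees of freedom would lead to rejection.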
Further Applications and Examples
Additional exercises are suggested for practice in calculating chi-squared values using provided data sets, reinforcing how to validate Mendelian ratios and understand the statistical implications of experimental results.
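As one practice example, the same steps can be applied to Mendel's tall × dwarf F2 data, testing the hypothesis of a 3:1 ratio. This is a worked sketch rather than one of the lecture's set exercises, and the 3.841 critical value is the standard 0.05-level entry for one degree of freedom.

```python
# F2 counts from Mendel's tall x dwarf monohybrid cross.
OBSERVED = {"tall": 787, "dwarf": 277}
total = sum(OBSERVED.values())                             # 1064 plants
expected = {"tall": total * 3 / 4, "dwarf": total * 1 / 4}  # hypothesised 3:1 ratio

chi_sq = sum((OBSERVED[k] - expected[k]) ** 2 / expected[k] for k in OBSERVED)
df = len(OBSERVED) - 1                                     # two classes, so df = 1
CRITICAL_05_DF1 = 3.841                                    # 0.05 critical value for df = 1

print(f"chi-square = {chi_sq:.3f}, df = {df}")
print("reject 3:1" if chi_sq > CRITICAL_05_DF1 else "fail to reject 3:1")
```

Here $\chi^2 = 121/798 + 121/266 \approx 0.61$, well below 3.841, so the data are consistent with a 3:1 ratio.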
Pedigree Analysis in Humans
Controlled crosses are not possible in human genetics, because deliberately breeding people for genetic study would be unethical. Pedigree analysis, which follows the inheritance of a trait through a family using existing records, is therefore the main tool for tracking inheritance patterns in humans.
Pedigrees visualize the inheritance of traits across multiple generations within a family:
The pattern in which affected individuals appear across the family is used to deduce whether a trait is dominant or recessive, and whether it is autosomal or X-linked.
Common symbols used:
Squares for males, circles for females
Shading indicates affected status, while unshaded symbols represent unaffected individuals.
Half-shaded or dot-filled symbols can indicate carriers of recessive traits.
A diagonal line through a symbol often denotes a deceased individual.
Horizontal lines connect parents, and vertical lines extend to offspring.
Roman numerals typically indicate generations (e.g., I, II, III), and Arabic numerals label individuals within each generation.
For a dominant trait, affected individuals usually appear in every generation, with affected offspring having at least one affected parent. For a recessive trait, affected individuals can 'skip' generations, and unaffected parents can have affected offspring (if both are carriers).
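The risk statements above follow from a Punnett square. The sketch below computes offspring genotype probabilities for a single autosomal gene, using the hypothetical symbols `A` (dominant allele) and `a` (recessive allele) and assuming each parent transmits either allele with probability 1/2 and that the trait is fully penetrant.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett-square genotype probabilities for one autosomal gene.

    Each parent is written as two alleles, e.g. "Aa"; each allele is
    transmitted with probability 1/2, giving four equally likely combinations.
    """
    # Sorting puts the uppercase (dominant) allele first, so "aA" and "Aa" merge.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


if __name__ == "__main__":
    # Two unaffected carriers of a recessive trait.
    print(offspring_genotypes("Aa", "Aa"))
```

For two carriers (`"Aa"` × `"Aa"`), each child has a 1/4 chance of being affected (`aa`) and a 1/2 chance of being an unaffected carrier, which is how unaffected parents can have affected offspring. For a dominant trait, an affected heterozygote crossed with an unaffected parent (`"Aa"` × `"aa"`) gives each child a 1/2 chance of being affected.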
Specific Genetic Traits in Humans
Sickle Cell Anemia:
A well-known autosomal recessive disorder caused by a single point mutation in the beta-globin gene that replaces glutamic acid with valine at position 6 (Glu6Val), leading to the production of abnormal hemoglobin (HbS). Under low oxygen conditions, this causes red blood cells to become rigid and sickle-shaped, impairing oxygen transport and leading to chronic anemia, pain crises, and organ damage.
This condition shows the substantial impact a single mutation can have on health. In regions where malaria is common, heterozygous carriers of the sickle cell trait have some protection against malaria because of their modified red blood cells, an example of heterozygote advantage.
Cystic Fibrosis:
A common autosomal recessive disorder, particularly among people of Northern European descent. It is caused by mutations in the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) gene, which encodes a channel protein that transports chloride ions across cell membranes.
The mutation leads to defective chloride channels, resulting in the production of abnormally thick, sticky mucus that clogs ducts in vital organs, primarily affecting the respiratory (leading to chronic lung infections), digestive (pancreatic insufficiency), and reproductive systems.
Closing Statements on Pedigree Analysis
Relating genotype to phenotype through pedigree analysis informs predictions about genetic disorders, the tracking of health conditions within families, and risk assessment for offspring, and it forms the basis of genetic counseling.
Learning Outcome Recap: Chi-squared testing provides a statistical framework for evaluating experimental genetic crosses, while pedigree analysis remains an integral methodology for analyzing Mendelian inheritance patterns in human traits where experimental crosses are not possible.