Inference for Distributions of Categorical Data: Chi-Square Test for Goodness of Fit
Inference for Distributions of Categorical Data
The study of categorical variables involves determining if a hypothesized distribution of data matches observed results in one or more populations.
There are three primary types of chi-square tests used for categorical data, depending on the research question and data structure:
- Goodness of Fit (G.O.F.) Test: Used to determine whether a hypothesized distribution for a single categorical variable in a single population seems valid (e.g., used frequently in genetic research).
- Chi-Square Test for Homogeneity: Used to determine whether the distribution of a single categorical variable differs across two or more populations or treatments. Data is typically organized in a two-way table.
- Chi-Square Test for Association/Independence: Used to determine whether there is convincing evidence of an association between two categorical variables in a population.
Chi-Square Test for Goodness of Fit (G.O.F.)
Definition: A goodness-of-fit test compares the distribution of a categorical variable in a sample to a claimed or hypothesized distribution in the population.
Stating Hypotheses:
- $H_0$ (Null Hypothesis): The distribution of the categorical variable in the population of interest is the same as the claimed distribution.
- $H_a$ (Alternative Hypothesis): The distribution of the categorical variable in the population of interest is different from the claimed distribution.
- Symbolic Notation:
  - $H_0: p_1 = \text{value}_1, p_2 = \text{value}_2, \dots, p_k = \text{value}_k$
  - $H_a$: At least two of the $p_i$ values are incorrect.
- Caution: Do not state $H_a$ in a way that suggests all proportions in the hypothesized distribution are wrong; it only requires that at least two are incorrect to be a valid alternative.
Expected Counts:
- The expected count for a specific category is calculated under the assumption that the null hypothesis is true.
- Formula: $E_i = n p_i$
- Where $n$ is the total sample size and $p_i$ is the relative frequency (probability) for category $i$ specified by $H_0$.
The Chi-Square ($\chi^2$) Test Statistic:
- This statistic measures how far the observed counts ($O$) in a sample are from the expected counts ($E$).
- Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
- The sum is across all $k$ categories of the variable.
- Large values of $\chi^2$ provide stronger evidence against $H_0$.
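The two calculations in this section (expected count $n p_i$ for each category, then the sum of $(O - E)^2 / E$ across categories) can be sketched in a few lines of Python. This is only an illustration: the function names `expected_counts` and `chi_square_statistic` are hypothetical, and the data are the die rolls from the Carrie example later in these notes.

```python
def expected_counts(n, proportions):
    """Expected count for each category under H0: E_i = n * p_i."""
    return [n * p for p in proportions]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die rolls from the Carrie example: 90 rolls, H0 says each side has p = 1/6.
observed = [12, 28, 12, 13, 10, 15]
expected = expected_counts(sum(observed), [1 / 6] * 6)
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # approximately 14.4
```

Keeping the expected counts as a separate step mirrors the way the Large Counts condition is checked before the statistic is computed.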
Chi-Square Distributions and P-Values
Properties of the Chi-Square Distribution:
- The distribution is defined by a density curve that takes only non-negative values.
- It is skewed to the right.
- A specific $\chi^2$ distribution is defined by its degrees of freedom ($df$).
- As $df$ increases, the density curve becomes less skewed and begins to look more normal.
- The mean of a $\chi^2$ distribution is equal to its $df$.
- For $df > 2$, the mode (peak) of the density curve is located at $df - 2$.
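The mean and mode properties can be checked numerically. The sketch below evaluates the standard chi-square density on a fine grid using only Python's `math` module; the function names, grid step, and upper limits are arbitrary choices made for illustration, so the results are approximations rather than exact values.

```python
import math


def chi_square_density(x, df):
    """Density of the chi-square distribution with df degrees of freedom at x > 0."""
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


def approximate_mode(df, step=0.001, upper=50.0):
    """Grid search for the x value where the density is highest."""
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda x: chi_square_density(x, df))


def approximate_mean(df, step=0.001, upper=100.0):
    """Riemann-sum approximation of the mean (the integral of x * f(x))."""
    return sum(step * i * chi_square_density(step * i, df) * step
               for i in range(1, int(upper / step) + 1))


for df in (3, 5, 10):
    # The peak should sit near df - 2 and the mean near df.
    print(df, round(approximate_mode(df), 2), round(approximate_mean(df), 2))
```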
Degrees of Freedom ($df$) for G.O.F.:
- $df = k - 1$
- Where $k$ is the number of categories.
P-Values:
- The P-value is the area to the right of the calculated $\chi^2$ statistic under the $\chi^2$ density curve with the appropriate $df$.
- Caution: Failing to reject $H_0$ (when $P$ is large) does not mean the null hypothesis is definitely true; it means we lack convincing evidence that the distribution is different.
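For readers without a calculator at hand, the right-tail area can be approximated with the Python standard library alone. The sketch below is one possible approach, not the method used in these notes: it relies on the closed-form recurrence for the upper-tail area that holds for whole-number $df$, and the function name `chi_square_p_value` is hypothetical. In practice a calculator or a statistics package would normally do this step.

```python
import math


def chi_square_p_value(statistic, df):
    """Area to the right of `statistic` under the chi-square curve (integer df >= 1)."""
    if statistic <= 0:
        return 1.0
    h = statistic / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-h)                # exact tail area for df = 2
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(h))     # exact tail area for df = 1
    # Each pass raises the degrees of freedom by 2 until df is reached.
    while a < df / 2.0:
        tail += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# Die example from these notes: chi-square = 14.4 with df = 5.
print(round(chi_square_p_value(14.4, 5), 4))  # roughly 0.013
```

The result can be compared against the P-values reported in the examples that follow.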
Performing a Chi-Square Test (State, Plan, Do, Conclude)
Name of Test: Chi-Square Goodness of Fit Test.
Conditions for Inference:
- Random: The data must come from a well-designed random sample from the population or a randomized experiment.
- 10% Rule: When sampling without replacement, the sample size $n$ must be less than $10\%$ of the population size $N$ ($n < 0.10N$).
- Large Counts: All expected counts must be at least 5 ($E_i \ge 5$ for all $i$).
AP Exam Tip: When checking the Large Counts condition, you must examine and explicitly label the expected counts, not the observed counts.
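The two numerical conditions can be checked mechanically, as in the sketch below. The helper name `check_conditions` and its parameters are hypothetical. The Random condition depends on how the data were collected, so it is deliberately left out of the code.

```python
def check_conditions(n, proportions, population_size=None, min_expected=5):
    """Return expected counts plus pass/fail flags for Large Counts and the 10% rule."""
    expected = [n * p for p in proportions]
    # Large Counts is judged on the EXPECTED counts, never the observed ones.
    large_counts_ok = all(e >= min_expected for e in expected)
    # None means the 10% rule was not checked (population size unknown or not applicable).
    ten_percent_ok = None if population_size is None else n < 0.10 * population_size
    return expected, large_counts_ok, ten_percent_ok


# M&M'S case study from these notes: n = 60 with the claimed color proportions.
claimed = [0.125, 0.125, 0.125, 0.125, 0.25, 0.25]
expected, large_ok, ten_ok = check_conditions(60, claimed)
print(expected)          # [7.5, 7.5, 7.5, 7.5, 15.0, 15.0]
print(large_ok, ten_ok)  # True None
```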
Calculator Usage (TI-84):
- Input observed counts in List 1 (L1) and expected counts in List 2 (L2).
- Select $\chi^2$GOF-Test from the Stat/Tests menu.
- Individual terms in the $\chi^2$ calculation are stored in a list called CNTRB (contributions).
- AP Tip: Write out at least the first few terms of the $\chi^2$ summation manually (e.g., $\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots$) to earn partial credit even if a calculation error occurs.
Case Study: M&M'S® Milk Chocolate Candies
Mars, Inc. Claimed Distribution (Hackettstown, NJ factory):
- Brown: $12.5\%$
- Red: $12.5\%$
- Yellow: $12.5\%$
- Green: $12.5\%$
- Orange: $25.0\%$
- Blue: $25.0\%$
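Jerome’s Sample Analysis:
- Sample size: $n = 60$.
- Expected counts:
  - $60 \times 0.125 = 7.5$ (Brown, Red, Yellow, Green)
  - $60 \times 0.25 = 15$ (Orange, Blue)
- Observed counts: Brown (12), Red (3), Yellow (7), Green (9), Orange (9), Blue (20).
- $\chi^2$ calculation: $\chi^2 = \frac{(12 - 7.5)^2}{7.5} + \frac{(3 - 7.5)^2}{7.5} + \frac{(7 - 7.5)^2}{7.5} + \frac{(9 - 7.5)^2}{7.5} + \frac{(9 - 15)^2}{15} + \frac{(20 - 15)^2}{15} = 9.8$
- $df = 6 - 1 = 5$.
- P-value results (simulation): $87/1000 = 0.087$. At significance level $\alpha = 0.05$, we would fail to reject $H_0$.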
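A simulation like the one behind the $87/1000$ figure can be sketched as follows. This is an assumed reconstruction, not the procedure actually used for that result: it draws 1000 samples of 60 candies from the claimed distribution and counts how often the simulated $\chi^2$ is at least as large as the observed one. The function names and the seed are arbitrary, so the estimate will differ somewhat from 0.087.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_p_value(observed, proportions, trials=1000, seed=1):
    """Estimate the P-value by repeatedly sampling from the H0 distribution."""
    rng = random.Random(seed)
    n = sum(observed)
    k = len(proportions)
    expected = [n * p for p in proportions]
    observed_stat = chi_square_statistic(observed, expected)
    at_least_as_extreme = 0
    for _ in range(trials):
        draws = rng.choices(range(k), weights=proportions, k=n)
        counts = [0] * k
        for category in draws:
            counts[category] += 1
        # Small tolerance guards against floating-point ties with the observed value.
        if chi_square_statistic(counts, expected) >= observed_stat - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


claimed = [0.125, 0.125, 0.125, 0.125, 0.25, 0.25]
jerome = [12, 3, 7, 9, 9, 20]
estimate = simulate_p_value(jerome, claimed)
print(estimate)
```

Because each run uses only 1000 trials, estimates a few hundredths apart from one another are expected.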
Example 1 & 3: Ceramic Six-Sided Die (Carrie)
Scenario: Carrie rolled a custom 6-sided die 90 times to test for fairness.
Hypotheses:
- $H_0$: The sides of Carrie’s die are equally likely to show up ($p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6$).
- $H_a$: The sides of Carrie’s die are not equally likely to show up.
Observed Data: 1 (12), 2 (28), 3 (12), 4 (13), 5 (10), 6 (15).
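Calculations:
- Expected count for each side: $90 \times \frac{1}{6} = 15$.
- $\chi^2$ value: $\chi^2 = \frac{(12 - 15)^2}{15} + \frac{(28 - 15)^2}{15} + \frac{(12 - 15)^2}{15} + \frac{(13 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(15 - 15)^2}{15} = 14.4$.
- $df = 6 - 1 = 5$.
- P-value (using technology): $P(\chi^2 \ge 14.4) \approx 0.0132$.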
Conclusion: Since the P-value ($0.0132$) is less than $\alpha = 0.05$, reject $H_0$. There is convincing evidence the die is not fair.
Example 4: Birthday Distributions of NHL Players (Malcolm Gladwell)
Topic: Discussion of whether a hockey player’s birth month, relative to the January 1 age cut-off for youth leagues, influences success.
Question: Are birthdays of NHL players uniformly distributed across the four quarters of the year?
Sample: $n = 80$ random NHL players.
- Quarter 1 (Jan-Mar): 32 players.
- Quarter 2 (Apr-Jun): 20 players.
- Quarter 3 (Jul-Sep): 16 players.
- Quarter 4 (Oct-Dec): 12 players.
Conditions:
- Random: Stated random sample of 80 players.
- 10%: 80 is less than $10\%$ of all NHL players.
- Large Counts: Each expected count is $80 \times 0.25 = 20$, which is $\ge 5$.
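Results:
- $\chi^2$ value: $\chi^2 = \frac{(32 - 20)^2}{20} + \frac{(20 - 20)^2}{20} + \frac{(16 - 20)^2}{20} + \frac{(12 - 20)^2}{20} = 11.2$.
- $df = 4 - 1 = 3$.
- P-value: $P(\chi^2 \ge 11.2) \approx 0.011$.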
Conclusion: Reject $H_0$. There is convincing evidence that NHL player birthdays are not uniformly distributed.
Example 5: High School Lunch Sign-Outs
Scenario: A random sample of $n=100$ entries from a school lunch sign-out list.
Hypotheses:
- $H_0$: The number of students leaving campus for lunch is uniformly distributed across the 5 days of the week.
- $H_a$: The distribution is not uniform.
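Calculations:
- Expected counts: $100 \times \frac{1}{5} = 20$ for each day. All expected counts are $\ge 5$.
- $df = 5 - 1 = 4$.
- $\chi^2$ value: $\chi^2 \approx 4.8$ (the value that corresponds to the reported P-value with $df = 4$; the observed counts are not listed in these notes).
- P-value: $\approx 0.308$.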
Conclusion: Since the P-value ($0.308$) is greater than $\alpha = 0.05$, fail to reject $H_0$. No convincing evidence that the distribution is not uniform.
Example 6: Genetic Makeup of Tobacco Plants
Scenario: Pairs of Gg tobacco plants are crossed; each parent carries one dominant gene (G) and one recessive gene (g) for color. Expected Punnett square ratio: 1:2:1 (25% green, 50% yellow-green, 25% albino).
Observed Data: $n = 84$ offspring. Green (23), Yellow-Green (50), Albino (11).
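Test results at $\alpha = 0.05$:
- Expected counts: Green ($84 \times 0.25 = 21$), Yellow-Green ($84 \times 0.50 = 42$), Albino ($84 \times 0.25 = 21$).
- $\chi^2$ value: $\chi^2 = \frac{(23 - 21)^2}{21} + \frac{(50 - 42)^2}{42} + \frac{(11 - 21)^2}{21} \approx 6.476$.
- $df = 3 - 1 = 2$.
- P-value: $P(\chi^2 \ge 6.476) \approx 0.039$.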
Conclusion: Reject $H_0$ at $\alpha = 0.05$. There is convincing evidence that the distribution of offspring color differs from the predicted 1:2:1 ratio.
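The tobacco example is a useful reminder that the hypothesized proportions need not be equal. The sketch below runs the whole calculation for the 1:2:1 ratio; the variable names are illustrative. It also uses a shortcut that applies only when $df = 2$: in that case the $\chi^2$ distribution is exponential with mean 2, so the right-tail area is $e^{-\chi^2/2}$.

```python
import math

observed = {"Green": 23, "Yellow-Green": 50, "Albino": 11}
ratio = {"Green": 1, "Yellow-Green": 2, "Albino": 1}  # hypothesized 1:2:1

n = sum(observed.values())
total_parts = sum(ratio.values())
expected = {color: n * parts / total_parts for color, parts in ratio.items()}

statistic = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

# Shortcut valid only for df = 2: upper-tail area is exp(-statistic / 2).
p_value = math.exp(-statistic / 2)

print(expected)  # {'Green': 21.0, 'Yellow-Green': 42.0, 'Albino': 21.0}
print(round(statistic, 3), df, round(p_value, 4))  # 6.476 2 0.0392
```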
Follow-Up Analysis
Purpose: Conducted when a $\chi^2$ test result is statistically significant, to determine which specific categories contribute most to the deviation from $H_0$.
Procedure:
- Examine which categories show the largest deviations between observed and expected counts.
- Analyze the individual components of the $\chi^2$ statistic: $\frac{(O - E)^2}{E}$ for each category.
- Provide specific numbers and directions (more than expected or less than expected).
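NHL Example Follow-up:
- The categories contributing the most to $\chi^2 = 11.2$ were Jan-Mar ($\frac{(32 - 20)^2}{20} = 7.2$) and Oct-Dec ($\frac{(12 - 20)^2}{20} = 3.2$).
- Jan-Mar: 12 more players than expected were born in this quarter (32 observed vs. 20 expected).
- Oct-Dec: 8 fewer players than expected were born in this quarter (12 observed vs. 20 expected).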
Tobacco Plant Example Follow-up:
- The largest contribution came from the Albino category ($4.762$).
- The observed count of albino plants (11) was 10 fewer than the expected count (21).
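A follow-up analysis amounts to listing each category's $(O - E)^2 / E$ term together with the direction of its deviation, much like the calculator's CNTRB list. The sketch below does this for the NHL data; the function name `contributions` and the output format are illustrative choices.

```python
def contributions(observed, expected, labels):
    """Per-category (O - E)^2 / E terms with the direction of each deviation."""
    rows = []
    for label, o, e in zip(labels, observed, expected):
        direction = "more" if o > e else "fewer" if o < e else "same"
        rows.append((label, (o - e) ** 2 / e, o - e, direction))
    # Largest contributions first, so the main drivers appear at the top.
    return sorted(rows, key=lambda row: row[1], reverse=True)


quarters = ["Jan-Mar", "Apr-Jun", "Jul-Sep", "Oct-Dec"]
nhl_rows = contributions([32, 20, 16, 12], [20, 20, 20, 20], quarters)
for label, term, diff, direction in nhl_rows:
    print(f"{label}: contribution {term:.2f}, observed - expected = {diff:+d} ({direction})")
```

Sorting by contribution puts Jan-Mar and Oct-Dec first, matching the categories named in the NHL follow-up.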