By the end of this chapter, students will be able to:
Conduct a ( \chi^2 ) goodness-of-fit test to compare frequency data to a probability model stated by a null hypothesis, which includes:
Proportional models
Discrete probability models (Poisson)
Understand the implications of a Poisson distribution.
State the assumptions that underlie a ( \chi^2 ) goodness-of-fit test.
The ( \chi^2 ) goodness-of-fit test is a statistical method that compares observed counts from categorical or discrete frequency data with the counts that would be expected if a specified probability model were correct. The test assesses whether the observed data deviate from the expectations of the null hypothesis by more than chance alone would plausibly produce.
For a ( \chi^2 ) test, the following hypotheses apply:
Null Hypothesis (( H_0 )): The data come from a specified probability distribution.
Alternative Hypothesis (( H_A )): The data do not come from that distribution.
The test statistic for the ( \chi^2 ) test is calculated as: [ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} ] where:
( O_i ) = Observed frequency
( E_i ) = Expected frequency
The formula sums, across all categories, each squared difference between the observed and expected frequency divided by the expected frequency, so larger values indicate a poorer fit to the null model.
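The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `chi_square_statistic` and the toy counts are assumptions made here for demonstration and do not come from the chapter.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data (not from the chapter): 100 observations in three
# categories, with a null model assigning 50%, 30%, and 20% to each.
toy_observed = [40, 35, 25]
toy_expected = [50.0, 30.0, 20.0]

toy_stat = chi_square_statistic(toy_observed, toy_expected)
print(round(toy_stat, 4))
```

Each term is divided by its own expected count, so a given absolute discrepancy matters more in a category where few observations were expected.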
A practical application of the ( \chi^2 ) goodness-of-fit test is illustrated by analyzing the birth months of NHL players:
Data: A total of 970 NHL players was surveyed, with the following breakdown by month:
January: 86
February: 99
March: 103
April: 90
May: 102
June: 68
July: 100
August: 64
September: 61
October: 77
November: 57
December: 63
Null Hypothesis (( H_0 )): The probability that an NHL player was born in a given month equals the national proportion of births in that month.
Alternative Hypothesis (( H_A )): For at least one month, the probability that an NHL player was born in that month differs from the national proportion.
The expected frequencies for each month are calculated by multiplying the total number of NHL players by the national proportion of births for each month. For example:
January: 86 observed, expected = 78.57
February: 99 observed, expected = 74.69
... and so on for each month, with the totals aligning to 970.
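As a rough sketch of the expected-count step: each expected frequency is the sample size times the null proportion for that category. The national proportions are not listed in these notes, so the two values below are inferred here from the quoted expected counts (78.57 / 970 and 74.69 / 970) and should be treated as assumptions. The function name `expected_counts` is also illustrative.

```python
def expected_counts(total, proportions):
    """Expected count per category = total sample size * null proportion."""
    return {category: total * p for category, p in proportions.items()}


N_PLAYERS = 970

# Assumption: proportions back-calculated from the expected counts quoted in
# the notes; the remaining ten months are not given and are left out.
inferred_proportions = {"January": 0.081, "February": 0.077}

partial_expected = expected_counts(N_PLAYERS, inferred_proportions)
for month, count in partial_expected.items():
    print(f"{month}: {count:.2f}")
```

With a full set of twelve proportions that sum to 1, the expected counts would sum to the sample size of 970.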
Once the expected values are computed, the contribution of each month to the ( \chi^2 ) statistic is calculated:
E.g., January: [ \chi^2_{Jan} = \frac{(86 - 78.57)^2}{78.57} = 0.7026 ]
The final test statistic is the sum of the twelve monthly contributions, which gives ( \chi^2 = 36.5 ).
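The per-month arithmetic can be checked with a short sketch. Only the January and February expected counts are quoted in these notes, so only those two contributions are computed; the helper name `contribution` is an assumption for illustration.

```python
def contribution(observed, expected):
    """One category's contribution to the chi-square statistic."""
    return (observed - expected) ** 2 / expected


# (observed, expected) pairs quoted in the notes for the first two months.
quoted_months = {
    "January": (86, 78.57),
    "February": (99, 74.69),
}

contributions = {
    month: contribution(obs, exp) for month, (obs, exp) in quoted_months.items()
}

for month, value in contributions.items():
    print(f"{month}: {value:.4f}")
```

Listing the contributions separately shows which categories drive the total: February, at roughly 7.9, contributes far more than January.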
The degrees of freedom for a ( \chi^2 ) test can be calculated using the formula: [ df = k - p - 1 ] where:
( k ) = number of categories
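( p ) = number of parameters estimated from the data
For the NHL data, the expected proportions come from national birth records rather than from the sample, so no parameters are estimated (( p = 0 )) and the degrees of freedom equal 12 - 0 - 1 = 11.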
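The degrees-of-freedom rule is simple enough to express directly. The sketch below uses a hypothetical helper name, `degrees_of_freedom`, and defaults the number of estimated parameters to zero, which matches the NHL example.

```python
def degrees_of_freedom(n_categories, n_estimated_params=0):
    """df = k - p - 1 for a chi-square goodness-of-fit test."""
    df = n_categories - n_estimated_params - 1
    if df < 1:
        raise ValueError("not enough categories for the parameters estimated")
    return df


# NHL example: 12 months, no parameters estimated from the sample.
nhl_df = degrees_of_freedom(12)
print(nhl_df)
```

When a parameter such as a Poisson mean is estimated from the same data, it would be passed as `n_estimated_params=1`, which removes one further degree of freedom.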
The critical value for the ( \chi^2 ) statistic can be read from distribution tables. If the calculated test statistic exceeds the critical value at the chosen significance level (e.g., ( \alpha = 0.05 )), we reject the null hypothesis. For 11 degrees of freedom and ( \alpha = 0.05 ), the critical value is 19.68. The NHL test statistic of 36.5 exceeds it, so ( P < 0.05 ) and we conclude that NHL players are not born in the same monthly proportions as the general population.
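In place of a table lookup, the upper-tail probability can be computed directly. The sketch below uses only the Python standard library and a standard recurrence for the regularized upper incomplete gamma function, valid for whole-number degrees of freedom. The function name `chi2_sf` is an assumption made here, and in practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    # Starting value: Q(1, z) = exp(-z) for even df, Q(1/2, z) = erfc(sqrt(z)) for odd df.
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-half_x)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half_x))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(half_x) - half_x - math.lgamma(a + 1.0))
        a += 1.0
    return q


# NHL example from the notes: test statistic 36.5 with 11 degrees of freedom.
nhl_p_value = chi2_sf(36.5, 11)
print(nhl_p_value < 0.05)
```

Comparing the P-value with ( \alpha ) is equivalent to comparing the test statistic with the critical value, and it avoids needing a separate table row for every significance level.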
Types of Variables: Single categorical or discrete numerical variable.
Null Hypothesis: Data conforms to a specified distribution.
Test Statistic: ( \chi^2 )
Degrees of Freedom: Calculated based on categories and parameters estimated.
Assumptions: Ensure no more than 20% of categories have expected counts less than 5, and none is less than or equal to 1.
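The expected-count rule in the summary can be checked mechanically before running the test. The following sketch applies the rule exactly as worded above; the function name `expected_counts_ok` and its parameters are illustrative assumptions.

```python
def expected_counts_ok(expected, max_fraction_below_5=0.20, floor=1.0):
    """Check the expected-count rule for a chi-square goodness-of-fit test.

    Passes when at most 20% of categories have expected counts below 5
    and no category has an expected count at or below the floor of 1.
    """
    n_below_5 = sum(1 for e in expected if e < 5)
    fraction_below_5 = n_below_5 / len(expected)
    any_at_or_below_floor = any(e <= floor for e in expected)
    return fraction_below_5 <= max_fraction_below_5 and not any_at_or_below_floor


# The two expected counts quoted for the NHL example are both well above 5.
print(expected_counts_ok([78.57, 74.69]))
```

Some textbooks word the second condition as "less than 1" rather than "less than or equal to 1"; switching between the two is a one-character change in the comparison. When the check fails, a common remedy is to combine sparse categories, which also reduces the number of categories used in the degrees of freedom.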
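The Poisson distribution describes the probability of observing a given number of events in a fixed interval of time or space, assuming that events occur independently of one another and are equally likely at any point in the interval. It has a single parameter, the mean number of events per interval (( \mu )). A goodness-of-fit test against the Poisson distribution therefore asks whether events are spread randomly, as opposed to being clumped or evenly dispersed.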
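To connect the Poisson model to the goodness-of-fit test, the sketch below computes Poisson probabilities with the standard formula, Pr[X] = e^(-mu) mu^X / X!, and turns them into expected counts. The notes do not write this formula out, and the sample size, mean, and function names here are hypothetical values chosen only for illustration.

```python
import math


def poisson_pmf(x, mu):
    """Probability of exactly x events when the mean number per interval is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


def poisson_expected_counts(n, mu, max_x):
    """Expected counts for 0, 1, ..., max_x - 1 events, plus a final 'max_x or more' category."""
    counts = [n * poisson_pmf(x, mu) for x in range(max_x)]
    counts.append(n - sum(counts))  # the tail category absorbs the remaining probability
    return counts


# Hypothetical example: 100 intervals with a mean of 2 events per interval.
example_counts = poisson_expected_counts(100, 2.0, 5)
for x, count in enumerate(example_counts):
    print(x, round(count, 2))
```

If ( \mu ) is estimated from the sample mean rather than specified in advance, one parameter has been estimated from the data, so ( p = 1 ) in the degrees-of-freedom formula and ( df = k - 1 - 1 ).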
Chapter 8 presents essential methods for conducting ( \chi^2 ) goodness-of-fit tests and understanding their practical implications through real-world examples such as NHL birth months and events modeled by the Poisson distribution.