08. Testing probability models Gaynor 2024W2 postlecture

Chapter 8: Testing Probability Models Using Frequency Data

Objectives

By the end of this chapter, students will be able to:

  • Conduct a ( \chi^2 ) goodness-of-fit test to compare frequency data to a probability model stated by a null hypothesis, which includes:

    • Proportional models

    • Discrete probability models (Poisson)

  • Understand the implications of a Poisson distribution.

  • State the assumptions that underlie a ( \chi^2 ) goodness-of-fit test.

Introduction to Chi-Squared Goodness-of-Fit Test

The ( \chi^2 ) goodness-of-fit test is a statistical method used to compare observed counts from categorical or discrete frequency data to the counts that would be expected if a specific probability distribution were true. The test assesses whether the observed data deviate from the counts expected under the null hypothesis by more than chance alone would explain.

Hypotheses

For a ( \chi^2 ) test, the following hypotheses apply:

  • Null Hypothesis (( H_0 )): The data come from a specified probability distribution.

  • Alternative Hypothesis (( H_A )): The data do not come from that distribution.

Test Statistic

The test statistic for the ( \chi^2 ) test is calculated as: [ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} ] where:

  • ( O_i ) = Observed frequency in category ( i )

  • ( E_i ) = Expected frequency in category ( i )

This formula sums the squared differences between observed and expected frequencies, each divided by the expected frequency, across all categories.
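As a minimal sketch of the formula, the sum can be written as a short Python function. The function name and the three-category counts are hypothetical and chosen only for illustration; they are not from the chapter.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

A statistic of zero means the observed counts match the expected counts exactly; larger values indicate a poorer fit.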

Example: NHL Players' Birth Months

A practical application of the ( \chi^2 ) goodness-of-fit test is illustrated by analyzing the birth months of NHL players:

  • Data: A total of 970 NHL players was surveyed, with the following breakdown by month:

    • January: 86

    • February: 99

    • March: 103

    • April: 90

    • May: 102

    • June: 68

    • July: 100

    • August: 64

    • September: 61

    • October: 77

    • November: 57

    • December: 63

Hypothesis for Birth Month Example

  • Null Hypothesis (( H_0 )): The probability that an NHL player was born in a given month equals the national proportion of births in that month.

  • Alternative Hypothesis (( H_A )): For at least one month, the probability that an NHL player was born in that month differs from the national proportion.

Expected Values Calculation

The expected frequencies for each month are calculated by multiplying the total number of NHL players by the national proportion of births for each month. For example:

  • January: 86 observed, expected = 78.57

  • February: 99 observed, expected = 74.69

  • ... and so on for each month, with the expected counts summing to 970.
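The calculation of expected counts can be sketched in Python as follows. The notes do not list the national birth proportions, so the four-category proportions and the total of 200 below are hypothetical stand-ins; only the rule "expected = total × proportion" comes from the text.

```python
import math


def expected_counts(total, proportions):
    """Multiply the sample size by each category's null-model proportion."""
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


# Hypothetical four-category proportional model (not the national birth data).
hypothetical_proportions = [0.5, 0.25, 0.125, 0.125]
counts = expected_counts(200, hypothetical_proportions)
print(counts)
```

Because the proportions sum to 1, the expected counts always sum to the sample size, which is a quick check on the arithmetic.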

Test Statistic Calculation

Once the expected values are computed, each month's contribution to the ( \chi^2 ) statistic is calculated:

  • E.g., January: [ \chi^2_{Jan} = \frac{(86 - 78.57)^2}{78.57} = 0.7026 ]

  • The final test statistic ( \chi^2 ) is the sum of the twelve monthly contributions, which comes to 36.5.
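The per-month arithmetic can be checked with a short Python sketch. It uses only the two months whose expected values appear in these notes (January and February); the other ten expected values are not listed here, so the full total of 36.5 is not recomputed. The helper name is illustrative.

```python
# Observed and expected counts for the two months listed explicitly in the notes.
months = {
    "January": (86, 78.57),
    "February": (99, 74.69),
}


def contribution(observed, expected):
    """One category's contribution to the chi-square statistic."""
    return (observed - expected) ** 2 / expected


contributions = {name: contribution(o, e) for name, (o, e) in months.items()}

for name, value in contributions.items():
    print(f"{name}: {value:.4f}")
```

February contributes far more than January because its observed count lies much further from its expected count.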

Degrees of Freedom

The degrees of freedom for a ( \chi^2 ) test can be calculated using the formula: [ df = k - p - 1 ] where:

  • ( k ) = number of categories

  • ( p ) = number of parameters estimated from the data

For the NHL data there are 12 categories, and no parameters are estimated from the data because the national proportions are given, so ( df = 12 - 0 - 1 = 11 ).

Critical Value and Conclusion

The critical value for the ( \chi^2 ) statistic can be determined from distribution tables. If the calculated test statistic exceeds this critical value at a specified significance level (e.g., ( \alpha = 0.05 )), we reject the null hypothesis. With 11 degrees of freedom and ( \alpha = 0.05 ), the critical value is 19.68. The test statistic of 36.5 exceeds it, so ( P < 0.05 ) and we reject the null hypothesis: NHL players are not born in the same proportions per month as the general population.
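The decision step can also be sketched numerically instead of with a table. The code below is an illustrative, standard-library-only approximation of the upper-tail probability of the ( \chi^2 ) distribution, using a series for the incomplete gamma function. The function names are hypothetical, and in practice a statistics library would normally be used instead.

```python
import math


def degrees_of_freedom(categories, estimated_parameters=0):
    """df = k - p - 1."""
    return categories - estimated_parameters - 1


def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x)."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum_n z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


df = degrees_of_freedom(12)        # NHL example: 12 months, no estimated parameters
statistic = 36.5                   # test statistic reported in the notes
p_value = chi2_upper_tail(statistic, df)
reject_null = p_value < 0.05

print(df, p_value, reject_null)
```

Comparing the P-value to ( \alpha ) is equivalent to comparing the test statistic to the critical value; both lead to the same decision.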

Review of Key Concepts

  1. Types of Variables: Single categorical or discrete numerical variable.

  2. Null Hypothesis: Data conform to a specified distribution.

  3. Test Statistic: ( \chi^2 )

  4. Degrees of Freedom: Calculated based on categories and parameters estimated.

  5. Assumptions: The data are a random sample; no more than 20% of categories have expected counts less than 5, and none has an expected count less than 1.

The Poisson Distribution

The Poisson distribution describes the number of events that occur in a fixed interval of time or space, assuming that events occur independently of one another and are equally likely at any point in the interval. It is characterized by a single parameter, the mean number of events per interval (( \mu )), which is also equal to its variance.
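Under this model, the probability of observing exactly ( X ) events is [ P(X) = \frac{e^{-\mu} \mu^X}{X!} ]. The following Python sketch turns these probabilities into expected frequencies of the kind a ( \chi^2 ) goodness-of-fit test would use. The values ( \mu = 2 ) and ( n = 100 ), the pooled final category, and the function names are all assumptions made for illustration.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def poisson_expected_counts(n, mu, max_k):
    """Expected counts for 0, 1, ..., max_k - 1 events plus a pooled 'max_k or more' category."""
    probs = [poisson_pmf(k, mu) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # remaining probability for the pooled tail
    return [n * p for p in probs]


mu = 2.0   # hypothetical mean number of events per interval
n = 100    # hypothetical number of intervals sampled
expected = poisson_expected_counts(n, mu, max_k=5)

for k, count in enumerate(expected):
    label = f"{k}" if k < 5 else "5 or more"
    print(f"{label}: {count:.2f}")
```

If ( \mu ) is estimated from the sample rather than specified in advance, it counts as one estimated parameter, so ( p = 1 ) in the degrees-of-freedom formula.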

Conclusion

Chapter 8 presents essential methods for conducting ( \chi^2 ) goodness-of-fit tests and understanding their practical implications through real-world examples such as NHL birth months and events modeled by the Poisson distribution.
