Study Notes on Hypothesis Testing and Chi-Squared Distribution

Hypothesis Testing and Probability Distribution

  • Hypothesized Proportion vs. Probability Distribution

    • Definition: A hypothesized proportion is derived from a probability distribution but is not the same as the probability distribution itself.
    • Example: Sampling 100 students where 30% are expected to be staff members.
    • Expected count of staff = $0.3 \times 100 = 30$.
    • Observed count may differ, e.g., observed count of staff might be 20.
    • Therefore, two pieces of information to analyze: expected (from the distribution) and observed (from the sample).
  • Differences and Sampling Variability

    • Need to investigate whether differences between the observed and expected counts are due to sampling variability alone or suggest that the hypothesized distribution does not hold for the population.
  • Null Hypothesis and Evidence

    • Null Hypothesis: The benchmark against which the observed data are compared. It typically asserts that the population follows the hypothesized probability distribution, so any difference between observed and expected counts is due to sampling variability.
    • Conclusion from hypothesis testing: If the difference is significant, we have evidence against the null hypothesis.
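
The staff example can be made concrete with a small simulation. This is a hypothetical sketch, assuming each of the 100 sampled people is staff independently with probability 0.3 (the null hypothesis); the helper names `expected_count` and `simulate_counts` are made up for illustration. It computes the expected count and then shows how often sampling variability alone produces a count as low as the observed 20.

```python
import random

def expected_count(n, p):
    """Expected count for one category: sample size times hypothesized proportion."""
    return n * p

def simulate_counts(n, p, trials, seed=0):
    """Hypothetical simulation: draw `trials` samples of size n, assuming each person
    is staff independently with probability p, and return the staff count per sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

n, p, observed = 100, 0.3, 20
expected = expected_count(n, p)
counts = simulate_counts(n, p, trials=10_000)

# Share of simulated samples (null hypothesis true) with a staff count at or below the observed one
share_as_low = sum(c <= observed for c in counts) / len(counts)
print(f"expected = {expected:.0f}, observed = {observed}")
print(f"share of simulated samples with {observed} or fewer staff: {share_as_low:.3f}")
```

A small share means an observed count of 20 would be unusual if the 30% proportion were true, which is the kind of evidence a hypothesis test formalizes.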

Chi-Squared Distribution

  • Testing Categorical Data

    • The chi-squared test applies to categorical data and assesses whether observed frequencies differ from expected frequencies.
    • It involves calculating a test statistic and comparing it to a critical value (the critical-value approach); comparing the p-value to alpha is an equivalent way to reach the same decision.
  • Test Statistic Formula

    • Earlier test statistic formulas (such as the one for testing a variance, $(n - 1)s^2/\sigma^2$, built from $n - 1$ times the sample variance) do not apply to categorical data. The new formula is:

    $$\text{Test Statistic} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

    • Where:
    • $O_i$ = observed count for category $i$,
    • $E_i$ = expected count for category $i$.
  • Calculating Expected Counts

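    • The expected count for each category under the hypothesized distribution is $E_i = n \times p_i$, where $n$ is the total sample size and $p_i$ is the hypothesized proportion for category $i$.
    • Example: If the expected proportion for a category is 0.3 and the total sample size is 100, the expected count is $100 \times 0.3 = 30$.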
  • Degrees of Freedom in Chi-Squared Tests

    • For a goodness-of-fit test, the degrees of freedom (df) depend on the number of categories, not on the sample size.
    • General formula for df with categorical data:
    • $df = k - 1$, where $k$ = number of categories.
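
The three pieces above (expected counts, the test statistic, and df) fit in one short function. This is a minimal sketch; the name `chi_square_gof` is hypothetical. It is checked on the staff example, where treating everyone who is not staff as a second category with proportion 0.7 is an assumption added for illustration.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit test statistic, degrees of freedom, and expected counts.

    observed    -- observed count O_i for each category
    proportions -- hypothesized proportion p_i for each category (should sum to 1)
    """
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    n = sum(observed)                                # total sample size
    expected = [n * p for p in proportions]          # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                           # df = k - 1
    return statistic, df, expected

# Staff example: 20 staff observed out of 100; "not staff" (p = 0.7) is an assumed second category
stat, df, expected = chi_square_gof([20, 80], [0.3, 0.7])
print(expected, round(stat, 4), df)
```

Here the expected counts are 30 and 70, and the statistic is $10^2/30 + 10^2/70 \approx 4.76$ with df = 1.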

Critical Values and Hypothesis Conclusion

  • Determining Critical Value:

    • This is done using a chi-squared distribution table, which requires the degrees of freedom and the alpha level (typically $\alpha = 0.05$).
    • Example: If df = 2 and $\alpha = 0.05$, the table gives a critical value of 5.991.
  • Rejection Rule for Hypotheses

    • Compare test statistic with critical value:
    1. If test statistic > critical value: reject null hypothesis
    2. If test statistic < critical value: do not reject null hypothesis
  • Example of Applying Test Statistic and Critical Value

    • If the calculated test statistic is 21.0303 and the critical value is 5.991, then:
    • Since 21.0303 > 5.991, we reject the null hypothesis.
    • Rejecting the null hypothesis means the data provide evidence against it, suggesting the company’s claim (the hypothesized distribution) may be inaccurate.
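
The critical value and rejection rule can be checked without a table in the special case df = 2. For df = 2 only, the right-tail area of the chi-squared distribution beyond $x$ is $e^{-x/2}$, so the critical value is $-2\ln\alpha$. This is a sketch under that df = 2 assumption; the function names are hypothetical.

```python
import math

def chi2_critical_df2(alpha):
    """Critical value for a chi-squared distribution with df = 2 (this shortcut is valid only for df = 2).

    The right-tail area beyond x is exp(-x / 2); solving exp(-x / 2) = alpha gives x = -2 ln(alpha).
    """
    return -2 * math.log(alpha)

def p_value_df2(statistic):
    """Right-tail p-value for a chi-squared statistic with df = 2."""
    return math.exp(-statistic / 2)

def decide(statistic, critical):
    """Rejection rule from the notes: reject H0 when the test statistic exceeds the critical value."""
    return "reject H0" if statistic > critical else "do not reject H0"

alpha = 0.05
critical = chi2_critical_df2(alpha)
statistic = 21.0303
print(f"critical value = {critical:.3f}")   # prints critical value = 5.991
print(decide(statistic, critical))          # prints reject H0
print(f"p-value = {p_value_df2(statistic):.6f}")
```

For other degrees of freedom there is no closed form this simple. With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the critical value and `scipy.stats.chi2.sf(statistic, df)` gives the p-value. The statistic exceeds the critical value exactly when the p-value is below alpha, so both approaches lead to the same decision.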

Confidence Level and Hypothesis Testing

  • Importance of Not Concluding "True" or "False" for Null Hypothesis
    • We can never confirm the null hypothesis as true; we either reject or fail to reject.
    • Rejecting implies evidence against the null, while failing to reject means there is not enough evidence against it.

Applications in Example Scenarios

  • Example with Baseball Cards:

    • Company claims: 30% rookies, 60% veterans, 10% all stars.
    • Sample of 110 cards results in 55 rookies, 48 veterans, and 7 all stars.
    • Hypotheses for this scenario:
    • Null Hypothesis (H0): Proportion of cards as stated:
      • Proportion of rookies = 0.3, veterans = 0.6, all stars = 0.1
    • Alternative Hypothesis (H1): At least one of the stated proportions is inaccurate.
  • Degrees of Freedom in Example: There are 3 categories (rookies, veterans, all-stars), so degrees of freedom = 3 - 1 = 2.

  • Expected Counts: Based on the sample of 110 cards:

    • Rookies: $E = 110 \times 0.3 = 33$
    • Veterans: $E = 110 \times 0.6 = 66$
    • All Stars: $E = 110 \times 0.1 = 11$
  • Test Statistic Calculation:

    • $\text{Test Statistic} = \frac{(55 - 33)^2}{33} + \frac{(48 - 66)^2}{66} + \frac{(7 - 11)^2}{11}$
    • Evaluating each term: $14.6667 + 4.9091 + 1.4545 \approx 21.0303$. This chi-squared statistic is then compared with the critical value 5.991 (df = 2, $\alpha = 0.05$).
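
The baseball-card calculation can be reproduced step by step. This is a minimal sketch using only the claimed proportions and observed counts from the notes. It prints each category's contribution $(O_i - E_i)^2 / E_i$ so the hand calculation can be checked term by term.

```python
# Company's claimed proportions and the observed sample of 110 cards (from the notes)
claimed = {"rookies": 0.30, "veterans": 0.60, "all stars": 0.10}
observed = {"rookies": 55, "veterans": 48, "all stars": 7}

n = sum(observed.values())                       # 110 cards in the sample
contributions = {}
for category, p in claimed.items():
    e = n * p                                    # expected count E_i = n * p_i
    o = observed[category]
    contributions[category] = (o - e) ** 2 / e   # this category's term in the sum
    print(f"{category:>9}: O = {o:3d}, E = {e:5.1f}, (O - E)^2 / E = {contributions[category]:.4f}")

statistic = sum(contributions.values())
df = len(claimed) - 1                            # 3 categories, so df = 2
print(f"test statistic = {statistic:.4f}, df = {df}")   # prints test statistic = 21.0303, df = 2
```

The rookies term dominates: 55 observed against 33 expected accounts for about 14.67 of the 21.03 total.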

Calculation via Statistical Software or Calculators

  • Chi-Squared in Calculators:
    • Calculators can compute the p-value and expected counts when the data are entered as matrices.
    • Press the 2nd button followed by the $x^{-1}$ key to access the matrix functions.
    • Enter the observed counts in one matrix and use a second matrix for the expected counts. Running the chi-squared test for independence or homogeneity then gives the chi-squared value and fills in the expected counts at the same time.
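
The calculator steps cannot be reproduced here, but the computation behind a two-way (independence or homogeneity) test is short. This sketch assumes the standard expected-count rule for two-way tables: row total × column total / grand total, with $df = (r - 1)(c - 1)$. The function name is hypothetical, and the table of counts is made up purely for illustration.

```python
def two_way_chi_square(table):
    """Chi-squared statistic, df, and expected counts for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)
    return statistic, df, expected

# Hypothetical 2 x 3 table of observed counts, for illustration only
observed = [[10, 20, 30],
            [20, 20, 20]]
stat, df, expected = two_way_chi_square(observed)
print(expected)        # prints [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(stat, 4), df)
```

As with the calculator, one call yields both the statistic and the expected counts. With SciPy installed, `scipy.stats.chi2_contingency(observed)` returns the statistic, p-value, degrees of freedom, and expected counts together.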

Distinction Between Test Types

  • Chi-Squared Test for Independence vs. Homogeneity

    • The test for independence assesses whether two categorical variables, measured on a single sample, are associated (not independent).
    • The test for homogeneity assesses whether two or more populations have the same proportions across the categories of a variable.
    • The hypotheses have a similar structure, but their wording depends on the context of the populations under investigation.
  • Caution in Conclusion

    • Ensure the hypotheses accurately reflect the type of test (independence vs. homogeneity) to avoid conclusions that are wrong in context.

Summary and Next Steps

  • Review Chapters 1-10 to prepare for exams and upcoming evaluations, and to solidify understanding of these concepts.
    • Participation during the chapter reviews will be emphasized to clarify difficult topics.
  • Final examinations and section review sessions are important for consolidating knowledge and applying these analytical skills to empirical data.