Study Notes on Hypothesis Testing and Chi-Squared Distribution
Hypothesis Testing and Probability Distribution
Hypothesized Proportion vs. Probability Distribution
- Definition: A hypothesized proportion is the share of one category taken from a hypothesized probability distribution; it is not the same as the whole distribution.
- Example: Sampling 100 students where 30% are expected to be staff members.
- Expected count of staff = $0.30 \times 100 = 30$.
- Observed count may differ, e.g., observed count of staff might be 20.
- Therefore, two pieces of information to analyze: expected (from the distribution) and observed (from the sample).
Differences and Sampling Variability
- Need to investigate whether the differences between the observed and expected counts are due to sampling variability or suggest that the hypothesized distribution does not describe the population.
Null Hypothesis and Evidence
- Null Hypothesis: This is the benchmark against which the observed data are compared. It asserts that the population follows the hypothesized probability distribution, so any difference between observed and expected counts is due to sampling variability.
- Conclusion from hypothesis testing: If the difference is statistically significant, we have evidence against the null hypothesis.
Chi-Squared Distribution
Testing Categorical Data
- The chi-squared test applies to categorical data and assesses whether observed frequencies differ from expected frequencies.
- In these notes, the decision is made by calculating a test statistic and comparing it to a critical value (the critical-value approach); comparing the p-value to alpha leads to the same decision.
Test Statistic Formula
- Previous test statistic formulas (those built from the sample standard deviation) do not apply to categorical data. The new formula is:
- $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
- Where:
- $O_i$ = observed count for category i,
- $E_i$ = expected count for category i.
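As a minimal sketch in Python, the statistic (the sum of $(O_i - E_i)^2 / E_i$ over all categories) can be computed with a short function. The function name `chi_squared_statistic` is hypothetical, and the usage example assumes the staff example splits into just two categories (staff and non-staff), so the 100 sampled people give 20 vs. 80 observed and 30 vs. 70 expected.

```python
def chi_squared_statistic(observed, expected):
    """Return the chi-squared statistic: the sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        # Each category contributes its squared deviation, scaled by its expected count.
        total += (o - e) ** 2 / e
    return total


# Staff example (assumed two categories): 20 staff / 80 non-staff observed,
# 30 staff / 70 non-staff expected out of 100 sampled.
stat = chi_squared_statistic([20, 80], [30, 70])
print(round(stat, 4))  # 4.7619
```

Dividing by $E_i$ means the same raw deviation counts for more in a category where few observations were expected.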
Calculating Expected Counts
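- The expected count for each category, based on the hypothesized distribution, is obtained from $E_i = N \cdot p_i$, where N is the total sample size and $p_i$ is the hypothesized proportion for category i.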
- Example: If expected proportion is 0.3 for a category and total sample size is 100, then expected count = 30.
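A sketch of the expected-count rule (sample size times hypothesized proportion) applied to every category at once. `expected_counts` is a hypothetical helper name, and the check that the proportions sum to 1 is an added assumption that the hypothesized proportions describe a complete distribution.

```python
import math


def expected_counts(sample_size, proportions):
    """Expected count for each category: sample size times hypothesized proportion."""
    # Assumption: the hypothesized proportions cover every category, so they sum to 1.
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    return [sample_size * p for p in proportions]


# Example from the notes: proportion 0.3 with a sample of 100 (0.7 for the rest).
print([round(c, 2) for c in expected_counts(100, [0.3, 0.7])])  # [30.0, 70.0]
```

The rounding in the print only hides floating-point noise (for example, `100 * 0.3` is not exactly 30.0 in binary floating point).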
Degrees of Freedom in Chi-Squared Tests
- Degrees of freedom (df) are found differently than in earlier tests: for categorical data they depend on the number of categories, not on the sample size.
- General formula for df for a single categorical variable is:
- $df = k - 1$, where k = number of categories.
Critical Values and Hypothesis Conclusion
Determining Critical Value:
- This is done using a chi-squared distribution table, which requires the degrees of freedom and the alpha level (typically $\alpha = 0.05$).
- Example: If df = 2 and $\alpha = 0.05$, the table gives 5.991 as the critical value.
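Instead of reading the printed table, the critical value can be computed numerically. This is a minimal sketch assuming integer df: it uses the standard closed-form upper-tail probability of the chi-squared distribution (a finite sum for even df, an `erfc`-based sum for odd df) and then bisection. The function names are hypothetical; if SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same value.

```python
import math


def chi2_survival(x, df):
    """P(X > x) for a chi-squared variable with a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{j=0}^{df/2 - 1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1, where P(X > x) = erfc(sqrt(y)), then add
    # e^{-y} * y^a / Gamma(a + 1) for a = 1/2, 3/2, ..., up to df/2 - 1.
    total = math.erfc(math.sqrt(y))
    a = 0.5
    while a < df / 2.0:
        total += math.exp(-y + a * math.log(y) - math.lgamma(a + 1))
        a += 1.0
    return total


def chi2_critical_value(df, alpha=0.05, tol=1e-10):
    """Value x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    # Widen the bracket until the upper-tail probability drops below alpha.
    while chi2_survival(hi, df) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_survival(mid, df) > alpha:
            lo = mid  # tail still too heavy: critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical_value(2, 0.05), 3))  # 5.991
```

The same `chi2_survival` function returns the p-value of a test statistic; for example, `chi2_survival(21.0303, 2)` is far below 0.05, which agrees with rejecting the null hypothesis in the example that follows.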
Rejection Rule for Hypotheses
- Compare test statistic with critical value:
- If test statistic > critical value: reject null hypothesis
- If test statistic < critical value: do not reject null hypothesis
Example of Applying Test Statistic and Critical Value
- If the calculated test statistic is 21.0303 and the critical value is 5.991, then:
- Since 21.0303 > 5.991, we reject null hypothesis.
- Rejecting the null hypothesis means we have evidence against it, suggesting that the company's claim (the hypothesized distribution) may be inaccurate.
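The rejection rule and the worked comparison can be written as a single decision. This is only a sketch; `chi_squared_decision` is a hypothetical name, and the strings it returns deliberately say "fail to reject" rather than "accept", in line with the caution that the null hypothesis is never confirmed.

```python
def chi_squared_decision(test_statistic, critical_value):
    """Apply the rejection rule: reject H0 only if the statistic exceeds the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    # Not exceeding the critical value is not proof that H0 is true.
    return "fail to reject H0"


print(chi_squared_decision(21.0303, 5.991))  # reject H0
```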
Confidence Level and Hypothesis Testing
- Importance of Not Concluding "True" or "False" for Null Hypothesis
- We can never confirm the null hypothesis as true; we either reject or fail to reject.
- Rejecting implies evidence against the null, while failing to reject means there is not enough evidence against it.
Applications in Example Scenarios
Example with Baseball Cards:
- Company claims: 30% rookies, 60% veterans, 10% all stars.
- Sample of 110 cards results in 55 rookies, 48 veterans, and 7 all stars.
- Hypotheses for this scenario:
- Null Hypothesis (H0): Proportion of cards as stated:
- Proportion of rookies = 0.3, veterans = 0.6, all stars = 0.1
- Alternative Hypothesis (H1): At least one of the stated proportions is inaccurate.
Degrees of Freedom in Example: There are 3 categories (rookies, veterans, all-stars), so degrees of freedom = 3 - 1 = 2.
Expected Counts: Based on the sample of 110 cards:
- Rookies: $0.3 \times 110 = 33$
- Veterans: $0.6 \times 110 = 66$
- All Stars: $0.1 \times 110 = 11$
Test Statistic Calculation:
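- Substituting the observed and expected counts: $\chi^2 = \frac{(55-33)^2}{33} + \frac{(48-66)^2}{66} + \frac{(7-11)^2}{11} \approx 14.6667 + 4.9091 + 1.4545 \approx 21.0303$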
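The baseball card calculation, end to end, as a short Python sketch. The counts and claimed proportions come from these notes; the critical value 5.991 is hard-coded from the chi-squared table for df = 2 and alpha = 0.05 rather than computed.

```python
# Baseball card example from the notes.
categories = ["rookies", "veterans", "all stars"]
claimed = [0.3, 0.6, 0.1]   # company's claimed proportions (H0)
observed = [55, 48, 7]      # counts in the sample of 110 cards

n = sum(observed)                                        # 110 cards
expected = [n * p for p in claimed]                      # about 33, 66, 11
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(categories) - 1                                 # 3 - 1 = 2
critical = 5.991                                         # table value, df = 2, alpha = 0.05

for name, o, e, c in zip(categories, observed, expected, contributions):
    print(f"{name:>9}: O = {o:3d}, E = {e:5.1f}, (O - E)^2 / E = {c:.4f}")
print(f"chi-squared = {chi_sq:.4f}, df = {df}")
print("reject H0" if chi_sq > critical else "fail to reject H0")
```

The three contributions (about 14.6667, 4.9091, and 1.4545) add up to about 21.0303, and since that exceeds 5.991 the script prints "reject H0".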
Calculation via Statistical Software or Calculators
- Chi-Squared in Calculators:
- Steps to enter data and calculate the p-value and expected counts in statistical calculators using matrices.
- Press the 2nd button, then the $x^{-1}$ key, to access the matrix functions.
- Run the chi-squared test with the arguments (observed-count matrix, expected-count matrix) to get both the chi-squared value and the expected counts at once; this matrix-based test is the one used for tests of independence or homogeneity.
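The matrix-based test fills in the expected counts for a two-way table. Assuming it uses the standard formula for tests of independence and homogeneity (row total times column total, divided by the grand total), the computation can be sketched as follows. `two_way_expected` and the 2 x 2 counts are hypothetical, for illustration only.

```python
def two_way_expected(table):
    """Expected counts for a two-way table: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts, for illustration only.
observed = [[10, 20],
            [30, 40]]
expected = two_way_expected(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]

# Standard degrees of freedom for a two-way table: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
```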
Distinction Between Test Types
Chi-Squared Test for Independence vs. Homogeneity
- A test for independence assesses whether two categorical variables are associated within a single population.
- A test for homogeneity assesses whether the distribution of a categorical variable is the same across two or more populations.
- The hypotheses have a similar structure in both tests; their wording changes depending on whether one population (independence) or several populations (homogeneity) are being investigated.
Caution in Conclusion
- Ensure the hypotheses accurately reflect the type of test (independence vs. homogeneity) to avoid conclusions that are wrong in context.
Summary and Next Steps
- Review Chapters 1-10 to prepare for exams and upcoming evaluations and to solidify understanding of these concepts.
- Participation during chapter reviews will be emphasized to clarify difficult topics.
- Final exams and section review sessions are important for consolidating knowledge and applying these analytical skills to empirical data.