Study Notes on Hypothesis Testing and Chi-Squared Distribution

Hypothesized Proportion vs. Probability Distribution
- Definition: Hypothesized proportion is derived from a probability distribution but is not the same as the overall probability distribution itself.
- Example: Sampling 100 students where 30% are expected to be staff members.
- Expected count of staff = $0.3 \times 100 = 30$.
- Observed count may differ, e.g., observed count of staff might be 20.
- Therefore, two pieces of information to analyze: expected (from the distribution) and observed (from the sample).
Differences and Sampling Variability
- Need to investigate if differences between the observed and expected counts are due to sampling variability or if they suggest a change in the hypothesized population.
Null Hypothesis and Evidence
- Null Hypothesis: This serves as the benchmark against which the observed data are compared. It typically asserts that the observed information fits the hypothesized probability distribution.
- Conclusion from hypothesis testing: If the difference between observed and expected is significant, it counts as evidence against the null hypothesis.

Testing Categorical Data
- The chi-squared test applies to categorical data and assesses whether observed frequencies differ from expected frequencies.
- The approach in these notes calculates a test statistic and compares it to a critical value rather than using a p-value.
Test Statistic Formula
- Earlier test statistic formulas (such as those involving $n - 1$ and the sample standard deviation) do not carry over to categorical data. The new formula is:
$\text{Test Statistic} = \sum \frac{(O_i - E_i)^2}{E_i}$
- Where:
- $O_i$ = observed count for category i,
- $E_i$ = expected count for category i.
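The formula translates directly into a few lines of code. The following is a minimal Python sketch, not a definitive implementation: the function name `chi_squared_statistic` is hypothetical, and the two-category split (20 observed staff and 80 non-staff against 30 and 70 expected) is an assumption that extends the staff example with a complementary non-staff category.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed two-category version of the staff example: [staff, non-staff].
observed = [20, 80]
expected = [30, 70]

statistic = chi_squared_statistic(observed, expected)
print(f"chi-squared statistic = {statistic:.4f}")
```

Each category contributes its own squared gap scaled by its expected count, so a gap of 10 counts matters more in the smaller staff category than in the larger non-staff category.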
Calculating Expected Counts
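- The expected count for each category, based on the hypothesized distribution, is obtained from:
- $E_i = N \times p_i$, where $N$ is the total sample size and $p_i$ is the hypothesized proportion for category i.
- Example: If expected proportion is 0.3 for a category and total sample size is 100, then expected count = $100 \times 0.3 = 30$.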
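As a small illustration of this step, the sketch below multiplies each hypothesized proportion by the sample size. The function name `expected_counts` and the "non-staff" category with proportion 0.7 are assumptions added so that the proportions sum to 1; only the 0.3 proportion and the sample size of 100 come from the notes.

```python
def expected_counts(proportions, sample_size):
    """Expected count per category = hypothesized proportion * sample size."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return {name: p * sample_size for name, p in proportions.items()}


# 0.3 is from the notes; the complementary 0.7 is an assumed second category.
counts = expected_counts({"staff": 0.3, "non-staff": 0.7}, 100)
for name, count in counts.items():
    print(f"{name}: expected {count:.1f}")
```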
Degrees of Freedom in Chi-Squared Tests
- For categorical data, degrees of freedom (df) depend on the number of categories, not on the sample size.
- General formula for df using categorical data is:
- $df = n - 1$, where n = number of categories.

Determining Critical Value:
- This is done using a chi-squared distribution table, which requires the degrees of freedom and the alpha level (typically $0.05$).
- Example: If df is 2 and alpha is 0.05, locate $5.991$ as the critical value.
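The table value for df = 2 can be checked by hand, because the chi-squared distribution with 2 degrees of freedom has the closed-form upper-tail probability $P(X > x) = e^{-x/2}$. The sketch below relies only on that special case. The function name `critical_value_df2` is hypothetical, and other df values would need a table or a statistics library instead.

```python
import math


def critical_value_df2(alpha):
    """Upper-tail critical value of the chi-squared distribution with df = 2.

    Solves exp(-x / 2) = alpha for x. Valid ONLY for df = 2.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)


critical = critical_value_df2(0.05)
print(round(critical, 3))  # 5.991
```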
Rejection Rule for Hypotheses
- Compare test statistic with critical value:
1. If test statistic > critical value: reject null hypothesis
2. If test statistic ≤ critical value: do not reject null hypothesis
Example of Applying Test Statistic and Critical Value
- If test statistic calculated is 21.0303 and critical value is 5.991, then:
- Since 21.0303 > 5.991, we reject null hypothesis.
- Conclusions drawn from rejected null indicate evidence against the null hypothesis, suggesting the company’s claim (or original distribution) may be inaccurate.
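The rejection rule is a single comparison, sketched below with the numbers from this example. The function name `decide` and the wording of the returned strings are illustrative choices, not part of the notes.

```python
def decide(test_statistic, critical_value):
    """Reject H0 only when the test statistic exceeds the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    # Otherwise there is not enough evidence against H0.
    return "fail to reject H0"


decision = decide(21.0303, 5.991)
print(decision)  # reject H0
```

The second outcome is worded as "fail to reject" rather than "accept", since the test never confirms the null hypothesis.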

Importance of Not Concluding "True" or "False" for Null Hypothesis
- We can never confirm the null hypothesis as true; we either reject or fail to reject.
- The rejection implies evidence against the null, while failing to reject means not enough evidence is present against it.

Example with Baseball Cards:
- Company claims: 30% rookies, 60% veterans, 10% all stars.
- Sample of 110 cards results in 55 rookies, 48 veterans, and 7 all stars.
- Hypotheses for this scenario:
- Null Hypothesis (H0): Proportion of cards as stated:
  - Proportion of rookies = 0.3, veterans = 0.6, all stars = 0.1
- Alternative Hypothesis (H1): At least one of the stated proportions is inaccurate.
Degrees of Freedom in Example: There are 3 categories (rookies, veterans, all-stars), so degrees of freedom = 3 - 1 = 2.
Expected Counts: Based on the sample of 110 cards:
- Rookies: $E = 110 \times 0.3 = 33$
- Veterans: $E = 110 \times 0.6 = 66$
- All Stars: $E = 110 \times 0.1 = 11$
Test Statistic Calculation:
- $\text{Test Statistic} = \frac{(55 - 33)^2}{33} + \frac{(48 - 66)^2}{66} + \frac{(7 - 11)^2}{11}$
- Evaluating each term gives $14.6667 + 4.9091 + 1.4545 \approx 21.0303$, the chi-squared statistic.
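The whole baseball card calculation can be reproduced end to end. The sketch below uses only the claimed proportions, observed counts, and the table value 5.991 from these notes; the variable names are illustrative, and the critical value is hard-coded rather than computed.

```python
claimed = {"rookies": 0.3, "veterans": 0.6, "all stars": 0.1}
observed = {"rookies": 55, "veterans": 48, "all stars": 7}

sample_size = sum(observed.values())  # 110 cards in total

# Expected count per category under H0.
expected = {name: p * sample_size for name, p in claimed.items()}

# Per-category contribution (O - E)^2 / E, then the total statistic.
contributions = {
    name: (observed[name] - expected[name]) ** 2 / expected[name]
    for name in claimed
}
statistic = sum(contributions.values())

df = len(claimed) - 1      # 3 categories -> 2 degrees of freedom
critical_value = 5.991     # table value for df = 2, alpha = 0.05 (from the notes)
reject_null = statistic > critical_value

for name in claimed:
    print(f"{name}: O={observed[name]}, E={expected[name]:.0f}, "
          f"contribution={contributions[name]:.4f}")
print(f"statistic={statistic:.4f}, df={df}, reject H0: {reject_null}")
```

Printing the individual contributions shows which category drives the result: the rookies term is by far the largest, so the rookie proportion is where the sample departs most from the claim.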

Chi-Squared in Calculators:
- Statistical calculators can compute the chi-squared value, the p-value, and the expected counts from data entered as matrices.
- Press the 2nd key followed by the $x^{-1}$ key to access the matrix functions.
- Run the chi-squared test on the (observed count, expected count) matrices to get both the chi-squared value and the expected counts simultaneously through the tests for independence or homogeneity.
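For a two-way table, the expected counts in a test of independence or homogeneity follow the standard rule (row total × column total) / grand total, with df = (rows - 1) × (columns - 1). The Python sketch below mimics that matrix-style calculation under stated assumptions: the 2×2 observed table is made-up illustrative data, not from these notes, and the function names are hypothetical.

```python
def expected_matrix(observed):
    """Expected counts: (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_from_matrices(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed table (rows and columns are two categorical variables).
observed = [[20, 30],
            [30, 20]]

expected = expected_matrix(observed)
statistic = chi_squared_from_matrices(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, statistic, df)
```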

Chi-Squared Test for Independence vs. Homogeneity
- Independence assesses whether two categorical variables are associated with each other.
- Homogeneity assesses whether proportions of different populations are the same regarding their characteristics.
- Hypotheses structure remains similar, but focus shifts according to the context of the population under investigation.
Caution in Conclusion
- Ensure the hypotheses accurately reflect the nature of the assessment (independence vs. homogeneity) so that conclusions stated in context are not erroneous.

Review Chapters 1-10 to prepare for exams and upcoming evaluations and to solidify understanding of these concepts.
- Participation during the chapter reviews will be emphasized to clarify difficult topics.
Final examinations and the section review sessions are important for consolidating knowledge and applying these analytical skills to empirical data.