Bio 9/25

I. Introduction to Chi-square Analysis
  • The chi-square statistic measures the discrepancy between observed and expected values.

  • Used for both univariate and bivariate tests.

  • Bivariate test assesses independence between two variables.

II. Chi-square Distribution Table
  • P-values indicate the probability of observing data at least as extreme as ours if the null hypothesis is true.

  • 5% threshold (α = 0.05) for rejecting null hypothesis.

  • Critical value (C) found in table based on degrees of freedom (df).

III. The Chi-square Test for Independence
  • Tests if an association exists between two categorical variables, or if they are independent of each other.

    • Independence means that the occurrence of one variable does not affect the probability of the other variable occurring.

  • Null Hypothesis (H₀): There is no statistically significant association between the two variables; they are independent.

    • Example: There is no difference in the preference for plant types among bumblebees.

  • Alternative Hypothesis (H₁): There is a statistically significant association between the two variables; they are not independent.

    • Example: There are differences in the preference for plant types among bumblebees.

  • Expected distribution is calculated based on the assumption that the null hypothesis is true, often derived from overall proportions or probabilities.

IV. Calculating Expected Values
  • Formula: Expected = (Row Total × Column Total) / Grand Total

  • Expected values represent the frequencies that would be observed in each cell of a contingency table if the null hypothesis of independence were true.

    • Contingency tables are used to display the frequency distribution of the variables.

  • Importance of expected values for constructing contingency tables and accurately assessing deviations from independence.

  • Validity requires the average expected value to be ≥ 4, and no more than 20% of the cells should have an expected count of less than 5. This ensures the chi-square test statistic approximates the chi-square distribution.
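The expected-value formula above can be sketched in a few lines of Python. The counts below are invented for illustration (a hypothetical 2×3 bumblebee contingency table), not data from the lecture.

```python
# Hypothetical 2x3 contingency table: rows are two bee groups,
# columns are three plant types. Counts are made up for illustration.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in observed]              # total per row
col_totals = [sum(col) for col in zip(*observed)]        # total per column
grand_total = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(x, 1) for x in row])
```

Each expected cell is what we would see if plant preference were identical across the two groups, i.e., if H₀ were true.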

V. Calculating Chi-square Value
  • The Chi-square test statistic (X²) quantifies the discrepancy between observed frequencies and expected frequencies.

  • Chi-square formula: X² = Σ (obs − exp)² / exp

    • "obs" refers to the observed frequency in each cell of the contingency table.

    • "exp" refers to the expected frequency in each cell assuming independence.

    • The sum (Σ) is taken over all cells in the contingency table.

  • Degrees of freedom (df) represent the number of independent pieces of information available to estimate parameters.

  • Degrees of freedom calculated as: df = (number of rows − 1) × (number of columns − 1).

    • This calculation reflects how many cell values can vary freely once the row and column totals are held constant.
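The X² and df formulas can be computed directly. This continues the hypothetical 2×3 bumblebee table (invented counts; the expected values follow from the row/column-total formula).

```python
# Hypothetical observed and expected counts (expected computed from
# (row total * column total) / grand total for the same table).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]
expected = [
    [25.0, 30.0, 45.0],
    [25.0, 30.0, 45.0],
]

# X^2 = sum over all cells of (obs - exp)^2 / exp
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_sq, 3), df)
```

With row and column totals fixed, only df = (2 − 1) × (3 − 1) = 2 cells can vary freely; the rest are determined.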

VI. Decision Making
  • To make a statistical decision, the calculated Chi-square value (X²) is compared against a critical value (C).

  • The critical value (C) is obtained from the Chi-square distribution table, using the chosen significance level (e.g., α = 0.05) and the calculated degrees of freedom (df).

  • Reject the null hypothesis (H₀) if X² > C.

    • This indicates that the observed differences between frequencies are statistically significant, providing evidence that the two variables are not independent.

  • Fail to reject the null hypothesis (H₀) if X² ≤ C.

    • This indicates that there is insufficient statistical evidence to conclude that the variables are dependent; any observed differences are likely due to random sampling variability.

  • Conclusion indicates existence of statistically significant differences or associations in data patterns without specifying the exact nature or location of these differences.

  • Follow-up analyses are recommended for detailed insights into where the differences lie or which specific categories contribute most to the overall association (e.g., by examining standardized residuals or performing post-hoc tests).
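A minimal sketch of the decision rule, finishing the hypothetical bumblebee example (X² ≈ 3.111, df = 2). The critical values below are standard chi-square table entries at α = 0.05; the dictionary is a stand-in for looking them up in the distribution table.

```python
# Standard chi-square critical values at alpha = 0.05 for small df
# (stand-in for the distribution table).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

chi_sq = 3.111  # from the hypothetical example above
df = 2

critical = CRITICAL_05[df]

# Reject H0 if X^2 > C; otherwise fail to reject H0.
if chi_sq > critical:
    decision = "reject H0: evidence of an association"
else:
    decision = "fail to reject H0: no significant association detected"

print(critical, decision)
```

Here 3.111 ≤ 5.991, so we fail to reject H₀: the observed differences are consistent with random sampling variability.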