Study Notes on Chi-Square Test for Independence

Overview: This section examines whether there is a relationship between two categorical variables.

Definition: Categorical variables represent data that fall into categories, which can be either nominal (e.g., political affiliation) or ordinal (e.g., academic ranking).
Examples:
- Political affiliation
- Academic major
- Verdict choice (e.g., guilty vs. not guilty)
Data Representation: Frequencies are recorded (e.g., number of Psychology majors, number of people selecting a guilty verdict), not means.

Matrix Structure: Represents data with two categorical variables.
Example Variables:
- Variable 1: "Do you have superpowers?" (Responses: yes/no)
- Variable 2: "What is your favorite ice cream flavor?" (Options: chocolate or vanilla)
Observed Frequencies: Numbers inside the table representing the actual counts (e.g., number of people with superpowers who like chocolate).
Total Counts: Sums of rows and columns are shown outside the table.
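
As a rough illustration of the matrix layout, the sketch below builds the 2x2 table and its marginal totals in plain Python. The observed counts (23, 7, 37, 33) are the ones used in the chi-square example in these notes; the variable names are hypothetical and chosen only for this sketch.

```python
# Observed frequencies: rows = "Do you have superpowers?", columns = flavor.
# Counts come from the worked chi-square example in these notes.
observed = {
    "Has Powers": {"chocolate": 23, "vanilla": 7},
    "No Powers": {"chocolate": 37, "vanilla": 33},
}

# Row totals: one sum per row, shown to the right of the table.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column totals: one sum per column, shown below the table.
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# Print the table with totals outside the cells.
print(f"{'':<12}{'chocolate':>10}{'vanilla':>10}{'Total':>8}")
for row, cells in observed.items():
    print(f"{row:<12}{cells['chocolate']:>10}{cells['vanilla']:>10}{row_totals[row]:>8}")
print(f"{'Total':<12}{col_totals['chocolate']:>10}{col_totals['vanilla']:>10}{grand_total:>8}")
```

The row totals (30 and 70) and column totals (60 and 40) are the same figures used when computing expected frequencies.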

Independence Concept: Two variables are independent if the frequency distribution of one variable is the same at every level of the other, so knowing a person's category on one variable says nothing about their category on the other.
Independence Example: If 60% of the dataset prefers chocolate and powers are independent of flavor preference, then:
- 60% of people with powers prefer chocolate
- 60% of people without powers prefer chocolate

State the Hypotheses:
- Null Hypothesis (H0): Powers and flavor preference are independent.
- Alternative Hypothesis (H1): Powers and flavor preference are NOT independent.
Calculate Degrees of Freedom (df) & Critical Region:
- Formula for df: $df = (R-1)(C-1)$
- Where R is the number of rows, C is the number of columns.
- For the example: R = 2 (yes/no), C = 2 (chocolate/vanilla)
- Calculation: $df = (2-1)(2-1) = 1\times1 = 1$
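
The df formula can be sketched as a small helper. The function name `degrees_of_freedom` is hypothetical, and the guard against tables with fewer than two rows or columns is an assumption added for this sketch.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Return df = (R - 1)(C - 1) for an R x C contingency table."""
    if n_rows < 2 or n_cols < 2:
        # Assumption for this sketch: a test for independence needs
        # at least two categories on each variable.
        raise ValueError("need at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


# Example from the notes: 2 rows (yes/no) x 2 columns (chocolate/vanilla).
df_example = degrees_of_freedom(2, 2)
print(df_example)  # 1
```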
Calculate Statistics:
- Find expected frequencies (Fe).

Definition: Expected frequencies are the anticipated counts under the null hypothesis assuming independence of variables.
Calculation Methods:
- Version 1:
  i. Calculate percentages for columns (e.g., percentage of the dataset that likes chocolate).
- Example Calculation of Percentages:
- Total liking chocolate = 60 out of 100 = 0.60 or 60%.
- Total liking vanilla = 40 out of 100 = 0.40 or 40%.
- Remarks: Carry percentages to four decimal places (as proportions) and do not round them further before multiplying.

For each row: Multiply column percentage by row total.
Example calculations:
- Expected frequency for "Has Powers" liking chocolate:
- $0.60 \times 30 = 18$ (from 30 people who have powers)
- Expected frequency for "No Powers" liking chocolate:
- $0.60 \times 70 = 42$ (from 70 people who do not have powers)
Expected Frequencies for Vanilla:
- Powers: $0.40 \times 30 = 12$
- No Powers: $0.40 \times 70 = 28$
Validation: Check that the expected frequencies in each row sum to that row's total and that those in each column sum to that column's total (e.g., 18 + 12 = 30 and 18 + 42 = 60).
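
The expected-frequency steps above can be sketched as follows, assuming the marginal totals from the example (rows 30 and 70, columns 60 and 40). The variable names are hypothetical; the final loops mirror the validation step.

```python
import math

row_totals = {"Has Powers": 30, "No Powers": 70}
col_totals = {"chocolate": 60, "vanilla": 40}
n = sum(row_totals.values())  # 100 people in total

# Step i: column proportions (e.g., 60 / 100 = 0.60 for chocolate).
col_props = {col: total / n for col, total in col_totals.items()}

# Step ii: expected count = column proportion x row total.
expected = {
    row: {col: prop * row_total for col, prop in col_props.items()}
    for row, row_total in row_totals.items()
}

# Validation: expected counts must reproduce the row and column totals.
for row, row_total in row_totals.items():
    assert math.isclose(sum(expected[row].values()), row_total)
for col, col_total in col_totals.items():
    assert math.isclose(sum(expected[row][col] for row in expected), col_total)

for row, cells in expected.items():
    print(row, {col: round(value, 4) for col, value in cells.items()})
```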

Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
- Where (O) is the observed frequency and (E) is the expected frequency.
Example Calculation:
- $\chi^2 = \frac{(23-18)^2}{18} + \frac{(7-12)^2}{12} + \frac{(37-42)^2}{42} + \frac{(33-28)^2}{28}$
- $\chi^2 \approx 1.39 + 2.08 + 0.60 + 0.89 = 4.96$
- Each term is a squared difference divided by a positive expected count, so chi-square is always non-negative.
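
The same arithmetic can be checked with a short sketch. The two lists pair each observed count with its expected count in the order used in the formula; the names are hypothetical.

```python
# Observed and expected counts, cell by cell:
# powers/chocolate, powers/vanilla, no powers/chocolate, no powers/vanilla.
observed = [23, 7, 37, 33]
expected = [18, 12, 42, 28]

# Each cell contributes (O - E)^2 / E; the statistic is the sum over cells.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([round(c, 2) for c in contributions])  # [1.39, 2.08, 0.6, 0.89]
print(round(chi_square, 2))                  # 4.96
```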

Compare Chi-square Statistic: Compare the obtained chi-square with the critical value from a chi-square distribution table for the given df.
Example Result: For the provided data, obtained chi-square = 4.96 versus critical value = 3.84 (df = 1, α = .05).
Conclusion: Reject H0 if the statistic exceeds the critical value; otherwise, fail to reject H0. Here 4.96 > 3.84, so H0 is rejected.
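
The decision rule can be sketched with a small lookup table. The critical values below are the standard α = .05 table entries for df = 1 to 4 (3.841 is the 3.84 quoted in the notes); the α level and the names `CRITICAL_VALUES_05` and `decide` are assumptions made for this sketch.

```python
# Standard chi-square critical values at alpha = .05, keyed by df.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def decide(chi_square: float, df: int) -> str:
    """Compare the obtained statistic with the tabled critical value."""
    critical = CRITICAL_VALUES_05[df]
    return "reject H0" if chi_square > critical else "fail to reject H0"


decision = decide(4.96, 1)
print(decision)  # reject H0
```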

Report findings in APA style: e.g., "We conducted a chi-square test for independence to evaluate whether superpowers (present, absent) was related to ice cream flavor preference (chocolate, vanilla)."
Describe the significance of the results and the interpretation of the percentage differences between categories across different groups. Include exact chi-square, df, and p-values in reporting.
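
A minimal sketch of the numbers that go into such a report, assuming the example counts from these notes. The p-value line uses the fact that, for df = 1 only, the chi-square tail probability equals `erfc(sqrt(chi_square / 2))`; larger tables need a chi-square distribution function from a statistics package. The helper `format_report` is hypothetical and writes "chi-square" in place of the Greek symbol.

```python
import math

observed = {
    "Has Powers": {"chocolate": 23, "vanilla": 7},
    "No Powers": {"chocolate": 37, "vanilla": 33},
}

# Percentage of each group preferring chocolate, for the interpretation.
pct_chocolate = {
    group: 100 * cells["chocolate"] / sum(cells.values())
    for group, cells in observed.items()
}


def format_report(chi_square: float, df: int, n: int, p: float) -> str:
    """Build an APA-like result string (no leading zero on p)."""
    p_text = f"{p:.3f}".lstrip("0")
    return f"chi-square({df}, N = {n}) = {chi_square:.2f}, p = {p_text}"


chi_square = 4.96
p = math.erfc(math.sqrt(chi_square / 2))  # valid for df = 1 only
report = format_report(chi_square, 1, 100, p)

for group, pct in pct_chocolate.items():
    print(f"{group}: {pct:.1f}% prefer chocolate")
print(report)
```

Comparing the two percentages is what gives the significant result its direction, e.g., which group prefers chocolate more often.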

Phi (φ) Statistic: Used when both categorical variables are dichotomous (a 2x2 table); measures the strength of association.
- Formula: $\phi = \sqrt{\frac{\chi^2}{n}}$
Cramér's V: Used when the matrix is larger than 2x2, as the corresponding measure of effect size:
- Formula: $V = \sqrt{\frac{\chi^2}{n \cdot df^*}}$, where $df^*$ is the smaller of (R-1) and (C-1).
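
Both effect sizes can be sketched in a few lines. The function names are hypothetical, and the example values (chi-square = 4.96, n = 100) come from the worked example in these notes.

```python
import math


def phi_coefficient(chi_square: float, n: int) -> float:
    """Phi for a 2x2 table: sqrt(chi-square / n)."""
    return math.sqrt(chi_square / n)


def cramers_v(chi_square: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramer's V: sqrt(chi-square / (n * df*)), df* = min(R - 1, C - 1)."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * df_star))


phi = phi_coefficient(4.96, 100)
print(round(phi, 2))  # 0.22

# For a 2x2 table df* = 1, so V reduces to phi.
v_2x2 = cramers_v(4.96, 100, 2, 2)
```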

Use SPSS to run the analysis; its output includes observed frequencies, the chi-square test, and effect sizes. Interpret and report the results in the context of the study.