Chi-Square Test Notes

Focus on chi-square tests and their applications within quantitative research.
Topics covered in this workshop:
- Goodness-of-fit test (one-way)
- Test for independence (two-way)
- Workshop tasks involving chi-square tests.

Types of Designs:
- Correlational Designs: Test relationships between two variables.
  - Parametric Tests:
    - Pearson correlation (for scale data)
  - Non-parametric Tests:
    - Spearman or Kendall-Tau correlation (for ordinal data)
- Experimental Designs: Test for differences.
  - Within Subjects IV:
    - Paired t-test (parametric)
    - Wilcoxon test (non-parametric)
  - Between Subjects IV:
    - Independent t-test (parametric)
    - Mann-Whitney test (non-parametric)

Scale (Interval/Ratio): Numbers indicate the size of the difference between observations, with equal intervals between values.
- Example: The gap between scores of 57 and 50 is the same as the gap between 45 and 38.
Ordinal: Numbers indicate rank order (more or less of a measure), but not how much more.
- Example: A happiness rating of 7 is happier than 4, which is happier than 2.
Nominal (Categorical): Numbers serve as labels for categories without numerical value.
- Example: Gender represented as labels (1 = male, 2 = female).
- Example: Eye color (1 = brown, 2 = blue, 3 = green).

Chi-square tests are used to examine relationships between nominal (categorical) variables.
Purpose: To determine if observed frequencies significantly differ from expected frequencies.
Commonly applied in:
- Categorical survey responses
- Experimental conditions
- Behavioral studies involving count data.
Hypothesis Tested: Are the observed arrangements random or indicative of an effect?

Goodness-of-Fit Test (One-way):
- Compares observed data against an expected distribution.
- Example: Analyze racial proportions of students at Cambridge University vs. general population proportions.
Test for Independence (Two-way):
- Examines relation between two categorical variables.
- Example: Analyze if proportions of smokers differ between genders (male/female).
- Can handle multiple levels (e.g., analyzing age categories against marital status).
- Can be treated as a test of difference between IV (independent variable) and DV (dependent variable).

Data must be categorical (nominal or ordinal).
Data should be independent: Each participant contributes to only one cell in the contingency table.
Expected Frequency Requirements:
- At least 80% of cells should have an expected frequency of 5 or more.
- In larger tables, up to 20% of cells may have an expected frequency below 5, but no cell should have an expected frequency below 1.
- If any expected frequency is below 1, use Fisher's exact test (only applicable to 2x2 designs) or collapse some cells together.
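
The expected-frequency rule above can be checked mechanically once the expected counts are known. The sketch below is only an illustration: the helper name `check_expected_counts` and the 2x3 table of expected counts are hypothetical and do not come from the workshop data.

```python
def check_expected_counts(expected):
    """Return (share of cells with expected count >= 5, smallest expected count, rule met?)."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    smallest = min(cells)
    # Rule from the notes: at least 80% of cells >= 5, and no cell below 1.
    rule_met = share_at_least_5 >= 0.8 and smallest >= 1
    return share_at_least_5, smallest, rule_met


# Hypothetical expected counts for a 2x3 table (illustrative numbers only).
expected = [[12.0, 8.5, 4.5],
            [18.0, 12.5, 6.5]]

share, smallest, rule_met = check_expected_counts(expected)
print(f"{share:.0%} of cells >= 5, smallest = {smallest}, rule met: {rule_met}")
```

Here one cell out of six falls below 5, which is still within the 20% allowance, so the rule is met. A table with any expected count below 1 would fail the check.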

Limited to two categorical variables (one usually as the response category).
More than two categorical variables require log-linear analysis.
Cannot analyze scale (interval/ratio) data directly; the data must be counts in categories.
Does not measure the strength or direction of relationships.
Sensitive to sample size: with large samples, even very small effects can reach statistical significance.
Generally low statistical power, making it hard to detect true effects.
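
The sensitivity to sample size can be seen with a small sketch. It reuses the 58:42 split from the polling example later in these notes and compares it with a hypothetical poll ten times larger that has exactly the same proportions. The function name `chi_square_stat` is an assumption made for this illustration.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Same 58:42 split, tested against a 50:50 expectation at two sample sizes.
small = chi_square_stat([58, 42], [50, 50])       # n = 100
large = chi_square_stat([580, 420], [500, 500])   # n = 1000 (hypothetical larger poll)

print(f"n = 100:  chi2 = {small:.2f}")
print(f"n = 1000: chi2 = {large:.2f}")
```

The proportions are identical, yet the statistic is ten times larger in the bigger sample. With df = 1 the critical value at the 0.05 level is 3.84, so the smaller poll falls below it and the larger one exceeds it.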

Usage: Compares observed data distribution to an expected theoretical distribution of one variable.
Examines the number of cases in each level of a variable against expected frequencies under the null hypothesis.

Hypotheses:
- Null (H0): The number of cases in each category matches the expected distribution (equal counts if a uniform distribution is assumed).
- Alternative (H1): The observed counts differ significantly from the expected distribution.
Data Collection: Gather observed values and determine expected frequencies from the theoretical prediction or assume uniform distribution.
Statistical Calculation: Compute the chi-square statistic with the formula below, then find the probability of obtaining a value at least that large if the null hypothesis is true:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O = observed frequency, E = expected frequency.

Scenario: A poll ahead of an upcoming presidential election asks voters whether they support Dale or Beck.
Poll Results: Of 100 voters polled, 58 supported Beck and 42 supported Dale.
Question raised: Is this difference large enough to predict a Beck victory?
Predictions: Null hypothesis assumes equal preference (50:50) in the population.
Calculate chi-square statistic for observed votes vs. expectations (50 for each).
Contingency Table:
- Votes for Beck: Observed 58 (Expected 50)
- Votes for Dale: Observed 42 (Expected 50)

Table 1: Contingency table showing expected and observed voting preferences.
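
As a rough cross-check of the table above, the statistic can be computed by hand in a few lines of Python. This is a minimal sketch using only the standard library, not the JASP procedure used in the workshop. The function name `chi_square_gof` is an assumption, and the p-value shortcut applies only when df = 1.

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [58, 42]   # Beck, Dale
expected = [50, 50]   # 50:50 split under the null hypothesis

stat = chi_square_gof(observed, expected)
df = len(observed) - 1

# For df = 1 only, the upper-tail probability has the closed form erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi2({df}) = {stat:.2f}, p = {p_value:.3f}")
```

Each category contributes 64 / 50 = 1.28, giving a statistic of 2.56 with a p-value of about 0.11. That is above 0.05, so this poll alone does not give significant evidence against an even split.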

In JASP, run the test by selecting Frequencies > Multinomial test.
Conclusion Interpretation:
- Compare the p-value of the chi-square statistic against the significance level (e.g., p < 0.05 is considered significant).

Expanded Scenario: The poll now has five response categories (four candidates plus 'No vote'), with data recorded as:
- Votes: Dale 35, Beck 47, Palmer 5, Bartlet 2, No vote 11.
Null hypothesis: No voter preference.
Degrees of freedom: With k = 5 categories, df = k - 1 = 4. Expected frequencies are based on a uniform distribution (100 / 5 = 20 per category).
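
The five-category case can be sketched the same way. The code below assumes a uniform expected distribution, as stated above, and uses a closed-form upper-tail probability that is valid only for df = 4. Variable names are illustrative.

```python
import math

# Observed votes from the expanded scenario.
votes = {"Dale": 35, "Beck": 47, "Palmer": 5, "Bartlet": 2, "No vote": 11}

total = sum(votes.values())
k = len(votes)
expected = total / k          # uniform distribution under the null hypothesis
df = k - 1

stat = sum((o - expected) ** 2 / expected for o in votes.values())

# For df = 4 only, the upper-tail probability is exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(f"expected per category = {expected:.0f}")
print(f"chi2({df}) = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

With 20 votes expected per category, the statistic is 1584 / 20 = 79.2, far beyond what a uniform preference would plausibly produce, so the null hypothesis of no voter preference is rejected.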

Usage: Explore relationships between two categorical variables or the effect of a categorical independent variable (IV) on a categorical dependent variable (DV).
Each variable can possess two or more categories.
Objective: Compare observed frequencies against expected values when there’s no association between the two variables.

Hypotheses:
- Null (H0): No association between variables.
- Alternative (H1): Variables are related.
Data Collection: Record frequencies in a contingency table (observed and expected).
Calculation: Compute chi-square statistic using:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
then determine the probability of obtaining a value at least this extreme by chance if the null hypothesis is true.

Scenario: Police line-ups where victims may incorrectly identify suspects based on clothing color.
Variables: Outcome (correct ID/wrong ID) and clothing color (same/different).

Table of Results (observed frequencies):
- Same clothes: 24 correct ID, 61 wrong ID
- Different clothes: 23 correct ID, 37 wrong ID
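
The line-up data can be worked through as a test of independence. In this sketch the expected count for each cell is taken as (row total x column total) / N, which is the standard calculation under the null hypothesis of no association. Only the standard library is used, no continuity correction is applied, and the p-value shortcut holds only for df = 1 (a 2x2 table).

```python
import math

# Rows: same clothes, different clothes. Columns: correct ID, wrong ID.
observed = [[24, 61],
            [23, 37]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid for df = 1 only; no continuity correction is applied.
p_value = math.erfc(math.sqrt(stat / 2))

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
print(f"chi2({df}, N = {n}) = {stat:.2f}, p = {p_value:.2f}")
```

All four expected counts are well above 5, so the expected-frequency assumption holds. The statistic comes out at roughly 1.64 with p around 0.20, which would be reported as a non-significant association between clothing color and identification accuracy.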

Final results should be formatted according to APA style in contingency tables, detailing expected versus observed counts, chi-square values, and p-values.
Example non-significant report: "There was no significant effect of line-up type…"
Example significant report: "There was a significant effect…"

Proficiently conduct chi-square tests and interpret statistical significance findings.
Report results in a format consistent with academic standards (APA format).