Chapter 15: The Chi-Square Statistic: Tests for Goodness of Fit and Independence

Chapter Overview

This chapter covers the chi-square statistic and its applications in analyzing data through tests for goodness of fit and tests of independence.

Learning Outcomes

Explain when a chi-square test is appropriate.
Test hypotheses about the shape of a distribution using the chi-square goodness of fit.
Test hypotheses about the relationship between variables using the chi-square test of independence.
Evaluate the effect size using the phi coefficient or Cramér’s V.

Tools Required

Proportions (for math review, reference Appendix A).
Frequency distributions (refer to Chapter 2).

15-1: Introduction to Chi-Square: The Test for Goodness of Fit

Statistical tests previously discussed aim to test hypotheses regarding population parameters; these tests are categorized as parametric tests.

Characteristics of Parametric Tests

Share several fundamental assumptions:
- Normal distribution in the population.
- Homogeneity of variance across the population.
- Requirement of a numerical score for each individual.
- Data must stem from an interval or ratio scale.

Nonparametric Tests

If research data does not meet the requirements for parametric tests, nonparametric tests serve as an alternative:
- Do not specify hypotheses based on a specific population parameter.
- Fewer assumptions about population distribution (termed as "distribution-free" tests).
- The two chi-square tests covered in this chapter are examples of nonparametric tests.

Classification and Measurement Scale

Nonparametric tests typically classify participants into categories using nominal or ordinal scales.
Data for nonparametric tests may include frequency counts (e.g., number of individuals from different political affiliations).
Nonparametric methods are useful when interval or ratio measures cannot be achieved.
Situations arise where it is impossible to obtain precise scores; categorical classification is then employed.

Selection of Statistical Procedure

Choice of statistical analyses rests primarily on the measurement level:
- Chi-square tests, t-tests, or ANOVA may be selected for hypothesis testing concerning relationships.
- Evaluation of relationship strength may also involve the use of $r^2$.

The Chi-Square Test for Goodness of Fit

The chi-square test evaluates research questions regarding proportions or relative frequencies within a distribution.
It utilizes sample data to test hypotheses regarding the shape or proportions of the population’s distribution.
The test assesses how well the sample data proportions align with the expected proportions postulated for the population.

Null Hypothesis for the Goodness-of-Fit Test

The null hypothesis (H₀) defines the proportion or percentage distribution of the population across each category.
Main rationale for the null hypothesis includes:
- No preference among categories (implying equal proportions).
- No observed difference in the specific population relative to another known population’s proportions.
The alternative hypothesis (H₁) posits that the population distribution deviates from the specified proportions in H₀, often concisely presented as "Not H₀".

Data Requirements for Goodness-of-Fit Test

Sample mean or SS calculation is not necessary:
- Individuals classified based on categories (e.g., grades, frequency of exercise).
- Each measurement category has documented observed frequencies.
- Categories must be mutually exclusive (each individual is counted in exactly one category).

Expected Frequencies

The goodness-of-fit test contrasts observed frequencies against expected frequencies based on the null hypothesis.
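Expected frequencies are computed from the proportions stated in the null hypothesis and the sample size (n): $f_e = pn$, where p is the hypothesized proportion for a category.
They represent an idealized prediction of the sample distribution if H₀ were exactly true.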
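As a minimal sketch of this computation, assuming the usual rule that each expected frequency is the hypothesized proportion times the sample size, the snippet below uses a hypothetical "no preference" null hypothesis with four categories and n = 50. The function name and the data are illustrative assumptions, not part of the chapter.

```python
def expected_frequencies(proportions, n):
    """Return one expected frequency (p * n) per category under H0."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("H0 proportions must sum to 1")
    return [p * n for p in proportions]


# Hypothetical H0: no preference among four categories, sample of n = 50.
fe_equal = expected_frequencies([0.25, 0.25, 0.25, 0.25], 50)
print(fe_equal)  # [12.5, 12.5, 12.5, 12.5]
```

Note that the expected frequencies here are not whole numbers. They describe an idealized sample, so fractional values are acceptable.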

The Chi-Square Statistic

The chi-square statistic, denoted by $\chi^2$, is computed as:
- $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
- $f_o$ is the observed frequency in each category.
- $f_e$ is the expected frequency in each category.
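The statistic can be sketched in a few lines, assuming the standard definition: the sum over categories of $(f_o - f_e)^2 / f_e$. The observed counts below are hypothetical (n = 50, four categories), and the expected counts assume a null hypothesis of equal proportions.

```python
def chi_square(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical data: 50 individuals classified into four categories.
observed = [18, 17, 7, 8]
expected = [12.5, 12.5, 12.5, 12.5]  # H0: equal proportions, 0.25 * 50 each

stat = chi_square(observed, expected)
print(round(stat, 2))  # 8.08
```

Each squared difference is divided by its own expected frequency, so a discrepancy of a given size counts for more in a category where few observations were expected.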

Chi-Square Value Implications

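The chi-square value measures the size of the discrepancy between the observed and expected frequencies:
- A smaller chi-square value means the data are close to what the null hypothesis predicts, indicating a good fit.
- A larger chi-square value means the data depart from what the null hypothesis predicts, indicating a poor fit.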

Chi-Square Distribution and Degrees of Freedom

Null Hypothesis Discrimination Criteria

The null hypothesis is retained (not rejected) when the discrepancy between observed and expected frequencies is small.
It is rejected when the discrepancy is large.

Characteristics of Chi-Square Distribution

The chi-square distribution consists of the chi-square values for all possible random samples when H₀ is true.
All chi-square values are $\geq 0$. When H₀ is true, chi-square values are expected to be small.
The distribution exhibits positive skewness and is a family of distributions influenced by degrees of freedom (df).

Degrees of Freedom for Goodness-of-Fit

Formula for degrees of freedom:
- $df = C - 1$
- Where C represents the number of categories.
- df is unaffected by sample size (n).

Locating the Critical Region for a Chi-Square Test

Steps to find the critical region include:
- Setting significance level (alpha).
- Using the chi-square distribution table in Appendix B to identify the critical chi-square value based on:
  - Degrees of freedom (df), listed in the first column.
  - Significance level (alpha), listed in the top row.

Critical Value Representation

The critical values of chi-square are found within the table body, facilitating decision making regarding the null hypothesis.
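The lookup and decision can be sketched as follows. The dictionary is a small excerpt of standard chi-square critical values rounded to two decimals. It is assumed to agree with the table in Appendix B, which is not reproduced here. The function names are illustrative.

```python
# Excerpt of standard chi-square critical values: CRITICAL_VALUES[df][alpha].
# Assumption: these rounded values match the Appendix B table.
CRITICAL_VALUES = {
    1: {0.05: 3.84, 0.01: 6.63},
    2: {0.05: 5.99, 0.01: 9.21},
    3: {0.05: 7.81, 0.01: 11.34},
    4: {0.05: 9.49, 0.01: 13.28},
}


def goodness_of_fit_df(num_categories):
    """df = C - 1 for the goodness-of-fit test (sample size plays no role)."""
    return num_categories - 1


def critical_value(df, alpha=0.05):
    """Look up the boundary of the critical region for a given df and alpha."""
    return CRITICAL_VALUES[df][alpha]


def decide(chi_sq, df, alpha=0.05):
    """Reject H0 only when the statistic falls in the critical region."""
    if chi_sq > critical_value(df, alpha):
        return "reject H0"
    return "fail to reject H0"


df = goodness_of_fit_df(4)          # four categories -> df = 3
print(critical_value(df, 0.05))     # 7.81
print(decide(8.08, df, alpha=0.05)) # reject H0
```

With the same statistic and a stricter alpha of .01 the critical value rises to 11.34, so the decision changes to failing to reject H₀.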

Reporting Results of Chi-Square

A report should state whether the differences among categories are significant and include these summary elements:
- The $\chi^2$ value with its degrees of freedom and sample size (n) in parentheses, followed by the probability level.
- Example: $\chi^2(3, n = 50) = 9.36, p < .05$.
- Include observed frequencies for each category when applicable.
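A small formatting helper can keep reports consistent with the pattern in the example above. This is only a sketch: the function name and its parameters are hypothetical, and it reports the probability relative to alpha rather than an exact p-value.

```python
def format_chi_square_report(df, n, chi_sq, alpha, significant):
    """Build a report string such as 'χ²(3, n = 50) = 9.36, p < .05'."""
    alpha_text = f"{alpha:g}".lstrip("0")  # 0.05 -> ".05" (no leading zero)
    relation = "<" if significant else ">"
    return f"χ²({df}, n = {n}) = {chi_sq:.2f}, p {relation} {alpha_text}"


report = format_chi_square_report(df=3, n=50, chi_sq=9.36, alpha=0.05, significant=True)
print(report)  # χ²(3, n = 50) = 9.36, p < .05
```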

Goodness of Fit versus Single-Sample t Test

Comparison of nonparametric chi-square tests against parametric t tests:
- The chi-square test for goodness of fit operates independently of population distribution assumptions.
- Conversely, the t test presumes a normal population, necessitating numerical scores, and evaluates hypotheses concerning population means.

Test Similarities

Both tests use data from a single sample to test a hypothesis about a single population.
The measurement level dictates the choice of testing method:
- Numerical scores (interval/ratio) imply usage of t test.
- Non-numerical classifications (ordinal or nominal) necessitate use of chi-square tests based on proportions or percentages.

Learning Check 1

Question: Expected frequencies in a chi-square test…
- are always whole numbers.
- can contain fractions or decimal values.
- can contain both positive and negative values.
- can contain fractions and negative numbers.

Learning Check 1 – Answer

Correct Option: can contain fractions or decimal values.

Learning Check 2

Determine True or False:
- In a chi-square test, the observed frequencies are always whole numbers. (T/F)
- A large value for chi-square will tend to retain the null hypothesis. (T/F)

Learning Check 2 – Answers

True: Observed frequencies are counts, hence no fractions.
False: Large chi-square values suggest considerable disparity from null hypothesis predictions.

15-3: The Chi-Square Test for Independence

Chi-square can also be used to test whether a relationship exists between two variables:
- Each participant is measured on both variables and classified into one cell of a matrix.
- The data may come from either an experimental or a non-experimental design.
The sample frequencies, arranged in a two-dimensional frequency distribution matrix, are used to evaluate the evidence for a relationship between the two variables in the population.

Null Hypothesis for Test of Independence

States that the two variables are independent; that is, no relationship exists between them in the population.
Two interpretations of this hypothesis:
- Single sample with each individual measured on two variables: there is no relationship between the two variables.
- Separate samples from different populations: the distribution of proportions across categories is the same in every population.

Observed and Expected Frequencies

Observed frequencies are the counts of sample individuals falling in each cell of the matrix.
Expected frequencies derive from the null hypothesis and are computed from the row and column totals of the matrix.

Computing Expected Frequencies

Utilizes a straightforward method for any cell in the frequency distribution matrix:
- $f_e = \frac{f_c f_r}{n}$
- Where:
- $f_c$ = column frequency total.
- $f_r$ = row frequency total.
- $n$ = total number of individuals in the sample.
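A short sketch of this computation, assuming the standard rule that each cell's expected frequency is its column total times its row total divided by n. The 2 × 2 counts are hypothetical and the function name is illustrative.

```python
def expected_matrix(observed):
    """Expected frequency for every cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# Hypothetical 2 x 2 matrix of observed frequencies (n = 100).
observed = [
    [30, 20],
    [10, 40],
]

expected = expected_matrix(observed)
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

Both rows of the expected matrix show the same 40/60 split as the column totals, which is exactly what independence between the two variables predicts.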

Chi-Square Statistic and Degrees of Freedom for Independence

The chi-square statistic is computed with the same formula used for the goodness-of-fit test: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, summed over every cell of the matrix.
Degrees of freedom for the test of independence are determined by:
- $df = (R - 1)(C - 1)$
- Where R is the count of rows and C the number of columns.
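Putting the pieces together, the statistic and its degrees of freedom for a frequency matrix might be computed as in the sketch below. It assumes the usual expected-frequency rule (row total times column total divided by n) and uses hypothetical counts. The function name is illustrative.

```python
def chi_square_independence(observed):
    """Return (chi-square, df) for a matrix of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi_sq = 0.0
    for r, row in enumerate(observed):
        for c, fo in enumerate(row):
            fe = row_totals[r] * col_totals[c] / n  # expected under independence
            chi_sq += (fo - fe) ** 2 / fe

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (R - 1)(C - 1)
    return chi_sq, df


# Hypothetical 2 x 2 matrix of observed frequencies (n = 100).
chi_sq, df = chi_square_independence([[30, 20], [10, 40]])
print(round(chi_sq, 2), df)  # 16.67 1
```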

Summary Steps for Chi-Square Test for Independence

Standard procedure encompasses four steps:
1. State hypotheses and establish alpha level.
2. Identify the critical region.
3. Compute the test statistic.
4. Reach a conclusion based on results.

15-4: Effect Size and Assumptions for Chi-Square Tests

A significant chi-square result indicates that the discrepancy between observed and expected frequencies is unlikely to be due to chance alone.
Significance depends on both the size of the effect and the size of the sample.
Each hypothesis test result should be accompanied by an appropriate effect size measure.

Cohen’s w

Measures effect size for both chi-square tests:
- $w = \sqrt{\sum \frac{(P_o - P_e)^2}{P_e}}$
- The $P_o$ values are the observed proportions in the sample.
- The $P_e$ values are the expected proportions specified by the null hypothesis.
As per Cohen, effect size standards are as follows:
- 0.10 indicates a small effect.
- 0.30 indicates a medium effect.
- 0.50 indicates a large effect.
Notably, Cohen’s w is independent of sample size; only the sample proportions and the proportions specified by the null hypothesis enter the computation of w.
w and chi-square are algebraically related: $\chi^2 = n w^2$, so $w = \sqrt{\chi^2 / n}$.
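The sketch below computes Cohen's w from proportions, assuming the standard definition $w = \sqrt{\sum (P_o - P_e)^2 / P_e}$, and then checks the algebraic link to chi-square ($\chi^2 = n w^2$). The counts are hypothetical (n = 50, four categories, equal proportions under H₀).

```python
import math


def cohens_w(observed_props, expected_props):
    """Cohen's w: square root of the sum of (P_o - P_e)^2 / P_e."""
    return math.sqrt(
        sum((po - pe) ** 2 / pe for po, pe in zip(observed_props, expected_props))
    )


# Hypothetical goodness-of-fit data: n = 50, H0 of equal proportions.
counts = [18, 17, 7, 8]
n = sum(counts)
p_observed = [c / n for c in counts]
p_expected = [0.25, 0.25, 0.25, 0.25]

w = cohens_w(p_observed, p_expected)
chi_sq_from_w = n * w ** 2  # algebraic relation: chi-square = n * w^2

print(round(w, 2))              # 0.4
print(round(chi_sq_from_w, 2))  # 8.08
```

Because w is built from proportions only, doubling every count leaves w unchanged while chi-square doubles.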

Phi-Coefficient and Cramér’s V

The phi coefficient (Φ) measures the strength of the relationship for 2 × 2 matrices:
- $\Phi = \sqrt{\frac{\chi^2}{n}}$
- $\Phi^2$ is the proportion of variance accounted for.
For larger matrices, Cramér’s V, a modification of the phi coefficient, measures effect size:
- $V = \sqrt{\frac{\chi^2}{n(df^*)}}$
- $df^*$ is the smaller of (R - 1) or (C - 1).
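Both measures can be sketched directly from a chi-square value, assuming the standard definitions $\Phi = \sqrt{\chi^2 / n}$ and $V = \sqrt{\chi^2 / (n \cdot df^*)}$. The chi-square values and sample sizes below are hypothetical, and the function names are illustrative.

```python
import math


def phi_coefficient(chi_sq, n):
    """Phi for a 2 x 2 matrix: sqrt(chi-square / n)."""
    return math.sqrt(chi_sq / n)


def cramers_v(chi_sq, n, rows, cols):
    """Cramér's V: sqrt(chi-square / (n * df*)), df* = min(R - 1, C - 1)."""
    df_star = min(rows - 1, cols - 1)
    return math.sqrt(chi_sq / (n * df_star))


# Hypothetical 2 x 2 result: chi-square = 50/3 (about 16.67) with n = 100.
phi = phi_coefficient(50 / 3, 100)
print(round(phi, 3))  # 0.408

# Hypothetical 3 x 4 result: chi-square = 20.0 with n = 100, so df* = 2.
v = cramers_v(20.0, 100, rows=3, cols=4)
print(round(v, 3))  # 0.316
```

For a 2 × 2 matrix df* equals 1, so Cramér's V reduces to the phi coefficient.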

Standards for Interpreting Cramér’s V

Interpretive standards for Cramér’s V:

| df* | Small Effect | Medium Effect | Large Effect |
|-----|--------------|---------------|--------------|
| 1   | 0.10         | 0.30          | 0.50         |
| 2   | 0.07         | 0.21          | 0.35         |
| 3   | 0.06         | 0.17          | 0.29         |

Assumptions and Restrictions for Chi-Square Tests

Specific conditions must be adhered to in order to apply a chi-square test for goodness of fit or independence:
- Independence of Observations: Each observed frequency must come from different individuals; no individual may be counted in more than one cell.
- Expected Frequencies' Size: Performing chi-square tests is inadvisable if any cell's expected frequency is below 5.
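The expected-frequency restriction is easy to check before running a test. The sketch below flags every cell whose expected frequency falls under the threshold of 5 stated above. The function name and both matrices are hypothetical. For a goodness-of-fit test, the expected frequencies can be passed as a single-row matrix.

```python
def cells_below_minimum(expected, minimum=5):
    """List (row, column, f_e) for every cell with f_e below the minimum."""
    return [
        (r, c, fe)
        for r, row in enumerate(expected)
        for c, fe in enumerate(row)
        if fe < minimum
    ]


# Hypothetical expected-frequency matrices.
ok_matrix = [[20.0, 30.0], [20.0, 30.0]]
small_matrix = [[3.0, 7.0], [12.0, 28.0]]

print(cells_below_minimum(ok_matrix))     # []
print(cells_below_minimum(small_matrix))  # [(0, 0, 3.0)]
```

An empty list means the restriction is satisfied. Any flagged cell suggests the chi-square test should not be applied to the data as categorized.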

Learning Check 3

Question: A basic assumption for a chi-square hypothesis test is ______.
- The population distributions must be normal.
- The scores necessitate an interval or ratio scale.
- The observations must remain independent.
- None of the above options are assumptions for chi-square.

Learning Check 3 – Answer

Correct Option: The observations must remain independent.

Learning Check 4

Determine True or False Statements:
- The value of df for a chi-square test does not rely on sample size (n). (T/F)
- A positive chi-square statistic indicates a positive correlation between the two variables. (T/F)

Learning Check 4 – Answers

True: The df value depends only on the number of categories (goodness of fit) or the number of rows and columns (independence), not on n.
False: Chi-square is never negative, so its sign carries no information about the direction of a relationship between variables.

Example of SPSS Output for Chi-Square Test for Independence

Crosstabulation Example:
- VAR00002 * VAR00003 counts presented in a 2x2 matrix format, detailing observed values across categories.
Chi-Square Test Values:
- Results report the chi-square value and its p-value for evaluating the null hypothesis, along with a check that every cell has an expected count of at least 5.

Conclusion

This chapter covered the chi-square tests for goodness of fit and for independence, including how to state the hypotheses, compute the statistic, measure effect size, and report the results.