Unit 8 Inference for Categorical Data: Understanding and Using Chi-Square Procedures

Description and Tags

AP Statistics

Unit 8: Inference for Categorical Data: Chi-Square

Chi-Square Tests

Last updated 3:08 PM on 3/12/26

25 Terms

Chi-square goodness of fit test (GOF test)

A hypothesis test for one categorical variable that compares observed counts to expected counts from a claimed distribution.

Observed count (O)

The number of sample observations that fall in a particular category (or cell) of a table.

Expected count in GOF ( $E_i = n \cdot p_{i,0}$ )

The count predicted by the null model for category i, found by multiplying the sample size n by the claimed proportion $p_{i,0}$ .

Chi-square test statistic ( $\chi^2$ )

A measure of overall discrepancy between observed and expected counts: $\chi^2 = \sum (O - E)^2 / E$ .
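
As a minimal sketch of the two definitions above, the following Python computes the expected counts $E_i = n \cdot p_{i,0}$ and then the statistic $\chi^2 = \sum (O - E)^2 / E$. The function name `gof_chi_square` and the counts are hypothetical, chosen only for illustration.

```python
def gof_chi_square(observed, claimed_props):
    """Return (expected counts, chi-square statistic) for a goodness of fit test."""
    n = sum(observed)                           # sample size
    expected = [n * p for p in claimed_props]   # E_i = n * p_{i,0}
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

# Hypothetical data: 60 observations in three categories,
# tested against a claimed 30% / 30% / 40% distribution.
observed = [18, 22, 20]
claimed = [0.30, 0.30, 0.40]
expected, chi2 = gof_chi_square(observed, claimed)
print(expected, round(chi2, 4))  # chi2 is about 1.556
```

Note that the function takes counts, not percentages, for the observed values.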

Chi-square contribution

A single term (O − E)² / E showing how much one category/cell contributes to the total χ²; larger values indicate bigger disagreement with the null model.

Degrees of freedom for GOF (df = k − 1)

For a goodness of fit test with k categories, df equals the number of categories minus 1.

Right-tailed chi-square test

A chi-square test where only large χ² values support the alternative, because large χ² indicates large overall mismatch between O and E.

Random condition (chi-square tests)

The data should come from a random sample (or random assignment in an experiment) to justify inference.

Independence condition (observations)

Sample observations must be independent of one another; this is about people/outcomes not influencing each other, not about variables being independent.

10% condition

When sampling without replacement, the sample size should be no more than about 10% of the population to support independence.

Large expected counts condition

All expected counts should be at least 5 (E ≥ 5 in every category/cell) so the chi-square approximation is valid.

Combine categories

A remedy when some expected counts are too small (E < 5): merge categories in a contextually sensible way to increase expected counts.
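
The large expected counts check and the combine-categories remedy can be sketched in a few lines of Python. The helper names `small_expected_cells` and `merge_last_two`, the sample size, and the claimed proportions are all hypothetical. Merging the last two categories is only an example; in practice the merge should make sense in context.

```python
def small_expected_cells(expected, minimum=5):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]

def merge_last_two(values):
    """Combine the final two categories into one by summing their values."""
    return values[:-2] + [values[-2] + values[-1]]

# Hypothetical setup: n = 40 with four claimed proportions.
n = 40
props = [0.50, 0.30, 0.15, 0.05]
expected = [n * p for p in props]            # about 20, 12, 6, 2
flagged = small_expected_cells(expected)     # the last category has E < 5

# Merge the two smallest categories; the observed counts must be merged the same way.
merged_props = merge_last_two(props)
merged_expected = [n * p for p in merged_props]
flagged_after = small_expected_cells(merged_expected)
print(flagged, flagged_after)
```

After merging, the number of categories k drops by one, so the degrees of freedom for the goodness of fit test drop by one as well.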

Fail to reject H₀

A decision meaning the sample data do not provide convincing evidence against the null; it does not prove the null model is true.

Chi-square test for homogeneity

A chi-square procedure comparing the distribution of one categorical response across two or more populations or treatments using separate samples (or treatment groups).

Two-way table (contingency table)

A table of counts classified by two categorical variables (or by group and response categories), used in homogeneity and independence tests.

Expected count in a two-way table ( $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$ )

The count predicted under the null (same distribution across groups or independence), computed from marginal totals.
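
A short Python sketch of this computation follows, assuming the two-way table is stored as a list of rows. The function name `expected_counts` and the table values are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table: two groups (rows) by three response categories (columns).
table = [[20, 30, 50],
         [30, 30, 40]]
expected = expected_counts(table)
for row in expected:
    print(row)
```

Because both rows of this hypothetical table have the same total, both rows get the same expected counts, which is what "same distribution across groups" predicts.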

Degrees of freedom for homogeneity/independence ( $df = (r - 1)(c - 1)$ )

For an r×c contingency table, the degrees of freedom are (number of rows − 1) times (number of columns − 1): $df = (r - 1)(c - 1)$. Total rows and total columns are not counted in r and c.
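
As an illustrative sketch, the Python below computes the chi-square statistic and $df = (r - 1)(c - 1)$ for a two-way table of counts. The function name `chi_square_two_way` and the table are hypothetical, and the table is assumed to contain only the data cells (no totals).

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df

# Hypothetical 2x3 table, so df = (2 - 1)(3 - 1) = 2.
table = [[20, 30, 50],
         [30, 30, 40]]
chi2, df = chi_square_two_way(table)
print(round(chi2, 4), df)  # chi2 is about 3.111 with df = 2
```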

Chi-square test for independence

A chi-square test using one sample where two categorical variables are measured on each individual to assess whether the variables are associated.

Association (categorical variables)

A relationship where the distribution of one categorical variable differs across the categories of another (i.e., variables are not independent).

Standardized residual ( $(O - E)/\sqrt{E}$ )

A cell-by-cell measure of deviation; large positive means more observed than expected, large negative means fewer observed than expected.
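
The sketch below computes $(O - E)/\sqrt{E}$ for each category and shows that the squared residuals add up to the chi-square statistic, since each square is one chi-square contribution. The function name `standardized_residuals` and the counts are hypothetical.

```python
import math

def standardized_residuals(observed, expected):
    """Return (O - E) / sqrt(E) for each category or cell."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Hypothetical observed and expected counts for three categories.
observed = [18, 22, 20]
expected = [18.0, 18.0, 24.0]
residuals = standardized_residuals(observed, expected)

# Squaring each residual gives that category's contribution (O - E)^2 / E.
chi2_from_residuals = sum(r ** 2 for r in residuals)
print([round(r, 3) for r in residuals], round(chi2_from_residuals, 4))
```

Here the second category has a positive residual (more observed than expected) and the third has a negative one (fewer observed than expected).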

Marginal totals (row/column totals)

Totals across rows and columns in a two-way table; used to compute expected counts under homogeneity/independence.

Chi-square vs two-proportion z link ( $\chi^2 = z^2$ in 2×2)

For a 2×2 table, the chi-square statistic (df = 1) equals the square of the two-proportion z statistic computed with the pooled proportion: $\chi^2 = z^2$. The chi-square test and the two-sided two-proportion z test therefore give the same p-value and the same conclusion.
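
A quick numerical check of $\chi^2 = z^2$ can be sketched as follows. The function names and the success counts are hypothetical, and the z statistic is assumed to use the pooled proportion in its standard error.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2x2 table of successes and failures."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand_total = n1 + n2
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i in range(2) for j in range(2)
    )

# Hypothetical data: 30 successes out of 50 in group 1, 20 out of 50 in group 2.
z = two_proportion_z(30, 50, 20, 50)
chi2 = chi_square_2x2(30, 50, 20, 50)
print(round(z ** 2, 6), round(chi2, 6))  # both are about 4
```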

Procedure selection: homogeneity vs independence

Homogeneity uses multiple samples/populations (one response variable); independence uses one sample with two variables measured on each subject.

Counts vs percentages (common mistake)

The $\chi^2$ formula must use counts for O and E; using proportions/percentages directly in the statistic is incorrect.

Chi-square p-value

The probability, under the null model, of getting a chi-square statistic at least as large as the observed χ² (area to the right under the chi-square curve).
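
On the AP exam this right-tail area comes from a calculator or table. The sketch below shows one way to compute it with only the Python standard library. It assumes a positive whole-number df and uses the known closed forms for the chi-square tail (an exponential for df = 2, `math.erfc` for df = 1) plus a recurrence that adds one term each time df increases by 2. The function name `chi_square_p_value` is hypothetical.

```python
import math

def chi_square_p_value(chi2, df):
    """Right-tail area P(X >= chi2) for a chi-square distribution with df degrees of freedom."""
    if chi2 <= 0:
        return 1.0                      # the whole curve lies to the right of 0
    t = chi2 / 2
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-t)                  # exact tail for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(t))       # exact tail for df = 1
    # Step df up by 2 at a time: each step adds t^a * e^(-t) / Gamma(a + 1).
    while a < df / 2:
        tail += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1
    return tail

# Hypothetical statistic from a goodness of fit test with 3 categories (df = 2).
p_value = chi_square_p_value(1.5556, 2)
print(round(p_value, 4))
```

As a sanity check, the familiar 5% critical values (3.841 for df = 1, 5.991 for df = 2, 7.815 for df = 3) should each return a p-value close to 0.05.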