Unit 8: Inference for Categorical Data: Chi-Square

Last updated 2:11 AM on 3/12/26

50 Terms

1

Categorical data

Data in which each observation falls into a label or group (e.g., party, brand, color, yes/no) rather than a numerical measurement.

2

Categorical inference

Using sample counts/proportions to judge whether an observed categorical pattern is plausibly due to chance variation under a proposed model.

3

Counts

The number of observations in each category/cell; the primary summary measure used in chi-square procedures.

4

Proportions

Counts divided by the total sample size; used to describe categorical distributions, but chi-square calculations require counts, not proportions.

5

Observed count (O)

The actual sample count in a category (one-way table) or cell (two-way table).

6

Expected count (E)

The count predicted in a category/cell if the null hypothesis model were true.

7

Null hypothesis (H0) in chi-square

A “blueprint” model specifying how counts should fall into categories in the long run (a claimed distribution, equal distributions across groups, or independence of two variables).

8

Alternative hypothesis (Ha) in chi-square

The claim that the observed categorical pattern does not match the null model (at least one proportion differs / distributions differ / variables are associated).

9

Goodness-of-fit test

A chi-square test that checks whether one categorical variable follows a claimed distribution of proportions in a population.

10

Test for homogeneity

A chi-square test that compares the distribution of one categorical response variable across two or more populations/treatments (based on separate samples or random assignment).

11

Test for independence

A chi-square test that evaluates whether two categorical variables are associated within one population (based on one sample measuring both variables).

12

One-way table

A table of counts for a single categorical variable across k categories (used in goodness-of-fit).

13

Two-way table

A table of counts for two categorical variables arranged in r rows and c columns (used in homogeneity or independence).

14

Margins (row/column totals)

The totals for each row and each column in a two-way table, used to compute expected counts.

15

Grand total (n)

The total number of observations in the table; used in expected-count formulas.

16

Chi-square distribution

A family of right-skewed distributions on nonnegative values used to model chi-square test statistics when $H_0$ is true (approximately).

17

Right-skewed

A distribution shape with a long tail to the right; chi-square distributions are always right-skewed (less skewed as df increases).

18

Degrees of freedom (df)

The parameter that selects which chi-square distribution models the test statistic under $H_0$; it is determined by the number of categories (one-way table) or the table's dimensions (two-way table), not by the sample size.

19

df for goodness-of-fit

For $k$ categories in a one-way table, $df = k - 1$.

20

df for two-way tables

For an $r \times c$ table, $df = (r - 1)(c - 1)$.
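
The degrees-of-freedom rules for one-way and two-way tables can be sketched as two small Python helpers. The function names are illustrative assumptions, not part of any library.

```python
def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a one-way table with k categories: k - 1."""
    if k < 2:
        raise ValueError("need at least 2 categories")
    return k - 1


def df_two_way(r: int, c: int) -> int:
    """Degrees of freedom for an r x c two-way table: (r - 1)(c - 1)."""
    if r < 2 or c < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    return (r - 1) * (c - 1)


print(df_goodness_of_fit(6))  # six categories (e.g., faces of a die) -> 5
print(df_two_way(3, 4))       # 3 x 4 table -> 6
```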

21

Chi-square test statistic (χ²)

A measure of overall mismatch between observed and expected counts: $\chi^2 = \sum (O - E)^2 / E$.
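
As a minimal sketch of this formula, the Python snippet below sums $(O - E)^2 / E$ over the categories of a one-way table. The die-roll counts are made up for illustration, and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up data: 60 rolls of a die; a fair-die null model expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # (4 + 4 + 1 + 1 + 16 + 16) / 10 = 4.2
```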

22

Cell contribution to χ²

The amount a single category/cell adds to $\chi^2$: $(O - E)^2 / E$.

23

Standardized residual

A directional diagnostic for a cell: $(O - E)/\sqrt{E}$; positive means observed > expected, negative means observed < expected.
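
A short Python sketch of this diagnostic, using made-up die-roll counts and a hypothetical helper name, might look like the following. Squaring each residual gives back that cell's contribution to the chi-square statistic.

```python
import math


def standardized_residuals(observed, expected):
    """(O - E) / sqrt(E) for each category or cell, keeping the sign."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Made-up data: 60 rolls of a die; a fair-die null model expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6

residuals = standardized_residuals(observed, expected)
for face, r in enumerate(residuals, start=1):
    direction = "above" if r > 0 else "below" if r < 0 else "equal to"
    print(f"face {face}: residual {r:+.2f} (observed {direction} expected)")
```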

24

Pearson residual

Another name often used for standardized residuals in chi-square output: (O − E)/√E.

25

Right-tailed test (chi-square)

Chi-square p-values come from the right tail because only large χ² values indicate strong evidence against H0.

26

P-value (chi-square)

The probability, assuming $H_0$ is true, of getting a $\chi^2$ statistic at least as large as the one computed.
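
To make the right-tail probability concrete, here is a rough standard-library sketch that evaluates the chi-square CDF through a power series for the regularized incomplete gamma function. It is an illustration under stated assumptions (moderate statistic values, p-values that are not extremely small), not a production routine; in practice a statistics library function such as SciPy's `scipy.stats.chi2.sf` would normally be used. The statistic 4.2 with df = 5 is a made-up example.

```python
import math


def chi_square_p_value(chi2: float, df: int) -> float:
    """Right-tail area P(X >= chi2) for a chi-square distribution with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if chi2 <= 0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, x),
    # which equals the chi-square CDF evaluated at chi2.
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    cdf = total * math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))
    return max(0.0, 1.0 - cdf)


# Hypothetical statistic: chi-square = 4.2 with df = 5 (six categories).
p_value = chi_square_p_value(4.2, 5)
print(f"p-value = {p_value:.3f}")
```

Because the p-value here is roughly one half, a statistic of 4.2 would be quite ordinary under the null model with 5 degrees of freedom.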

27

Significance level (α)

The cutoff probability used to decide whether to reject $H_0$ (e.g., 0.05).

28

Reject H0

Decision made when $p\text{-value} < \alpha$; conclude the data provide evidence against the null categorical model.

29

Fail to reject H0

Decision made when $p\text{-value} \geq \alpha$; conclude there is not convincing evidence against the null model (not proof $H_0$ is true).

30

Statistical significance

A result is statistically significant when the p-value is small enough to reject H0, indicating the pattern is unlikely under the null model.

31

Practical significance

Whether the size/impact of the observed differences matters in context; can differ from statistical significance, especially with large samples.

32

Chi-square approximation

Using a chi-square distribution to approximate the sampling distribution of χ² under H0; works better when expected counts are sufficiently large.

33

Large expected counts condition

A requirement for reliable chi-square inference; common AP rule of thumb: all expected counts are at least 5.
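
A minimal Python sketch of this check, assuming the expected counts have already been computed and using a hypothetical helper name, could be:

```python
def large_counts_ok(expected_counts, minimum=5):
    """True when every expected count meets the rule-of-thumb minimum (default 5)."""
    return all(e >= minimum for e in expected_counts)


# Made-up expected counts for two different one-way tables.
print(large_counts_ok([10, 10, 10, 10, 10, 10]))  # True: every expected count is at least 5
print(large_counts_ok([12.5, 7.5, 3.2, 6.8]))     # False: 3.2 is below 5
```

Note that the check is applied to expected counts, not observed counts.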

34

Random condition

A chi-square inference condition requiring data from a random sample (or random assignment in an experiment) to support broader inference.

35

Independence condition

A chi-square inference condition requiring observations to be independent; often supported by sampling design and the 10% condition.

36

10% condition

When sampling without replacement, the sample size should be less than 10% of the population to help justify independence.

37

One person, one cell rule

Each individual should contribute to exactly one cell in the table; violations (e.g., repeated measures) break independence and can mislead p-values.

38

Pooling (combining categories)

Combining categories to increase expected counts when some are too small; must be logically defensible and requires recomputing E and df.
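
The sketch below illustrates pooling on a made-up goodness-of-fit setup: the two rarest categories are merged (imagine relabeling them as a single "other" category), after which the expected counts and degrees of freedom are recomputed. All numbers and the `pool_last` helper are hypothetical.

```python
def pool_last(values, how_many):
    """Combine the last `how_many` entries of a list into one summed entry."""
    return values[:-how_many] + [sum(values[-how_many:])]


n = 40
proportions = [0.40, 0.30, 0.15, 0.10, 0.05]  # hypothetical claimed distribution
observed = [15, 13, 7, 3, 2]                  # hypothetical sample counts (sum to 40)

expected = [n * p for p in proportions]       # about 16, 12, 6, 4, 2: two are below 5
df_before = len(observed) - 1

# Pool the last two categories, then recompute expected counts and df.
pooled_proportions = pool_last(proportions, 2)
pooled_observed = pool_last(observed, 2)
pooled_expected = [n * p for p in pooled_proportions]
df_after = len(pooled_observed) - 1

print("smallest expected count before pooling:", min(expected))
print("smallest expected count after pooling: ", min(pooled_expected))
print("df before:", df_before, "df after:", df_after)
```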

39

Sparse table

A two-way table with many small expected counts (often from too many categories), which can make chi-square approximations unreliable.

40

Follow-up analysis (after significant χ²)

Additional work (e.g., residuals or conditional proportions) to determine which cells/categories drive the significant result.

41

Conditional distribution

Proportions of one variable within each category of another variable (e.g., sleep distribution within each screen-time group) used to describe associations.
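
As an illustration, the following Python sketch computes row-wise conditional distributions from a two-way table of counts. The screen-time and sleep counts are invented for the example, and the helper name is an assumption.

```python
def conditional_distributions(table):
    """Row-wise proportions: the distribution of the column variable within each row."""
    return [[count / sum(row) for count in row] for row in table]


# Hypothetical counts: rows = screen-time group (low, high),
# columns = sleep category (under 7 hours, 7 hours or more).
table = [
    [20, 60],
    [45, 35],
]

dists = conditional_distributions(table)
for label, dist in zip(["low screen time", "high screen time"], dists):
    print(label, [f"{p:.1%}" for p in dist])
```

Here 25% of the low screen-time group sleeps under 7 hours compared with about 56% of the high screen-time group, which is the kind of difference that suggests an association.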

42

Association (in chi-square independence)

A relationship where the distribution of one categorical variable differs across levels of another; evidence occurs when independence is rejected.

43

Independence (probability statement)

A and B are independent if $P(A \cap B) = P(A)P(B)$; in tables, it implies similar conditional distributions.
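
The product rule can be checked cell by cell on a table of counts by treating the table's proportions as probabilities, as in this hedged Python sketch. The tables and the `is_independent` helper are made up; real sample data almost never satisfy the rule exactly, which is why a chi-square test is used to judge how far the counts stray from it.

```python
import math


def is_independent(table, tol=1e-9):
    """True when every joint proportion equals the product of its marginal proportions."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        math.isclose(
            table[i][j] / n,
            (row_totals[i] / n) * (col_totals[j] / n),
            abs_tol=tol,
        )
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Made-up tables: the first satisfies the product rule exactly, the second does not.
print(is_independent([[12, 28], [18, 42]]))  # True: e.g. 12/100 = (40/100)(30/100)
print(is_independent([[20, 60], [45, 35]]))  # False
```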

44

Expected count in goodness-of-fit

For category $i$ with claimed proportion $p_i$ in a sample of size $n$, $E_i = n \cdot p_i$.
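
A small Python sketch of this calculation follows. The sample size, claimed proportions, and function name are assumptions chosen for illustration.

```python
import math


def expected_counts_gof(n, proportions):
    """Expected count n * p_i for each category under the claimed distribution."""
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("claimed proportions must sum to 1")
    return [n * p for p in proportions]


# Hypothetical claim: categories occur with proportions 0.5, 0.3, 0.2; sample size 200.
expected = expected_counts_gof(200, [0.5, 0.3, 0.2])
print([round(e, 1) for e in expected])  # roughly 100, 60, 40
```

The expected counts always add up to the sample size, and they do not need to be whole numbers.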

45

Expected count in two-way tables

For cell $(i, j)$, $E_{ij} = \frac{(\text{row total}_i \times \text{column total}_j)}{n}$ under $H_0$ for homogeneity or independence.
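
The row-total times column-total rule can be sketched in Python as below, using an invented 2 x 2 table and a hypothetical helper name.

```python
def expected_counts_two_way(table):
    """Expected count (row total * column total) / n for every cell."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up observed counts: row totals 80 and 80, column totals 65 and 95, n = 160.
table = [
    [20, 60],
    [45, 35],
]

expected = expected_counts_two_way(table)
print(expected)  # [[32.5, 47.5], [32.5, 47.5]]
```

The expected table keeps the same row and column totals as the observed table.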

46

Critical-value approach

A method that compares the computed $\chi^2$ statistic to a cutoff from the chi-square distribution (with given $df$) instead of relying only on a p-value.
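
As a rough sketch of this approach, the Python snippet below compares a statistic against standard table cutoffs for $\alpha = 0.05$ (rounded to three decimals, df 1 through 5 only). The dictionary and function names are illustrative assumptions, and the statistics passed in are made up.

```python
# Right-tail critical values of the chi-square distribution at alpha = 0.05,
# rounded as in a typical printed table.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def reject_at_05(chi2, df):
    """Reject H0 at alpha = 0.05 when the statistic exceeds the critical value."""
    if df not in CRITICAL_VALUES_05:
        raise ValueError("this sketch only covers df from 1 to 5")
    return chi2 > CRITICAL_VALUES_05[df]


print(reject_at_05(4.2, 5))   # False: 4.2 is below 11.070
print(reject_at_05(12.0, 5))  # True: 12.0 exceeds 11.070
```

Both approaches lead to the same decision: the statistic exceeds the critical value exactly when the p-value falls below $\alpha$.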

47

Scope of inference

What conclusions are justified based on design: random sampling supports generalization; random assignment supports causation.

48

Random assignment

Assigning individuals to treatments by chance; allows causal conclusions about how treatment affects a categorical response distribution.

49

Observational study

A study with no random assignment; can show association but does not justify causal conclusions.

50

AP-style conclusion

A conclusion that states reject/fail to reject, references p-value vs α, describes the claim in context (variables/population), and respects scope (no unwarranted causation).