Lecture 13: Chi-Square

Last updated 1:33 AM on 5/2/26

26 Terms

Chi-square test

Statistical test used to analyze categorical data (counts) by comparing what you observe to what you would expect under some assumption

Goodness-of-fit (GOF) test: does one categorical variable follow a specified distribution?
Independence test: are two categorical variables related?

What is the difference between a chi-square test and a t-test?

Chi-square test

Uses categorical data (counts/frequencies)
Looks at patterns in proportions

T-test

Uses continuous data (means)
Compares average values between groups

What is the chi-square distribution and how does it differ from the normal distribution?

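Shape is right-skewed (the normal distribution is symmetric)

Values are always ≥ 0 (the normal distribution spans all real numbers)
Distribution changes shape depending on df (a normal distribution keeps its bell shape; only its center and spread change)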

How does the chi-square distribution change as a function of degrees of freedom / sample size?

As df increases:

Distribution becomes less skewed
Starts to look more symmetrical
The peak shifts to the right
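
The trend above can be checked with the standard closed-form properties of the chi-square distribution: mean = df, peak (mode) at max(df - 2, 0), and skewness = sqrt(8 / df). The following is a minimal sketch; the function name `chi_square_shape` is illustrative and not from the lecture.

```python
import math

def chi_square_shape(df):
    """Return (mean, mode, skewness) of a chi-square distribution with df degrees of freedom."""
    mean = df
    mode = max(df - 2, 0)          # location of the peak
    skewness = math.sqrt(8 / df)   # 0 would mean perfectly symmetric
    return mean, mode, skewness

# As df grows, the peak moves right and the skewness shrinks toward 0.
for df in (1, 2, 5, 10, 30):
    mean, mode, skew = chi_square_shape(df)
    print(f"df={df:>2}  mean={mean:>2}  peak at {mode:>2}  skewness={skew:.2f}")
```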

Can a chi-square test be one-sided and/or two-sided?

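Always one-sided (right-tailed): deviations are squared, so a departure from Ho in either direction makes chi-square larger, and only the upper tail counts as evidence against Ho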

What kind of data must we have in order to conduct a chi-square test?

Categorical variables
Data in the form of frequencies/counts
Independent observations
Setup is usually a contingency table (e.g., 2×3)

What is a chi-square goodness-of-fit test? What type of question does it test?

Examines whether the observed distribution of a single categorical variable matches a theoretical or expected distribution

What do the null and alternative hypotheses look like for GOF?

Ho: The observed frequencies match the expected frequencies

Ha: The observed frequencies do NOT match the expected frequencies

How do we compute a chi-square test statistic?

Numerator: squared difference between Observed (O) and Expected (E) → (O - E)²

Denominator: expected frequency (E), which scales the difference

Chi-square is the sum of (O - E)² / E across all categories: a sum of standardized squared deviations between observed and expected counts
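
As a rough illustration of the computation described above, the statistic can be written in a few lines of Python. The die-roll counts are made-up example data and the function name `chi_square_statistic` is an assumption for this sketch.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die; a fair die expects 10 rolls per face.
observed = [8, 12, 9, 11, 14, 6]
expected = [10] * 6

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1   # k - 1 for a goodness-of-fit test
print(f"chi-square = {stat:.2f}, df = {df}")
```

Here the squared differences add up to 42, and dividing each by E = 10 gives a statistic of about 4.2 on 5 degrees of freedom.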

What happens to chi-square when the difference between the observed frequencies and the expected frequencies (as specified under the null) increases (assuming all other things stay equal)?

If (O - E) gets larger → the numerator increases → chi-square increases → more evidence against Ho

What happens to the chi-square value when the sample size increases (assuming all other things stay equal)?

Expected counts (E) increase, so even small proportional differences can produce larger chi-square values
Larger samples make it easier to detect significant differences
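
A small sketch of this effect, assuming a hypothetical setup in which Ho says 50/50 but the sample always splits 55/45. The proportions never change, yet the statistic grows in step with n.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def statistic_at_sample_size(n):
    """Hypothetical setup: Ho says 50/50, but the sample always splits 55/45."""
    observed = [55 * n // 100, 45 * n // 100]
    expected = [n / 2, n / 2]
    return chi_square_statistic(observed, expected)

# Same proportional difference, three sample sizes.
for n in (100, 1000, 10000):
    print(n, statistic_at_sample_size(n))
```

With these numbers the statistic is 1 at n = 100, 10 at n = 1000, and 100 at n = 10000: multiplying the sample size by ten multiplies chi-square by ten.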

What do we compare the chi-square statistic to?

A critical value from the chi-square distribution OR use a p-value
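
To make the p-value route concrete, here is a sketch that computes the right-tail probability using only Python's standard library. It relies on the known closed forms for the chi-square upper tail with integer df; in practice a statistics package would do this step, and the function name `chi_square_p_value` is an assumption for illustration.

```python
import math

def chi_square_p_value(stat, df):
    """Right-tail probability P(X >= stat) for a chi-square variable with integer df >= 1."""
    if df < 1 or stat < 0:
        raise ValueError("need df >= 1 and stat >= 0")
    x = stat / 2
    if df % 2 == 0:
        # Even df: finite sum exp(-x) * sum_{j < df/2} x^j / j!
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: start from the df = 1 tail and add one term per extra pair of df.
    tail = math.erfc(math.sqrt(x))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-x) * x ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

# The familiar 0.05 critical values should give p-values very close to 0.05.
for df, critical in ((1, 3.841), (2, 5.991), (3, 7.815)):
    print(df, critical, round(chi_square_p_value(critical, df), 4))
```

Rejecting Ho when the statistic exceeds the critical value is the same decision as rejecting when the p-value falls below the chosen alpha.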

What are the assumptions of a GOF test?

Categorical data
Independent observations
Expected frequencies are sufficiently large

How do we compute degrees of freedom for GOF?

df = k - 1

k: number of categories

R output for a GOF test

Refers to one variable only
No mention of rows/columns
Hypothesis is about distribution matching expected proportions

Chi-square test of independence

Examine whether two categorical variables are related or if they’re independent of each other

Tests questions like: “is gender related to political preference?”

Null and alternative hypotheses (chi-square test of independence)

Ho: The two variables are independent (no relationship exists)

Ha: The two variables are not independent (there’s an association)

How do we compute a chi-square test statistic?

Numerator: squared difference between observed and expected counts → (O - E)²

Denominator: expected counts (which standardizes the difference) → E
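
For a test of independence, the expected count in each cell comes from the table margins: E = (row total × column total) / N. The sketch below applies that standard formula to a made-up 2×3 table; the function names are illustrative only.

```python
def expected_counts(table):
    """Expected count for each cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Hypothetical 2x3 table of observed counts.
table = [[30, 20, 10],
         [20, 30, 40]]
stat, df, expected = chi_square_independence(table)
print(expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```

In this example every column total is 50 and N is 150, so the first row (total 60) expects 20 per cell and the second row (total 90) expects 30 per cell.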

What happens to chi-square when the difference between the observed frequencies and the expected frequencies (as specified under the null) increases (assuming all other things stay equal)?

If (O - E) increases:

The numerator increases
So chi-square increases

What happens to the chi-square value when the sample size increases (assuming all other things stay equal)?

Expected frequencies increase
Even small proportional differences can produce larger chi-square values

What do we compare the chi-square statistic to?

A critical value from the chi-square distribution OR a p-value

What are the assumptions of a chi-square test of independence?

Categorical variables
Independent observations (each subject contributes to exactly one cell; no repeated measures)
Expected cell frequencies are sufficiently large
- Rule of thumb: each expected count ≥ 5
Data are in a contingency table

How do we compute degrees of freedom for this test?

df = (r - 1)(c - 1)

r: number of rows; c: number of columns

What is Yates’ correction? When is it used?

An adjustment applied to a chi-square test to make it more accurate when working with small samples and discrete data

The chi-square test uses a continuous distribution to approximate results from discrete count data
Yates’ correction compensates for this mismatch by subtracting 0.5 from the absolute difference |O - E| in each cell before squaring it
It reduces the chi-square value, increases p-value, and makes the test more conservative (harder to reject Ho)
Primarily used for a test of independence (2x2 tables)
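
A minimal sketch of the correction on a made-up 2×2 table, comparing the statistic with and without the 0.5 adjustment. The function name and the counts are assumptions for illustration.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # shrink |O - E| by 0.5, never below zero
            stat += diff ** 2 / expected
    return stat

# Hypothetical 2x2 table of observed counts.
table = [[12, 8],
         [5, 15]]
print("uncorrected:", round(chi_square_2x2(table), 3))
print("with Yates: ", round(chi_square_2x2(table, yates=True), 3))
```

In this table every cell has |O - E| = 3.5, so the correction replaces 3.5² with 3² in each term and the corrected statistic comes out smaller. R's `chisq.test` applies this correction by default for 2×2 tables (`correct = TRUE`).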

R output for chi-square test of independence with Yates’ continuity correction

Label becomes: "Pearson's Chi-squared test with Yates' continuity correction"
Chi-squared value is smaller
P-value is larger (more conservative)
Mainly used for 2x2 tables
Yates’ correction: adjusts for the fact that the chi-square distribution is continuous but the data are discrete counts

R output for chi-square test of independence without Yates’ continuity correction

Based on a contingency table (two categorical variables)
You can extract expected counts ($expected) and residuals ($residuals)
df = (r-1)(c-1)
Tests association
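
In R, the `$residuals` component of a `chisq.test` result holds Pearson residuals, (O - E) / sqrt(E), one per cell. The Python sketch below reproduces that calculation on a made-up table to show how the residuals relate to the statistic; the function name is an assumption.

```python
import math

def pearson_residuals(table):
    """Pearson residual (O - E) / sqrt(E) for each cell of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for r_total, row in zip(row_totals, table):
        res_row = []
        for c_total, observed in zip(col_totals, row):
            expected = r_total * c_total / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Hypothetical 2x3 table of observed counts.
table = [[30, 20, 10],
         [20, 30, 40]]
residuals = pearson_residuals(table)
for row in residuals:
    print([round(r, 2) for r in row])

# Squaring and summing the residuals gives back the chi-square statistic.
print(round(sum(r ** 2 for row in residuals for r in row), 2))
```

Cells with large positive or negative residuals are the ones contributing most to the chi-square value, which helps locate where an association comes from.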
