Two-way table - definition
A table that classifies individuals according to two categorical variables, with rows for one variable's categories and columns for the other's categories.
Row variable - definition
The categorical variable whose categories form the rows of a two-way table (for example, graduation status: graduated vs did not graduate).
Column variable - definition
The categorical variable whose categories form the columns of a two-way table (for example, race/ethnicity).
Marginal totals - definition
The "Total" row and "Total" column of a two-way table; each gives the overall distribution of one variable by itself, summed over the categories of the other variable.
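A minimal Python sketch of computing marginal totals. The table and its counts are hypothetical, made up only for illustration, and `marginal_totals` is an illustrative helper, not a library function.

```python
# Hypothetical two-way table (made-up counts, for illustration only).
# Rows: outcome ("Yes", "No"); columns: three groups (A, B, C).
counts = [
    [30, 45, 25],  # "Yes" row
    [20, 15, 65],  # "No" row
]


def marginal_totals(table):
    """Return (row_totals, column_totals, grand_total) for a list-of-lists table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


row_totals, col_totals, n = marginal_totals(counts)
print(row_totals, col_totals, n)  # [100, 100] [50, 60, 90] 200

# Marginal distribution of the row variable, as percents of the grand total.
print([100 * t / n for t in row_totals])  # [50.0, 50.0]
```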
Using percents in two-way tables
It is often clearer to convert cell counts to percentages when comparing groups so that patterns in the relationship between variables are easier to see.
Describing relationships in two-way tables
To describe an association, compute relevant percents (such as the percent graduating within each race) and compare them across rows or columns.
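A short sketch of comparing groups with percents rather than raw counts. The counts are hypothetical; the groups deliberately have different sizes (50, 60, 90), which is exactly when raw counts mislead. `percent_within` is an illustrative name, not from any library.

```python
# Hypothetical counts (made up for illustration): outcome counts within each group.
groups = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 45, "No": 15},
    "Group C": {"Yes": 25, "No": 65},
}


def percent_within(table, category):
    """Percent of each group falling in `category` (one conditional distribution)."""
    return {
        name: round(100 * cells[category] / sum(cells.values()), 1)
        for name, cells in table.items()
    }


print(percent_within(groups, "Yes"))
# {'Group A': 60.0, 'Group B': 75.0, 'Group C': 27.8}
```

Group C has only 20 fewer "Yes" counts than Group B, yet its "Yes" percent is far lower, because Group C is much larger.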
Graduation and race example - pattern
In the graduation-by-race table, over 60% of white students and more than 70% of Asian students graduated within 6 years, compared with less than 40% of Black and American Indian/Alaska Native students.
Question of inference for two-way tables
When a sample table shows an association, we ask whether this reflects a real association in the population or could be due to random sampling variation.
Cocaine treatment example - setup
In a randomized study, 72 cocaine addicts were randomly assigned, 24 to each of three treatments (desipramine, lithium, placebo); success was defined as not using cocaine.
Cocaine treatment example - observed pattern
The proportion of subjects who did not use cocaine was much higher in the desipramine group than in the lithium or placebo groups.
Null hypothesis for a two-way table
The null hypothesis states there is no association between the row and column variables; any differences in sample counts are due to chance alone.
Null hypothesis - cocaine study
H0: In the population of all cocaine addicts, there is no association between the treatment an addict receives and whether the addict succeeds in not using cocaine.
Alternative hypothesis for a two-way table
The alternative hypothesis states there is an association between the row and column variables; the distribution of one variable differs across levels of the other.
Alternative hypothesis - cocaine study
Ha: In the population of all cocaine addicts, there is an association between the treatment an addict receives and whether the addict succeeds in not using cocaine.
Expected counts - definition
The counts we would expect, on average, in each cell of the two-way table if the null hypothesis of no association were true. Each expected count is (row total × column total) / table total.
Expected counts - equal-group cocaine example
Overall, 24 of the 72 subjects succeeded, a success rate of 24/72 = 1/3. Because each treatment group has 24 subjects, under H0 we expect (1/3)(24) = 8 successes and 16 failures in each treatment group.
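A minimal sketch of the expected-count rule (row total × column total / table total) applied to the cocaine study. Only the margins come from these cards (24 subjects per treatment, 24 successes, 48 failures); the individual cell counts below are an assumption. Expected counts depend only on the margins, so the assumed cells do not affect the result. `expected_counts` is an illustrative helper.

```python
# Assumed cell counts: the margins match the cards, the individual cells are assumed.
# Rows: desipramine, lithium, placebo. Columns: success, failure.
observed = [
    [14, 10],  # desipramine
    [6, 18],   # lithium
    [4, 20],   # placebo
]


def expected_counts(table):
    """Expected count in each cell under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


print(expected_counts(observed))
# [[8.0, 16.0], [8.0, 16.0], [8.0, 16.0]]
```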
General idea of the chi-square test
To test H0, compare observed cell counts with expected counts; large overall discrepancies provide evidence against "no association."
Chi-square statistic - concept
A single number that measures how far the observed counts in all cells are from their expected counts: χ² = Σ (observed count − expected count)² / expected count, summed over all cells of the table.
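A plain-Python sketch of the chi-square statistic. The cocaine-study cell counts used in the example are an assumption (the cards give only the margins and the result); they were chosen to agree with the 24 total successes and with χ² = 10.5. `chi_square_statistic` is an illustrative name.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)**2 / expected over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count for this cell under H0
            total += (o - e) ** 2 / e
    return total


# Assumed cocaine-study cells: rows = treatments, columns = (success, failure).
print(chi_square_statistic([[14, 10], [6, 18], [4, 20]]))  # 10.5
```

Dividing by the expected count scales each squared difference, so a miss of 4 counts weighs more in a cell expecting 8 than in one expecting 16.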
Chi-square distribution - definition
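The distribution that approximates the sampling distribution of the chi-square statistic when H0 is true; it takes only nonnegative values and is skewed to the right. There is a different chi-square distribution for each number of degrees of freedom.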
Degrees of freedom for chi-square
For a two-way table with r rows and c columns, the chi-square test uses a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
Degrees of freedom - cocaine example
The cocaine table has 3 treatments and 2 outcomes, so df = (3 − 1)(2 − 1) = 2.
Using chi-square critical values
Tables give critical values showing how large the chi-square statistic must be (for a given df) to be significant at levels such as 0.05 or 0.01.
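A dependency-free sketch of where table critical values come from. It uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom, and finds the critical value by bisection. Both function names are illustrative; in practice `scipy.stats.chi2.ppf(1 - alpha, df)` returns the same critical value for any df.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive even number of degrees of freedom.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def chi2_critical_value(alpha, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection; the tail probability falls as x grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for alpha in (0.05, 0.01):
    print(alpha, round(chi2_critical_value(alpha, df=2), 2))
# 0.05 5.99
# 0.01 9.21
```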
Cocaine study - chi-square result
With df = 2 and χ² = 10.5, the statistic exceeds the 0.01 critical value (9.21), so the association between treatment and success is significant at P < 0.01.
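An end-to-end sketch of the cocaine-study test in plain Python: expected counts, χ², df = (r − 1)(c − 1), and the P-value. The individual cell counts are an assumption (the cards give only the margins and χ² = 10.5), chosen to be consistent with both. The P-value uses the closed-form chi-square tail for even df, which covers df = 2 here; for arbitrary df, `scipy.stats.chi2_contingency` performs the whole test. `chi_square_test_even_df` is an illustrative name.

```python
import math

# Assumed cell counts consistent with the cards' margins and chi-square value.
# Columns: (success, failure).
observed = {
    "desipramine": (14, 10),
    "lithium": (6, 18),
    "placebo": (4, 20),
}


def chi_square_test_even_df(table):
    """Return (chi-square statistic, df, P-value); supports only even df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even df")
    half = stat / 2
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, df, p


stat, df, p = chi_square_test_even_df(list(observed.values()))
print(stat, df, round(p, 4))  # 10.5 2 0.0052
```

With df = 2 the tail probability reduces to exp(−χ²/2), so P = exp(−5.25) ≈ 0.005, below 0.01 as the card states.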
Interpreting significant chi-square
The test shows strong evidence of some association; to see the nature of the relationship, look back at the table (desipramine performs better than the other treatments).
Conditions for using the chi-square test
You can safely use the chi-square test when no more than 20% of expected counts are less than 5 and all expected counts are at least 1.
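A small sketch that checks the two conditions on this card against a table of expected counts. The function name and the second example table are hypothetical. The 20% rule is checked with integer arithmetic (`n_small * 5 <= cells`) to avoid floating-point edge cases exactly at 20%.

```python
def chi_square_conditions_ok(expected):
    """True if every expected count is at least 1 and at most 20% of them are below 5."""
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and n_small * 5 <= len(cells)  # n_small / len <= 0.20


print(chi_square_conditions_ok([[8, 16], [8, 16], [8, 16]]))  # True (cocaine study)
print(chi_square_conditions_ok([[0.8, 4.2], [9.0, 6.0]]))      # False: a cell is below 1
```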
Chi-square test - what it tells us
The chi-square test tells whether an observed association is statistically significant, not whether it is large or practically important.
Simpson's paradox - idea
An association that holds within each of several groups can disappear or reverse when the data from all groups are combined into a single table.
Medical helicopter example - overall pattern
Overall, 32% of helicopter patients died versus 24% of road-transport patients, suggesting helicopters are worse when seriousness of accidents is ignored.
Medical helicopter example - within groups
When data are broken down by seriousness of accident, the death rate is lower for helicopter patients in both serious and non-serious accidents.
Lurking variable in helicopter example
Seriousness of the accident is a lurking variable; helicopters are used more often for serious accidents, so combining all patients without this variable is misleading.
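A sketch of how Simpson's paradox arises in the helicopter example. The counts are hypothetical: the cards give only the overall rates, so these numbers were chosen to reproduce about 32% vs 24% overall while helicopters do better within each seriousness group. Note that half of the helicopter patients, but few road patients, are in serious accidents.

```python
# Hypothetical (died, total) counts chosen to reproduce the cards' overall rates.
data = {
    "serious": {"helicopter": (48, 100), "road": (60, 100)},
    "less serious": {"helicopter": (16, 100), "road": (200, 1000)},
}


def death_rate(died, total):
    """Percent of patients who died."""
    return 100 * died / total


# Within each seriousness group, helicopter death rates are lower.
for severity, groups in data.items():
    print(severity, {mode: round(death_rate(*dt), 1) for mode, dt in groups.items()})

# Combining the groups (ignoring the lurking variable) reverses the comparison.
combined = {}
for groups in data.values():
    for mode, (died, total) in groups.items():
        d, t = combined.get(mode, (0, 0))
        combined[mode] = (d + died, t + total)
print("combined", {mode: round(death_rate(*dt), 1) for mode, dt in combined.items()})
# serious {'helicopter': 48.0, 'road': 60.0}
# less serious {'helicopter': 16.0, 'road': 20.0}
# combined {'helicopter': 32.0, 'road': 23.6}
```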
Simpson's paradox - definition
When an association or comparison that holds within each of several groups reverses or disappears when the groups are combined, this is called Simpson's paradox.
Lurking variables and categorical data
As with quantitative data, lurking variables can change or reverse observed associations between categorical variables in two-way tables.
Statistics in summary - two-way tables
Categorical variables group individuals into classes; to display the relationship between two categorical variables, use a two-way table and compare appropriate percentages.
Statistics in summary - Simpson's paradox
Lurking variables can make an observed association misleading; Simpson's paradox is an extreme case where combining groups reverses the association.
Statistics in summary - chi-square test
The chi-square test compares observed and expected counts in a two-way table and uses the chi-square distribution to decide whether an observed association is statistically significant.