Comparing Counts
Chi-square model
Chi-square models are skewed to the right. They are parameterized by their degrees of freedom, and as the degrees of freedom increase, they become less skewed.
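The shrinking skew can be checked numerically. This sketch assumes two standard facts: a chi-square variable with df degrees of freedom is the sum of df squared independent standard normal draws, and its skewness is sqrt(8/df). The helper names `theoretical_skewness` and `simulated_skewness` are hypothetical, and only Python's standard library is used.

```python
import math
import random


def theoretical_skewness(df):
    # Skewness of a chi-square model with df degrees of freedom: sqrt(8 / df).
    return math.sqrt(8.0 / df)


def simulated_skewness(df, n_draws=20000, seed=0):
    # Simulate chi-square(df) values as sums of df squared standard normal draws,
    # then compute the sample skewness (third central moment / sd^3).
    rng = random.Random(seed)
    draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    m2 = sum((d - mean) ** 2 for d in draws) / n_draws
    m3 = sum((d - mean) ** 3 for d in draws) / n_draws
    return m3 / m2 ** 1.5


for df in (1, 4, 16):
    # Both columns should shrink as df grows: the model becomes less skewed.
    print(df, round(theoretical_skewness(df), 2), round(simulated_skewness(df), 2))
```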
Cell
A cell is one element of a table corresponding to a specific row and a specific column. Table cells can hold counts, percentages, or measurements on other variables, or they can hold several values.
Chi-square statistic
The chi-square statistic can be used to test whether the observed counts in a frequency distribution or contingency table match the counts we would expect according to some model. It is calculated as
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
Chi-square statistics differ in how the expected counts are found, depending on the question being asked.
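As a minimal sketch, the statistic can be computed from matching lists of observed and expected counts (for a two-way table, list the cells in any consistent order). The function name `chi_square_statistic` is hypothetical, and how `expected` is obtained depends on which test is being run.

```python
def chi_square_statistic(observed, expected):
    """Sum of (obs - exp)^2 / exp over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must list the same cells")
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))


# Hypothetical counts for three cells, each with an expected count of 20.
print(chi_square_statistic([18, 22, 20], [20, 20, 20]))  # 0.4
```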
Chi-square test of goodness-of-fit (GOF)
A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit (GOF) test. In a chi-square GOF test, the expected counts come from the predicting model. The test finds the P-value from a chi-square model with n - 1 degrees of freedom, where n is the number of categories in the categorical variable.
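A goodness-of-fit test can be sketched as follows, assuming the model is given as one proportion per category. To stay within the standard library, the hypothetical helper `chi2_sf` computes the chi-square upper-tail probability (for integer degrees of freedom) from the closed forms of the regularized incomplete gamma function instead of calling a statistics package. The function `gof_test` is also hypothetical, and the die-roll counts are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Hypothetical helper: P(X >= x) for a chi-square model with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even df) or df = 1 (odd df) ...
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    # ... then raise s = df / 2 one step at a time using
    # Q(s + 1, y) = Q(s, y) + y^s * e^(-y) / Gamma(s + 1).
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def gof_test(observed, model_proportions):
    """Return (chi-square statistic, degrees of freedom, P-value)."""
    total = sum(observed)
    # Expected counts come from the model: total count times each model proportion.
    expected = [total * p for p in model_proportions]
    stat = sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical: 120 rolls of a die, tested against a fair-die model.
observed = [25, 17, 15, 23, 24, 16]
stat, df, p = gof_test(observed, [1 / 6] * 6)
print(round(stat, 2), df, round(p, 4))
```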
Chi-square test of homogeneity
A test comparing the distribution of counts for two or more groups on the same categorical variable is called a test of homogeneity. A chi-square test of homogeneity finds expected counts based on the overall frequencies, adjusted for the totals in each group under the (null hypothesis) assumption that the distributions are the same for each group. We find the P-value from a chi-square distribution with (#Rows - 1) × (#Cols - 1) degrees of freedom, where #Rows gives the number of categories and #Cols gives the number of independent groups.
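A sketch of the homogeneity calculation, assuming the table is stored as a list of rows (one row per category, one column per group). Each expected count is the row total times the column total divided by the grand total, which applies the overall category frequencies to each group's total. The hypothetical helper `chi2_sf` computes the chi-square upper-tail probability with the standard library only; `homogeneity_test` and the counts are likewise invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Hypothetical helper: upper-tail probability of a chi-square model, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed form for df = 2 or df = 1, then step up via the incomplete-gamma recurrence.
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def homogeneity_test(table):
    """table[i][j] = count for category i in group j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Overall frequency of each category, scaled to each group's total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (#Rows - 1) x (#Cols - 1)
    return stat, df, chi2_sf(stat, df), expected


# Hypothetical: 3 response categories (rows) for 2 independent groups (columns).
table = [[30, 20], [20, 30], [10, 10]]
stat, df, p, expected = homogeneity_test(table)
print(expected)  # [[25.0, 25.0], [25.0, 25.0], [10.0, 10.0]]
print(stat, df, round(p, 4))  # 4.0 2 0.1353
```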
Chi-square test for independence
A test of whether two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables. A chi-square test of independence finds expected counts by assuming that knowing the marginal totals tells us the cell frequencies, which holds when there is no association between the variables. This turns out to be the same calculation as a test of homogeneity. We find a P-value from a chi-square distribution with (#Rows - 1) × (#Cols - 1) degrees of freedom, where #Rows gives the number of categories in one variable and #Cols gives the number of categories in the other.
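The claim that this is the same calculation as a test of homogeneity can be checked with a sketch. Under independence, the chance that an individual lands in a cell is the product of the row and column marginal proportions, so the expected count is grand total × row proportion × column proportion, which simplifies to row total × column total / grand total. The function names and counts below are hypothetical, and `chi2_sf` is a standard-library stand-in for a statistics package.

```python
import math


def chi2_sf(x, df):
    """Hypothetical helper: upper-tail probability of a chi-square model, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def independence_test(table):
    """table[i][j] = individuals in category i of one variable and category j of the other."""
    grand = sum(sum(row) for row in table)
    row_props = [sum(row) / grand for row in table]
    col_props = [sum(col) / grand for col in zip(*table)]
    # No association: P(row i and column j) = P(row i) * P(column j).
    expected = [[grand * rp * cp for cp in col_props] for rp in row_props]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Hypothetical: 120 individuals classified by two categorical variables.
table = [[12, 18, 30], [28, 22, 10]]
stat, df, p, expected = independence_test(table)

# Compare with the homogeneity formula: row total * column total / grand total.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)
matches = all(
    math.isclose(expected[i][j], row_totals[i] * col_totals[j] / grand)
    for i in range(len(table))
    for j in range(len(table[0]))
)
print(matches)  # True
print(round(stat, 2), df, round(p, 6))
```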
Chi-square component
The components of a chi-square calculation are found for each cell of the table:
$$\frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
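A short sketch of computing the components cell by cell, assuming observed and expected counts are stored as matching lists of rows; the expected counts here are hypothetical values that would come from whichever test is being run, and `chi_square_components` is a hypothetical name. Adding up the components gives the chi-square statistic.

```python
def chi_square_components(observed, expected):
    """Return (obs - exp)^2 / exp for every cell, keeping the table's shape."""
    return [
        [(obs - exp) ** 2 / exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


observed = [[30, 20], [20, 30], [10, 10]]
expected = [[25, 25], [25, 25], [10, 10]]  # hypothetical expected counts
components = chi_square_components(observed, expected)
print(components)  # [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(sum(sum(row) for row in components))  # 4.0, the chi-square statistic
```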
Standardized residual
In each cell of a two-way table, a standardized residual is the square root of the chi-square component for the cell, with the sign of the Obs - Exp difference:
$$\frac{\text{Obs} - \text{Exp}}{\sqrt{\text{Exp}}}$$
When we reject the null hypothesis of a chi-square test, an examination of the standardized residuals can sometimes reveal more about how the data deviate from the null model.
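A sketch of the standardized residuals, assuming the same list-of-rows layout for observed and hypothetical expected counts; `standardized_residuals` is a hypothetical name. Squaring each residual gives back that cell's chi-square component, and the sign shows whether the cell had more or fewer observations than the null model predicts.

```python
import math


def standardized_residuals(observed, expected):
    """(obs - exp) / sqrt(exp) for every cell: the signed square root of each component."""
    return [
        [(obs - exp) / math.sqrt(exp) for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


observed = [[30, 20], [20, 30], [10, 10]]
expected = [[25, 25], [25, 25], [10, 10]]  # hypothetical expected counts
residuals = standardized_residuals(observed, expected)
print(residuals)  # [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]

# Report the direction of each cell's deviation from the null model.
for i, row in enumerate(residuals):
    for j, r in enumerate(row):
        direction = "above expected" if r > 0 else "below expected" if r < 0 else "at expected"
        print(f"cell ({i}, {j}): {r:+.2f} ({direction})")
```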
Two-way table
Each cell of a two-way table shows counts of individuals. One way classifies a sample according to a categorical variable. The other way can classify different groups of individuals according to the same variable or classify the same individuals according to a different categorical variable.
Contingency table
A two-way table that classifies individuals according to two categorical variables is called a contingency table.
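To illustrate, a contingency table can be built by cross-classifying individual records, sketched here with Python's `collections.Counter`; the function name `contingency_table`, the categories, and the records are hypothetical. Each cell counts the individuals in one row category and one column category, and the margins hold the row and column totals. Categories that never occur in the records will not appear as rows or columns.

```python
from collections import Counter


def contingency_table(records):
    """Cross-classify individuals given as (row_category, column_category) pairs."""
    counts = Counter(records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical individuals, each classified by two categorical variables.
records = [
    ("Group A", "Yes"), ("Group A", "No"), ("Group B", "Yes"),
    ("Group A", "Yes"), ("Group B", "No"), ("Group B", "No"),
]
rows, cols, table = contingency_table(records)

# Print the table with row and column totals in the margins.
print("", *cols, "Total", sep="\t")
for name, row in zip(rows, table):
    print(name, *row, sum(row), sep="\t")
print("Total", *[sum(col) for col in zip(*table)], sum(map(sum, table)), sep="\t")
```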