\
The sum of these weighted differences or discrepancies is called the chi-square statistic and is denoted as χ² (χ is the lowercase Greek letter chi):
χ² = Σ (observed − expected)² / expected
To decide how large a calculated χ²-value must be to be significant, that is, to choose a critical value, we must understand how χ²-values are distributed.
A χ²-distribution has only nonnegative values, is not symmetric, and is always skewed to the right.
There are distinct χ²-distributions, each with an associated number of degrees of freedom (df).
The larger the df value, the less pronounced is the skew, and the closer the χ²-distribution is to a normal distribution.
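The shrinking skew can be illustrated numerically. The sketch below relies on the standard result that a χ²-distribution with df degrees of freedom has skewness √(8/df); the function name is chosen here for illustration only.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# The skewness is always positive (right skew) and falls toward 0,
# the skewness of a normal distribution, as df grows.
for df in (1, 3, 10, 50, 200):
    print(f"df = {df:3d}: skewness = {chi_square_skewness(df):.3f}")
```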
\
For inference about the distribution of a single categorical variable, such as a goodness-of-fit test, we will use a chi-square distribution with df = number of categories − 1 degrees of freedom.
\
A large city is divided into four distinct socioeconomic regions, one where the upper class lives, one for the middle class, one for the lower class, and one mixed-class region. Area percentages of the regions are 12%, 38%, 32%, and 18%, respectively. In a random sample of 55 liquor stores in the city, the numbers from each region are 4, 16, 26, and 9, respectively. The following shows how to determine whether there is statistical evidence that region makes a difference with regard to numbers of liquor stores.
\
Solution:
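If liquor stores were distributed in proportion to area, the expected numbers in the sample of 55 would be (0.12)(55) = 6.6, (0.38)(55) = 20.9, (0.32)(55) = 17.6, and (0.18)(55) = 9.9, and we have:
Region      Upper   Middle   Lower   Mixed
Observed      4       16       26       9
Expected     6.6     20.9     17.6     9.9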
First, state the hypotheses.
H0: Liquor stores are distributed over the four city regions in the same proportions as the areas of those regions, that is, in the percentages 12, 38, 32, and 18, respectively.
Ha: Liquor stores are not distributed over the four city regions in the same proportions as the areas of those regions (at least one proportion is not as specified in the null hypothesis).
Second, name the procedure and check the conditions:
Procedure: A chi-square test for goodness of fit.
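Checks: We are given a random sample, and all expected counts, (0.12)(55) = 6.6, (0.38)(55) = 20.9, (0.32)(55) = 17.6, and (0.18)(55) = 9.9, are at least 5.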
Third, calculate the χ² statistic:
χ² = (4 − 6.6)²/6.6 + (16 − 20.9)²/20.9 + (26 − 17.6)²/17.6 + (9 − 9.9)²/9.9 = 6.264
The P-value is P = P(χ² > 6.264) = 0.099. [If n is the number of classes, df = n − 1 = 3, and χ²cdf(6.264, 1000, 3) = 0.099.] We also note that putting the observed and expected numbers in Lists, calculator software (such as χ²GOF-Test on the TI-84 or on the Casio Prizm) quickly gives χ² = 6.264 and P = 0.099.
Fourth, give a conclusion in context with linkage to the P-value:
\
With this large a P-value, 0.099 > 0.05, there is not sufficient evidence to reject H0. That is, there is not convincing evidence that liquor stores in this city are distributed over the four city regions (upper, middle, lower, and mixed class) in proportions different from the areas of those regions.
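The goodness-of-fit mechanics for the liquor store example can also be checked without a calculator. The following is a minimal sketch using only the Python standard library; the helper chi2_sf and the variable names are introduced here for illustration, and the upper-tail probability is built from the standard recurrence for the regularized incomplete gamma function rather than from any calculator routine.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for a positive integer df."""
    z = x / 2
    # Closed-form starting values: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q

area_shares = [0.12, 0.38, 0.32, 0.18]   # upper, middle, lower, mixed
observed = [4, 16, 26, 9]

n = sum(observed)                         # sample size, 55 stores
expected = [share * n for share in area_shares]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_sq, df)

print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi_sq:.3f}, df = {df}, P = {p_value:.3f}")
```

The printed statistic and P-value should agree with the values in the solution above to three decimal places.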
For example, we might consider several age groups and within each group ask how many employees show various levels of job satisfaction. The null hypothesis is that age and job satisfaction are independent, that is, that the proportion of employees expressing a given level of job satisfaction is the same no matter which age group is considered.
\
When testing for independence,
df = (r − 1)(c − 1)
where df is the number of degrees of freedom, r is the number of rows, and c is the number of columns.
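As a rough illustration, the degrees of freedom for a table with r rows and c columns, (r − 1)(c − 1), and the expected cell counts under independence, (row total)(column total)/(grand total), can be computed as sketched below. The function names and the 2-by-3 table of counts are hypothetical and are not taken from any example in this section.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """df = (number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical observed counts, for illustration only.
hypothetical = [[30, 20, 10],
                [20, 30, 40]]

print("df =", degrees_of_freedom(hypothetical))
for row in expected_counts(hypothetical):
    print(row)
```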
\
A growing number of states have legalized marijuana for medical or recreational purposes. In a nationwide telephone poll of 1000 randomly selected adults representing Democrats, Republicans, and Independents, respondents were asked two questions: their party affiliation and whether they supported the legalization of marijuana. The answers, cross-classified by party affiliation, are given in the following two-way table (also called a contingency table).
Test the null hypothesis that support for legalizing marijuana is independent of party affiliation. Use a 5% significance level.
\
Solution:
Mechanics: Putting the observed data into a "Matrix," calculator software (such as χ²-Test on the TI-84, Casio Prizm, or HP Prime) gives χ² = 94.5 and P = 0.000, and stores the expected values in a second matrix:
Check conditions: We are given a random sample, n = 1000 is less than 10% of all adults, and we note that all expected cell counts are greater than 5.
Conclusion with linkage to the P-value: With so small a P-value, 0.000 < 0.05, there is sufficient evidence to reject H0; that is, among all adults there is sufficient evidence of a relationship between party affiliation and support for legalizing marijuana.
\
In a large city, a group of AP Statistics students work together on a project to determine which group of school employees has the greatest proportion who are satisfied with their jobs. In independent simple random samples of 100 teachers, 60 administrators, 45 custodians, and 55 secretaries, the numbers satisfied with their jobs were found to be 82, 38, 34, and 36, respectively. Is there evidence that the proportion of employees satisfied with their jobs is different in different school system job categories?
Solution:
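Hypotheses:
H0: The proportion of employees satisfied with their jobs is the same for teachers, administrators, custodians, and secretaries.
Ha: At least two of these proportions are different.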
Procedure: A chi-square test for homogeneity.
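Checks: We are given independent simple random samples, and all expected cell counts are at least 5 (the smallest is about 12.1).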
Mechanics: Using Matrix and χ²-Test on a calculator, we find the expected cells, the test statistic χ², and the P-value.
The observed counts are as follows:
                Teachers   Administrators   Custodians   Secretaries
Satisfied          82            38              34            36
Not satisfied      18            22              11            19
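Putting the observed data into a Matrix, calculator software gives χ² = 8.707 and P = 0.0335 and stores the expected values in a second matrix:
                Teachers   Administrators   Custodians   Secretaries
Satisfied         73.1          43.8            32.9          40.2
Not satisfied     26.9          16.2            12.1          14.8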
\
Conclusion in context with linkage to the P-value:
With so small a P-value, 0.0335 < 0.05, there is sufficient evidence to reject H0; that is, there is convincing evidence that the true proportion of employees satisfied with their jobs is not the same across all the school system job categories.
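The homogeneity mechanics for the job satisfaction example can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper chi2_sf and the variable names are illustrative, and the "not satisfied" counts are obtained by subtracting the satisfied counts from the sample sizes given in the problem.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for a positive integer df."""
    z = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1 starting value
    else:
        a, q = 1.0, math.exp(-z)              # df = 2 starting value
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q

# Columns: teachers, administrators, custodians, secretaries.
sample_sizes = [100, 60, 45, 55]
satisfied = [82, 38, 34, 36]
not_satisfied = [n - s for n, s in zip(sample_sizes, satisfied)]
observed = [satisfied, not_satisfied]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total)(column total) / (grand total).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(chi_sq, df)

for row in expected:
    print([round(e, 1) for e in row])
print(f"chi-square = {chi_sq:.3f}, df = {df}, P = {p_value:.4f}")
```

Printing the expected counts also makes it easy to confirm that every cell is at least 5 before relying on the chi-square approximation.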