Chapter 11: Inference for Categorical Data: Chi-Squared Tests
In this chapter, we will look at inference for categorical variables.
Example:
These are the approximate percentages for the different blood types among people with blue eyes: A: 40%; B: 11%; AB: 4%; O: 45%.
A random sample of 1000 people with brown eyes yielded the following blood type data:
A: 270; B: 200; AB: 40; O: 490.
Does this sample provide evidence that the distribution of blood types among brown-eyed people differs from that of blue-eyed people, or could the sample values simply be due to sampling variation?
The chi-square goodness-of-fit test can be used to answer this question.
In the chi-square goodness-of-fit test, there is one categorical variable (here: blood type) and one population (here: brown-eyed people).
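Comparing the observed counts with the counts expected if brown-eyed people had the same blood type distribution as blue-eyed people, we get the following table:
Blood type | Observed | Expected
A | 270 | 400
B | 200 | 110
AB | 40 | 40
O | 490 | 450
The counts differ sharply for types A and B, match exactly for type AB, and differ only modestly for type O.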
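The chi-square statistic (χ²) measures, for each category, the squared difference between the observed and expected counts relative to the expected count, and then adds these values over all categories.
The χ² statistic is computed as follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed count and E is the expected count in a category, and the sum runs over all categories.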
The shape of the chi-square distribution depends on the number of degrees of freedom.
For the goodness-of-fit test, degrees of freedom = (c − 1), where c is the number of categories.
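The statistic and its degrees of freedom can be sketched in a few lines of Python. This is a minimal illustration only: the function names `chi_square_statistic` and `goodness_of_fit_df` are hypothetical, and the counts in the example are made up to keep the arithmetic easy to check.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_df(num_categories):
    """Degrees of freedom for a goodness-of-fit test: c - 1."""
    return num_categories - 1


# Made-up counts for three categories (not taken from the chapter).
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
df = goodness_of_fit_df(len(observed))
print(f"chi-square = {stat:.2f}, df = {df}")
```

Each category contributes its own term, so a single category with a large gap between observed and expected counts can dominate the statistic.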
The essential parts of the test are summarized in the following table.
Now, let
Pa = proportion of brown-eyed people with type A blood.
Pb = proportion of brown-eyed people with type B blood.
Pab = proportion of brown-eyed people with type AB blood.
Po = proportion of brown-eyed people with type O blood.
H0: Pa = 0.40, Pb = 0.11, Pab = 0.04, Po = 0.45
Ha: At least one of these proportions differs from the stated value.
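Under H0, the expected count for each blood type is the sample size (1000) multiplied by the hypothesized proportion:
type A = 1000 × 0.40 = 400
type B = 1000 × 0.11 = 110
type AB = 1000 × 0.04 = 40
type O = 1000 × 0.45 = 450
Each expected count is greater than 5, so the chi-square test is valid.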
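The rest of the blood type calculation can be sketched in Python using the counts and percentages given in the example. The 5% significance level and the critical value 7.815 (the standard chi-square table value for 3 degrees of freedom) are assumptions added here for illustration; the chapter does not fix a significance level.

```python
# Reference proportions (blue-eyed people) and observed counts (brown-eyed sample).
reference_proportions = {"A": 0.40, "B": 0.11, "AB": 0.04, "O": 0.45}
observed_counts = {"A": 270, "B": 200, "AB": 40, "O": 490}

n = sum(observed_counts.values())  # sample size, 1000

# Expected count under H0: sample size times the hypothesized proportion.
expected_counts = {t: n * p for t, p in reference_proportions.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the four blood types.
chi_square = sum(
    (observed_counts[t] - expected_counts[t]) ** 2 / expected_counts[t]
    for t in observed_counts
)
df = len(observed_counts) - 1  # c - 1 = 3

# Assumed significance level of 0.05; table value for df = 3.
CRITICAL_VALUE_DF3_ALPHA_05 = 7.815
reject_h0 = chi_square > CRITICAL_VALUE_DF3_ALPHA_05

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

With these counts the statistic comes out to roughly 119.44, far above 7.815, so at the assumed 5% level the sample would count as evidence that the blood type distribution among brown-eyed people differs from the reference distribution. Most of the statistic comes from types A and B.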
We may also encounter a situation in which one categorical variable is measured across two or more populations. This calls for the chi-square test for homogeneity of proportions.
The chi-square test for independence applies when two categorical variables are measured across a single population.
A two-way table, or contingency table, for categorical data is simply a rectangular array of cells.
Each cell contains the frequency for one combination of values of the row and column variables.
If the row variable has r values, then there will be r rows of data in the table.
If the column variable has c values, then there will be c columns of data in the table.
There are r × c cells in the table.
The marginal totals are the sums of the observations for each row and each column.
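The marginal totals can be computed directly from the cell counts, as the short Python sketch below shows. It also computes expected cell counts with the standard formula (row total × column total) / grand total, which this chapter does not state explicitly and is included here as an assumption. The function names and the 2 × 3 table are hypothetical.

```python
def marginal_totals(table):
    """Return (row totals, column totals, grand total) for a two-way table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return row_totals, column_totals, grand_total


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals, column_totals, grand_total = marginal_totals(table)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Hypothetical table with r = 2 rows and c = 3 columns (made-up counts).
table = [
    [10, 20, 30],
    [20, 10, 10],
]

rows, cols, total = marginal_totals(table)
expected = expected_counts(table)
print(rows, cols, total)
print(expected)
```

The expected counts have the same marginal totals as the observed table, which is a quick way to check the arithmetic by hand.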
For a two-way table, the number of degrees of freedom is calculated as
(number of rows − 1)(number of columns − 1) = (r − 1)(c − 1).
The Chi-Squared test for independence is summarized as follows:
In the test for homogeneity of proportions, we use the chi-square statistic to investigate whether the values of a single categorical variable occur in the same proportions in two or more populations.
Example:
Let’s consider a situation in which a sample of 36 students is selected and then categorized according to gender and political party preference. We would then ask whether gender and party preference are independent in the population.
Now suppose we selected a random sample of 20 males from the population of males in the school and another, independent, random sample of 16 females from the population of females in the school. Within each sample we classify the students as Democrat, Republican, or Independent. The results are presented in the following table:
Here, we do not ask if gender and political party preference are independent. Instead, we ask if the proportions of Democrats, Republicans, and Independents are the same within the populations of Males and Females. This is the test for homogeneity of proportions.
Let:
p1: proportion of males who are Democrats
p2: proportion of females who are Democrats
p3: proportion of males who are Republicans
p4: proportion of females who are Republicans
p5: proportion of males who are Independents
p6: proportion of females who are Independents
Therefore:
H0: p1 = p2, p3 = p4, p5 = p6
HA: At least one of these equalities does not hold.
From here, the solution proceeds as in the previous examples: compute the expected counts, the chi-square statistic, and the degrees of freedom.
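The remaining steps can be sketched in Python. The counts below are hypothetical placeholders: they respect the sample sizes of 20 males and 16 females but are not the chapter's data. The expected-count formula (row total × column total / grand total), the 5% significance level, and the critical value 5.991 (the standard table value for 2 degrees of freedom) are likewise assumptions made for illustration.

```python
# Hypothetical counts: placeholders that only respect the sample sizes
# (20 males, 16 females); they are NOT the data from the chapter's table.
parties = ["Democrat", "Republican", "Independent"]
counts = {
    "Male": [8, 7, 5],
    "Female": [10, 3, 3],
}

table = list(counts.values())
row_totals = [sum(row) for row in table]           # sample size per population
column_totals = [sum(col) for col in zip(*table)]  # total per party
grand_total = sum(row_totals)

# Expected counts if the party proportions were the same in both populations.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

# Chi-square statistic summed over all r x c cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(parties) - 1)  # (r - 1)(c - 1) = 2

# Assumed significance level of 0.05; table value for df = 2.
CRITICAL_VALUE_DF2_ALPHA_05 = 5.991
reject_h0 = chi_square > CRITICAL_VALUE_DF2_ALPHA_05
smallest_expected = min(min(row) for row in expected)

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
print(f"smallest expected count = {smallest_expected:.2f}")
```

With these placeholder counts some expected counts fall below 5, so the validity condition used in the blood type example would not be met. The sketch shows the arithmetic only, not a conclusion about the students.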