Chi-Square Test for Independence and Association
Chi-Square Test for Independence or Association
General Definition: This test is conducted to determine if there is a relationship between two categorical variables from a single population.
Key Distinctions: Unlike the Chi-Square Test for Homogeneity (which compares one variable across separate samples), the Chi-Square Test for Independence involves one sample and asks about two variables.
Terminology: The terms "independence" and "association" may be used interchangeably when naming the test or stating hypotheses.
Data Structure: This test is always performed on data presented in a two-way table.
Hypotheses and Variable Relationships
Null Hypothesis ($H_0$): States that there is no relationship between the variables. They are independent or not associated. * Example: "Taco Tongue and Evil Eyebrow are independent" or "There is no association between Taco Tongue and Evil Eyebrow."
Alternative Hypothesis ($H_a$): States that there is a relationship between the variables. They are dependent or associated. * Example: "Taco Tongue and Evil Eyebrow are associated" or "The variables are dependent."
The Concept of Independence: If two variables are independent, knowing the value of one tells you nothing about the other. For instance, whether a person can do a "taco tongue" (fold the tongue) has nothing to do with whether that person can raise one eyebrow ("evil eyebrow").
Calculating Expected Counts
Calculation Method: To find the expected count for a cell in a two-way table without using a technology matrix, use the following formula: $\text{Expected count} = \dfrac{\text{row total} \times \text{column total}}{\text{table total}}$
Numerical Example: * Total sample size (Grand Total): $n = 600$ * Row totals: , * Column totals: , * Calculation 1: * Calculation 2: * Calculation 3: * Calculation 4:
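The row-total-times-column-total rule can be sketched in a few lines of Python. The counts below are hypothetical and chosen only so that they sum to 600; they are not the lecture's actual data, and the function name `expected_counts` is an illustrative choice.

```python
def expected_counts(observed):
    """Return expected counts for a two-way table of observed counts.

    Each cell uses: (row total * column total) / table total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (NOT the lecture's data):
# rows = Taco Tongue yes/no, columns = Evil Eyebrow yes/no, 600 seniors total.
observed = [[180, 120],
            [90, 210]]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

With these made-up counts, both row totals are 300 and the column totals are 270 and 330, so each cell in the first column is 300 × 270 / 600 = 135 and each cell in the second column is 300 × 330 / 600 = 165.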
Significance of Expected Counts: Expected counts often appear in multiple-choice exam questions, and they are required for checking the Large Counts condition.
Conditions for Inference
Random: Data must come from a random sample to generalize the findings to the population. * In the example: A random sample of seniors was taken to generalize to all seniors.
10% Condition: When sampling without replacement, the sample size ($n$) must be less than 10% of the population size ($N$). * Condition check: $600 < 0.10 \times (\text{All Seniors})$. * Critical Exception: Do not check the condition if the data comes from an experiment using random assignment. Checking it in this context will result in a loss of points on the exam.
Large Counts: All expected counts must be greater than or equal to 5. * In the example: All four expected counts are at least 5, so the condition is satisfied. * Strict Reporting Rule: It is mandatory to list the specific expected counts; simply stating that they are all at least 5 is insufficient.
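As a rough sketch, the Large Counts check amounts to listing every expected count and confirming that each is at least 5. The expected counts below are hypothetical illustration values, not the ones from the lecture, and `large_counts_ok` is an assumed helper name.

```python
def large_counts_ok(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(count >= minimum for row in expected for count in row)


# Hypothetical expected counts for illustration (NOT the lecture's values).
expected = [[135.0, 165.0],
            [135.0, 165.0]]

# List the specific counts, as the reporting rule requires, then check them.
listed = [count for row in expected for count in row]
print("Expected counts:", listed)
print("Large Counts condition met:", large_counts_ok(expected))
```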
Calculations and Technology Output
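Chi-Square Test Statistic Formula: $\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, where the sum is taken over every cell in the two-way table.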
Matrix Setup for Technology: * Matrix A (Observed Data): * Matrix B (Expected Data):
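Degrees of Freedom ($df$): * Formula for two-way tables: $df = (\text{number of rows} - 1)(\text{number of columns} - 1)$ * Calculation: For the example's $2 \times 2$ table, $df = (2 - 1)(2 - 1) = 1$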
Results for the Example Problem: * Chi-Square Test Statistic: * P-value: * Significance Level: * Result interpretation: A very small P-value indicates that the observed counts are very different from the counts expected under the assumption of independence.
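The calculation that the technology performs can be approximated by hand in Python. This is a minimal sketch for a 2 × 2 table only: it relies on the fact that with 1 degree of freedom the upper-tail chi-square area equals `erfc(sqrt(x / 2))`, so it does not generalize to larger tables. The observed counts are hypothetical (not the lecture's data), and all function names are illustrative.

```python
import math


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for a two-way table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def p_value_df1(statistic):
    """Upper-tail area of the chi-square distribution with 1 df only."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts (NOT the lecture's data), 600 seniors in total.
observed = [[180, 120],
            [90, 210]]

stat = chi_square_statistic(observed)
df = degrees_of_freedom(observed)
p_value = p_value_df1(stat)  # valid here because df == 1
print(f"chi-square = {stat:.3f}, df = {df}, P-value = {p_value:.3g}")
```

For these made-up counts the statistic is large and the P-value is far below any common significance level, which matches the interpretation above: the observed counts are very different from what independence would predict.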
Decision and Conclusion
Decision: Since the P-value is less than the significance level ($\alpha$), we reject the null hypothesis ($H_0$).
Conclusion Statement: We have convincing evidence that Taco Tongue and Evil Eyebrow are associated among seniors.
Error Types and Statistical Power
Type I Error: Occurs if we reject the null hypothesis when it is actually true. * The probability of a Type I error is equal to the significance level: $P(\text{Type I error}) = \alpha$.
Type II Error: Occurs if we fail to reject the null hypothesis when it is actually false.
Relationship between Alpha, Type II Error, and Power: * As $\alpha$ (the Type I error probability) increases, the probability of a Type II error decreases. * As the probability of a Type II error decreases, the power of the test increases. * Power and $\alpha$ move in the same direction: if $\alpha$ increases, power increases.
How to Increase Power: * Increase the sample size ($n$). * Increase the significance level ($\alpha$). * Use a value in the alternative hypothesis that is farther away from the null value.
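A small simulation can illustrate how power responds to sample size and to the significance level. This is a sketch under stated assumptions: the population cell probabilities are invented so that the two variables really are associated, the sample sizes and α values (0.05 and 0.10) are arbitrary illustration choices, and the P-value shortcut works only for 2 × 2 tables (1 degree of freedom).

```python
import math
import random


def p_value_2x2(table):
    """P-value of the chi-square independence test on a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 1.0  # degenerate table: treat as no evidence against H0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - exp) ** 2 / exp
    return math.erfc(math.sqrt(stat / 2))


def simulate_p_values(cell_probs, n, reps, seed):
    """Draw `reps` random samples of size n and return each test's P-value."""
    rng = random.Random(seed)
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    p_values = []
    for _ in range(reps):
        table = [[0, 0], [0, 0]]
        for i, j in rng.choices(cells, weights=cell_probs, k=n):
            table[i][j] += 1
        p_values.append(p_value_2x2(table))
    return p_values


def power(p_values, alpha):
    """Estimated power: fraction of simulated tests that reject H0."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical population in which the variables ARE associated
# (probabilities for cells yes/yes, yes/no, no/yes, no/no).
cell_probs = [0.30, 0.20, 0.20, 0.30]

small = simulate_p_values(cell_probs, n=50, reps=2000, seed=1)
large = simulate_p_values(cell_probs, n=200, reps=2000, seed=1)

for label, p_values in (("n = 50", small), ("n = 200", large)):
    for alpha in (0.05, 0.10):
        print(f"{label}, alpha = {alpha:.2f}: power ~ {power(p_values, alpha):.2f}")
```

Because the null hypothesis is false in this simulated population, every failure to reject is a Type II error, so the estimated Type II error rate is one minus the estimated power in each row of the output.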
Comparison Summary: GOF vs. Homogeneity vs. Association
Goodness of Fit (GOF): One sample, one variable; checks if the sample distribution matches a specific population distribution.
Homogeneity: Two or more samples (or groups), one variable; checks if the distribution of a single variable is the same across multiple populations.
Independence/Association: One sample, two variables; checks if there is a relationship between two variables within a single population.
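The summary above can be written as a tiny lookup, shown here only as a memory aid. The function name and its two arguments are hypothetical; the rules are exactly the three listed above.

```python
def which_chi_square_test(num_samples, num_variables):
    """Pick the chi-square test from the number of samples and variables."""
    if num_samples == 1 and num_variables == 1:
        return "goodness of fit"
    if num_samples >= 2 and num_variables == 1:
        return "homogeneity"
    if num_samples == 1 and num_variables == 2:
        return "independence/association"
    raise ValueError("combination not covered by this summary")


# One sample of seniors, two variables (Taco Tongue and Evil Eyebrow).
print(which_chi_square_test(num_samples=1, num_variables=2))
```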
Questions & Discussion
Student Question: "Do you remember how to find expected counts on a two-way table?" * Response: Row total times column total over table total.
Student Question: "Is it homogeneity because we're looking at only one sample?" * Response: No, it is not homogeneity. If you have one sample with two variables, it's a test for independence/association. Homogeneity requires two or more samples.
Student Question: "What are the lines for? (referring to notations like 'TT' and 'EE')" * Response: Those are shorthand notations to be "lazy" while writing. Instead of writing "Taco Tongue" and "Evil Eyebrow" repeatedly, "TT" and "EE" are used.
Student Question: "Why are we checking 10%?" * Response: We check the 10% condition because we are sampling without replacement.
Student Question: "If it's an experiment, do we have to check 10%?" * Response: No. If it is an experiment with random assignment, do not check the 10% condition, or you will lose points.
Student Question: "How can you decrease the chance of making a Type II error?" * Response: Increase the sample size or increase the alpha level.