Chi-Square Test for Independence and Association

General Definition: This test is conducted to determine if there is a relationship between two categorical variables from a single population.
Key Distinctions: Unlike the Chi-Square Test for Homogeneity (which compares one variable across separate samples), the Chi-Square Test for Independence involves one sample and asks about two variables.
Terminology: The terms "independence" and "association" may be used interchangeably when naming the test or stating hypotheses.
Data Structure: This test is always performed on data presented in a two-way table.

Null Hypothesis ($H_0$): States that there is no relationship between the variables; they are independent (not associated).
  * Example: "Taco Tongue and Evil Eyebrow are independent" or "There is no association between Taco Tongue and Evil Eyebrow."
Alternative Hypothesis ($H_a$): States that there is a relationship between the variables; they are dependent (associated).
  * Example: "Taco Tongue and Evil Eyebrow are associated" or "The variables are dependent."
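The Concept of Independence: If the variables are independent, knowing the value of one tells you nothing about the other; the distribution of one variable is the same in every category of the other. For instance, if the two traits are independent, being able to perform a "taco tongue" (folding the tongue) has nothing to do with being able to raise one eyebrow (evil eyebrow).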
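One way to see where the expected-count formula comes from (a short derivation added here, not part of the original lesson): under $H_0$, independence means the probability of landing in row $i$ and column $j$ is the product of the two marginal probabilities, and each marginal probability is estimated from the table's totals.

$$P(\text{row } i \text{ and column } j) = P(\text{row } i)\,P(\text{column } j) \approx \frac{\text{Row Total}_i}{\text{Grand Total}} \times \frac{\text{Column Total}_j}{\text{Grand Total}}$$

$$E_{ij} = \text{Grand Total} \times \frac{\text{Row Total}_i}{\text{Grand Total}} \times \frac{\text{Column Total}_j}{\text{Grand Total}} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}}$$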

Calculation Method: To find the expected count for a cell in a two-way table by hand (without entering the table as a matrix on the calculator), use the following formula: $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
Numerical Example ($n = 600$):
  * Total sample size (Grand Total): $600$
  * Row totals: $480$, $120$
  * Column totals: $200$, $400$
  * Calculation 1: $\frac{480 \times 200}{600} = 160$
  * Calculation 2: $\frac{480 \times 400}{600} = 320$
  * Calculation 3: $\frac{120 \times 200}{600} = 40$
  * Calculation 4: $\frac{120 \times 400}{600} = 80$
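The same arithmetic can be sketched in Python. This is a minimal illustration, assuming the table is described only by its row and column totals; the function name `expected_counts` is made up for this example.

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell: (row total x column total) / grand total."""
    grand_total = sum(row_totals)
    # Row and column totals must describe the same table.
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must sum to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Totals from the n = 600 example.
expected = expected_counts([480, 120], [200, 400])
print(expected)  # [[160.0, 320.0], [40.0, 80.0]]
```

Each row of the result lines up with a row of the two-way table, so the four values match Calculations 1 through 4.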
Significance of Expected Counts: Expected counts appear in multiple-choice exam questions, are required for checking the Large Counts condition, and supply the "Expected" values in the test statistic.

Random: Data must come from a random sample to generalize the findings to the population.
  * In the example: A random sample of $600$ seniors was taken to generalize to all seniors.
10% Condition: When sampling without replacement, the sample size ($n$) must be less than $10\%$ of the population size ($N$).
  * Condition check: $600 < 0.10 \times (\text{All Seniors})$.
  * Critical Exception: Do not check the $10\%$ condition if the data come from an experiment using random assignment. Checking it in this context will result in a loss of points on the exam.
Large Counts: All expected counts must be greater than or equal to $5$.
  * In the example: The expected counts were $160$, $320$, $40$, and $80$. Since all values are $\ge 5$, the condition is satisfied.
  * Strict Reporting Rule: It is mandatory to list the specific expected counts; simply stating that they are all at least $5$ is insufficient.
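A minimal sketch of the Large Counts and 10% checks in Python. The function `check_conditions` and its parameters are hypothetical, and the population size of 10,000 seniors is an invented value for illustration only, since the notes never give the number of seniors. Note that $600 < 0.10N$ is the same as requiring $N > 6000$.

```python
def check_conditions(expected, n, population_size=None, random_assignment=False):
    """Report the Large Counts and 10% checks described in these notes.

    population_size is a hypothetical input; the notes only say "All Seniors".
    """
    cells = [e for row in expected for e in row]
    report = {
        # Keep every expected count so it can be listed, not just summarized.
        "expected_counts": cells,
        "large_counts": all(e >= 5 for e in cells),
    }
    if random_assignment:
        # Experiments with random assignment: do not check the 10% condition.
        report["ten_percent"] = "not checked (random assignment)"
    elif population_size is not None:
        # n < 0.10 * N, written as 10 * n < N to avoid floating-point rounding.
        report["ten_percent"] = 10 * n < population_size
    return report


# 10,000 is an invented population size used only for illustration.
print(check_conditions([[160, 320], [40, 80]], n=600, population_size=10000))
```

With the invented population of 10,000 the 10% check passes because $600 < 1000$; any population of 6000 or fewer would fail it.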

Chi-Square Test Statistic Formula: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
Matrix Setup for Technology:
  * Matrix A (Observed Data): $\begin{pmatrix} 180 & 300 \\ 20 & 100 \end{pmatrix}$
  * Matrix B (Expected Data): $\begin{pmatrix} 160 & 320 \\ 40 & 80 \end{pmatrix}$
Degrees of Freedom ($df$):
  * Formula for two-way tables: $df = (r-1) \times (c-1)$
  * Calculation: $(2-1) \times (2-1) = 1 \times 1 = 1$
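As a cross-check on the calculator output, the test statistic can be computed directly from Matrix A and Matrix B. This is a plain-Python sketch; the name `chi_square_statistic` is made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


matrix_a = [[180, 300], [20, 100]]  # observed counts
matrix_b = [[160, 320], [40, 80]]   # expected counts
stat = chi_square_statistic(matrix_a, matrix_b)
print(stat)  # 18.75
```

The four cell contributions are $2.5$, $1.25$, $10$, and $5$; the two cells in the second row contribute most of the total.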
Results for the Example Problem:
  * Chi-Square Test Statistic: $\chi^2 = 18.75$
  * P-value: $P \approx 0$
  * Significance Level: $\alpha = 0.05$
  * Result interpretation: A very small P-value means that, if the variables really were independent, it would be very unlikely to get observed counts this far from the expected counts by chance alone.

Decision: Since the P-value ($P \approx 0$) is less than the alpha level ($\alpha = 0.05$), we reject the null hypothesis ($H_0$).
Conclusion Statement: We have convincing evidence that Taco Tongue and Evil Eyebrow are associated among seniors.
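The degrees of freedom, P-value, and decision can be sketched with the Python standard library, but only for $df = 1$, which covers this $2 \times 2$ table. With one degree of freedom, $\chi^2$ is the square of a standard normal $Z$, so $P(\chi^2 \ge x) = P(|Z| \ge \sqrt{x}) = \text{erfc}(\sqrt{x/2})$. The helper names here are hypothetical. For larger tables a statistics library would be needed (for example `scipy.stats.chi2.sf`); note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to $2 \times 2$ tables by default, so `correction=False` is needed to reproduce $\chi^2 = 18.75$.

```python
import math


def degrees_of_freedom(num_rows, num_cols):
    return (num_rows - 1) * (num_cols - 1)


def p_value_df1(stat):
    """Upper-tail P-value of a chi-square statistic with df = 1 only.

    chi^2 with df = 1 is Z^2, so P(chi^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def decide(p_value, alpha=0.05):
    return "reject H0" if p_value < alpha else "fail to reject H0"


df = degrees_of_freedom(2, 2)
p = p_value_df1(18.75)
print(df, p, decide(p))
```

The P-value comes out around $1.5 \times 10^{-5}$, which is why the notes report $P \approx 0$.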

Type I Error: Occurs if we reject the null hypothesis when it is actually true.
  * The probability of a Type I error is equal to the significance level: $P(\text{Type I Error}) = \alpha = 0.05$.
Type II Error: Occurs if we fail to reject the null hypothesis when it is actually false.
Relationship between Alpha, Type II Error, and Power:
  * As $\alpha$ (Type I Error probability) increases, the probability of a Type II error decreases (with the sample size and the true situation held fixed).
  * Power $= 1 - P(\text{Type II Error})$, so as the probability of a Type II error decreases, the Power of the test increases.
  * Power and Alpha move in the same direction: If $\alpha$ increases, Power increases.
How to Increase Power:
  * Increase the sample size ($n$).
  * Increase the significance level ($\alpha$) (e.g., from $0.05$ to $0.15$).
  * Use an alternative value that is farther from the null value; for this test, a stronger true association is easier to detect.
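These relationships can be checked by simulation. The sketch below is an illustration, not part of the original lesson: it draws many random $2 \times 2$ tables and records how often the test rejects $H_0$. The margins (0.8/0.2 for rows, 0.33/0.67 for columns) and the alternative cell probabilities are invented values. The critical values 3.841 and 2.072 are the standard chi-square table values for $df = 1$ at $\alpha = 0.05$ and $\alpha = 0.15$.

```python
import random


def simulate_statistics(cell_probs, n, reps, seed=0):
    """Draw `reps` random 2x2 tables of size n; return their chi-square statistics.

    cell_probs lists the four cell probabilities in row order: [p11, p12, p21, p22].
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        draws = rng.choices(range(4), weights=cell_probs, k=n)
        counts = [draws.count(k) for k in range(4)]
        table = [counts[0:2], counts[2:4]]
        rows = [sum(r) for r in table]
        cols = [table[0][j] + table[1][j] for j in range(2)]
        if 0 in rows or 0 in cols:
            # Empty row or column: statistic undefined, count as no rejection.
            stats.append(0.0)
            continue
        stat = 0.0
        for i in range(2):
            for j in range(2):
                e = rows[i] * cols[j] / n  # expected count from the sample totals
                stat += (table[i][j] - e) ** 2 / e
        stats.append(stat)
    return stats


def rejection_rate(stats, critical_value):
    return sum(s > critical_value for s in stats) / len(stats)


CRIT_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
CRIT_15 = 2.072  # chi-square critical value, df = 1, alpha = 0.15

# H0 true: each cell probability is the product of its (invented) margins.
null_probs = [0.8 * 0.33, 0.8 * 0.67, 0.2 * 0.33, 0.2 * 0.67]
# Invented alternative: same margins, modest association.
alt_probs = [0.25, 0.55, 0.08, 0.12]

type1 = rejection_rate(simulate_statistics(null_probs, 600, 1000), CRIT_05)
alt_small = simulate_statistics(alt_probs, 100, 1000)
alt_large = simulate_statistics(alt_probs, 600, 1000)
power_small_n = rejection_rate(alt_small, CRIT_05)
power_large_n = rejection_rate(alt_large, CRIT_05)
power_large_n_alpha15 = rejection_rate(alt_large, CRIT_15)
print(type1, power_small_n, power_large_n, power_large_n_alpha15)
```

The first number should land near $0.05$ (the Type I error rate equals $\alpha$), and power should rise both with $n$ and with $\alpha$; the exact values depend on the random seed. Reusing the same simulated tables for both critical values guarantees that raising $\alpha$ can only add rejections.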

Goodness of Fit (GOF): One sample, one variable; checks if the sample distribution matches a specific population distribution.
Homogeneity: Two or more samples (or groups), one variable; checks if the distribution of a single variable is the same across multiple populations.
Independence/Association: One sample, two variables; checks if there is a relationship between two variables within a single population.
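The three-way comparison above amounts to a small decision rule on the number of samples and variables. A minimal sketch, with a made-up function name:

```python
def choose_chi_square_test(num_samples, num_variables):
    """Pick a chi-square procedure from the sample/variable counts in these notes."""
    if num_samples == 1 and num_variables == 1:
        return "Goodness of Fit"
    if num_samples >= 2 and num_variables == 1:
        return "Homogeneity"
    if num_samples == 1 and num_variables == 2:
        return "Independence/Association"
    raise ValueError("design does not match GOF, homogeneity, or independence")


print(choose_chi_square_test(1, 2))  # Independence/Association
```

The Taco Tongue / Evil Eyebrow study is one sample of seniors with two variables, so it falls in the last case.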

Student Question: "Do you remember how to find expected counts on a two-way table?"
  * Response: Row total times column total, divided by the table (grand) total.
Student Question: "Is it homogeneity because we're looking at only one sample?"
  * Response: No, it is not homogeneity. If you have one sample with two variables, it's a test for independence/association. Homogeneity requires two or more samples.
Student Question: "What are the lines for?" (referring to abbreviations like "TT" and "EE")
  * Response: Those are shorthand notations so you can be "lazy" while writing. Instead of writing "Taco Tongue" and "Evil Eyebrow" repeatedly, "TT" and "EE" are used.
Student Question: "Why are we checking 10%?"
  * Response: We check the 10% condition because we are sampling without replacement.
Student Question: "If it's an experiment, do we have to check 10%?"
  * Response: No. If it is an experiment with random assignment, do not check the 10% condition, or you will lose points.
Student Question: "How can you decrease the chance of making a Type II error?"
  * Response: Increase the sample size or increase the alpha level (e.g., set alpha to $0.15$).