Lecture 10: Chi Square Test
Overview
In Lecture 9, we discussed hypothesis testing for means and proportions of quantitative data.
This lecture introduces the Chi Square test for qualitative (categorical) data.
Applications of Chi Square Test
Testing relationships between categorical variables, for example:
Sex (Male, Female) and smoking habit (Smoker, Non-smoker).
Education level (University, Secondary, Primary) and religion (Christian, Catholic, Buddhist).
Sex (Male, Female) and vegetarian tendency.
Goodness of Fit Test
A statistical test of the null hypothesis that the observed data follow a specified probability distribution.
It compares the observed frequencies with the frequencies expected under the null hypothesis.
Large differences between observed and expected frequencies are evidence against the null hypothesis.
Example: Testing the ratio of boys to girls among 2650 students in 2012 at a 1% significance level.
Observed Frequencies: Boys - 1600, Girls - 1050.
Null Hypothesis (H0): The numbers of boys and girls are equal (ratio 1:1).
Expected Frequencies: Boys = 1325, Girls = 1325 (half of 2650 each).
Chi Square Statistic Calculation
The formula: \[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]
Where:
\(O_i\) = observed frequency in group i.
\(E_i\) = expected frequency in group i.
Under the null hypothesis, the test statistic approximately follows the Chi Square distribution with (number of categories - 1) degrees of freedom.
Conclusion: If the calculated \(\chi^2\) is greater than the critical value, reject the null hypothesis; otherwise, do not reject it.
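The boys and girls example can be worked through in a few lines of Python. This is a minimal sketch using only the standard library; the function name chi_square_statistic and the constant CRITICAL_1PCT_DF1 are illustrative names, and 6.635 is the standard table value for the upper 1% point with 1 degree of freedom.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts from the lecture example (boys, girls).
observed = [1600, 1050]
total = sum(observed)                      # 2650 students
# Under H0 (ratio 1:1) each group is expected to hold half the students.
expected = [total / 2, total / 2]          # [1325.0, 1325.0]

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1                     # number of categories - 1 = 1

# Standard table value: upper 1% point of chi-square with 1 degree of freedom.
CRITICAL_1PCT_DF1 = 6.635

# For df = 1 the upper-tail probability has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

reject_h0 = statistic > CRITICAL_1PCT_DF1
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes out at about 114.15, far above 6.635, so the hypothesis of an equal ratio is rejected at the 1% level.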
Chi Square Test for Independence
Suitable when each observation is classified by two categorical variables; the counts are arranged in a contingency table.
Example: Examining performance differences between boys and girls.
Contingency Table Data: Counts of boys and girls for each test result (A, B, C).
Expected Frequency Calculation
Consider total boys and girls:
Total Boys = 1098, Total Girls = 1552, Total = 2650.
Calculate the expected frequency of each cell from the marginal totals: Expected = (row total × column total) / grand total.
Boys getting A: Expected = 600; Girls getting A: Expected = 600.
Complete calculation for all categories.
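The expected-frequency step can be sketched as a small helper that applies (row total × column total) / grand total to every cell. The lecture's A/B/C table is not reproduced in these notes, so the sketch below uses the smoking counts from the Activities section instead; the function name expected_frequencies is an illustrative choice.

```python
def expected_frequencies(table):
    """Expected count of each cell under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Smoking counts from the Activities section.
# Rows: smoker, non-smoker. Columns: boys, girls.
smoking = [[125, 56],
           [300, 519]]

expected_smoking = expected_frequencies(smoking)
for label, row in zip(["Smoker", "Non-smoker"], expected_smoking):
    print(label, [round(value, 3) for value in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick way to check the arithmetic by hand.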
Hypotheses for the test of independence:
H0: Test result and sex are independent (there is no association between them).
H1: Test result and sex are not independent (there is an association).
Test Result Interpretation
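The calculated Chi Square value is compared with the critical Chi Square value at 2 degrees of freedom, (2 - 1) × (3 - 1) = 2, for two sexes and three test results.
Calculated Chi Square value < Critical Chi Square value, so the null hypothesis is not rejected at the 5% significance level.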
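The degrees of freedom and the decision rule for a contingency table can be sketched as follows. The helper names independence_df and reject_null are illustrative, and the critical values are the standard upper 5% points of the Chi Square distribution for 1 to 4 degrees of freedom.

```python
import math

def independence_df(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

# Standard table values: upper 5% points of the chi-square distribution.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_null(statistic, df, critical_values=CRITICAL_5PCT):
    """Reject H0 only when the statistic exceeds the critical value."""
    return statistic > critical_values[df]

df = independence_df(2, 3)   # sex (2 levels) x test result (A, B, C)
critical = CRITICAL_5PCT[df]

# For df = 2 the upper-tail probability has the closed form exp(-x / 2),
# so the tail area beyond the critical value should be close to 0.05.
tail_area = math.exp(-critical / 2)
print(df, critical, round(tail_area, 3))
```

Any calculated statistic below 5.991 for this 2 × 3 table leads to the same conclusion as above: the null hypothesis of independence is not rejected at the 5% level.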
Activities
Test Example: Relationship between smoking and sex using the data:
Observed results: Boys Smoker - 125, Girls Smoker - 56, Non-smoker Boys - 300, Non-smoker Girls - 519.
Perform a Chi-square test at a 5% significance level.
A-level Results Analysis: Is there a significant difference in pass rates among the following subjects?
Mathematics: Passed 324, Failed 447
Economics: Passed 81, Failed 211
Applied Mathematics: Passed 65, Failed 66
Test at 1% significance level.
Genetic Theory Ratio Analysis: Colour-strains ratio of plants:
Expected ratio: Yellow : Black : Blue = 1 : 4 : 5, with 200 plants in total.
Observed counts: Yellow 23, Black 86, Blue 91.
Test at the 5% significance level whether the observed counts differ significantly from the expected ratio.
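When the null hypothesis specifies a ratio rather than equal shares, the expected frequencies are the total split in proportion to that ratio. One way to set this activity up is sketched below; the helper expected_from_ratio is an illustrative name, and 5.991 is the standard upper 5% point with 2 degrees of freedom.

```python
def expected_from_ratio(ratio, total):
    """Split `total` across categories in proportion to `ratio`."""
    ratio_sum = sum(ratio)
    return [total * part / ratio_sum for part in ratio]

colours = ["Yellow", "Black", "Blue"]
ratio = [1, 4, 5]                 # ratio stated by the genetic theory
observed = [23, 86, 91]           # observed plant counts, 200 in total

expected = expected_from_ratio(ratio, sum(observed))   # [20.0, 80.0, 100.0]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1            # 3 categories - 1 = 2

# Standard table value: upper 5% point of chi-square with 2 degrees of freedom.
CRITICAL_5PCT_DF2 = 5.991
reject_h0 = statistic > CRITICAL_5PCT_DF2
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

Here the statistic is 0.45 + 0.45 + 0.81 = 1.71, which is below 5.991, so under this setup the observed counts do not differ significantly from the 1 : 4 : 5 ratio.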
Smoking Habit and Age Group Test: 3200 interviews with results indicating smokers and non-smokers within specified age ranges.
Test for relationship at 1% significance.
Salary Relation with Education: Interviews show:
Master Degree and No Master Degree salary distributions and results.
Test relationship between education level and salary at 1% significance.