ANOVA & Multiple-Group Comparisons – Detailed Study Sheet
ANOVA: Core Purpose and Context
- Designed for >2 categorical groups when the dependent variable is continuous
- Extends the two–sample t–test logic to multiple groups without inflating type I error
- Full name: “Analysis of Variance (ANOVA)”
- Central idea: partition the total variability in the data set into
- Treatment (among–group) variability
- Background/within–group (error) variability
- Key advantage over running many t–tests
- Multiple t-tests raise the family-wise type I error (e.g., 6 tests at 5\% each ≈ 26.5\% chance of at least one false positive, often rounded up to 30\%)
- ANOVA + post-hoc tests control this inflation
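The inflation described above can be checked with a few lines of base R. This is a minimal sketch, assuming the tests are independent; the helper name `fwer` and the choice of 4 groups (which gives 6 pairwise tests) are illustrative assumptions, not part of the original sheet.

```r
# Family-wise error rate for m independent tests, each run at level alpha
# (hypothetical helper; independence of the tests is an assumption)
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

k <- 4              # hypothetical number of groups
m <- choose(k, 2)   # number of pairwise t-tests among k groups: 6

fwer(m)             # about 0.265
m * 0.05            # simple additive upper bound: 0.30
```

The additive bound is the source of the rounded 30\% figure; the exact value under independence is slightly lower.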
Fundamental Hypotheses
- Null: \text{H}_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k (all k group means equal)
- Alternative: \text{H}_A: at least one mean differs
- Important: ANOVA tells if a difference exists, not where; post-hoc tests are required for pairwise insight
Core Test Statistic
- Ratio of mean squares
F = \dfrac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
- Decision rule: compare calculated F to critical F_{\alpha,df_1,df_2} or use the p-value
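To make the ratio of mean squares concrete, the sketch below computes F by hand and cross-checks it against `aov()`. The data and the names `y` and `g` are hypothetical toy values chosen for easy arithmetic; the degrees of freedom use df_1 = k - 1 and df_2 = N - k.

```r
# Hypothetical toy data: three groups of three observations each
y <- c(4, 5, 6,   6, 7, 8,   9, 10, 11)
g <- factor(rep(c("A", "B", "C"), each = 3))

k <- nlevels(g)                 # number of groups
N <- length(y)                  # total sample size

group_means <- tapply(y, g, mean)
n_per_group <- tapply(y, g, length)

# Partition the variability: among-group vs within-group sums of squares
ss_between <- sum(n_per_group * (group_means - mean(y))^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
F_stat     <- ms_between / ms_within   # 19 for these toy data

# Decision rule: critical value at alpha = 0.05, or the p-value
F_crit <- qf(0.95, df1 = k - 1, df2 = N - k)
p_val  <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

summary(aov(y ~ g))             # software cross-check of the same F
```

Here SS between is 38 and SS within is 6, so the mean squares are 19 and 1.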
Assumptions Checklist (Classical One-Way ANOVA)
- Independence: samples are random & independent
- Homogeneity of variance (equal variances among groups)
- Tested with Levene’s or Bartlett’s test; in R: leveneTest() (car package) or bartlett.test()
- Normality of residuals
- Residuals = observed value – group mean
- Combine residuals from all groups → single vector centred at 0
- Checked with Shapiro–Wilk or Kolmogorov–Smirnov tests, a QQ-plot, or a histogram with a density curve
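The residual definition in this checklist can be verified directly. A minimal sketch on hypothetical toy data (the names `y`, `g`, `fit`, `res`, and `manual` are illustrative), showing that the residuals from `aov()` equal observed value minus group mean and pool into one vector centred at 0:

```r
# Hypothetical toy data: three groups of three observations each
y <- c(4, 5, 6,   6, 7, 8,   9, 10, 11)
g <- factor(rep(c("A", "B", "C"), each = 3))

fit <- aov(y ~ g)
res <- residuals(fit)          # pooled residuals from all groups

manual <- y - ave(y, g)        # observed value minus its own group mean
all.equal(unname(res), manual) # TRUE: the two definitions agree
mean(res)                      # zero, up to floating-point error

shapiro.test(res)              # normality test on the pooled residuals
```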
Visual & Code Aids (R snippets)
- Boxplot for variance impression:
boxplot(Calcium ~ Diagnosis, las = 1, col = c(2,4,8))
- Levene test:
library(car) # provides leveneTest()
leveneTest(Calcium ~ Diagnosis) # F_{2,15} = 1.63, p = 0.229 (fail to reject equal variances)
- Fit model & extract residuals:
model <- aov(Calcium ~ Diagnosis)
modelresid <- residuals(model)
shapiro.test(modelresid) # W = 0.971, p = 0.815
- Histogram + density:
hist(...); lines(density(...))
Why Not Just Learn The "Math By Hand"?
- Instructor note: the mechanics still matter, but conceptual grasp & software execution are now the priority
Example 1: Calcium Intake vs Bone Density Category
- Groups: Normal, Osteopenia, Osteoporosis
- Sample sizes not stated explicitly, but df_{\text{within}} = N - k = 15 with k = 3 groups (⇒ N = 18 patients)
- Assumptions met (random, equal variance, normal residuals)
- ANOVA result: F_{2,15}=1.4,\;p>0.27 → fail to reject \text{H}_0
- Means ± SD: Normal 0.94\pm0.16\,g, Osteopenia 0.80\pm0.22\,g, Osteoporosis 0.72\pm0.30\,g
- Conclusion: No significant difference in calcium intake across bone-density groups
Example 2: Long-Tailed Widowbirds (Andersson 1982)
- Biological premise: sexual selection may favor males with longer tails → more nests (mates)
- Design (36 males):
- Shortened tails (−14 cm)
- Control / Average tails (cut–glued same length)
- Elongated tails (~25 cm; +14 cm glued)
- Assumption tests
- Normality of residuals: W=0.94,\;p>0.07 (fail to reject normality)
- Homogeneity: F_{2,33}=0.38,\;p>0.68 (equal variances)
- ANOVA: F_{2,33}=88.8,\;p\approx 0 (reject \text{H}_0; at least one mean differs)
- Post-hoc: Tukey’s HSD (controls family-wise error)
- Shortened vs Control: p>0.09 (not sig.)
- Shortened vs Elongated: p<0.00001 (sig.)
- Control vs Elongated: p<0.00001 (sig.)
- Interpretation: Elongated males (3.38\pm0.53 nests) have significantly more nests than either other group; this supports the sexual-selection hypothesis
Post-Hoc Testing Logic & Options
- Tukey’s Honestly Significant Difference (HSD)
- Uses a critical difference value; pairwise means differing by >HSD are “honestly” significant
- Essentially a set of pairwise t-like comparisons that use the studentized range distribution to hold the family-wise error rate at \alpha
- When assumptions fail:
- Welch’s ANOVA (variance heterogeneity)
- Post-hoc: Games–Howell
- Kruskal–Wallis (non-normal residuals / ordinal data)
- Rank-based and distribution-free; read as a test of medians when the group distributions have similar shapes
- Post-hoc: Dunn’s test + Bonferroni correction
- Family-wise error rate (FWER): probability of ≥1 type I error across a family of comparisons
- Example: six t–tests at \alpha=0.05 each → 1-(1-0.05)^6 ≈ 0.265 ≈ 26.5\% (rounded to 30\% in slide)
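- Levene’s statistic (median centred version): F = \dfrac{N-k}{k-1}\;\dfrac{\sum_{j=1}^{k} n_j (\bar{Z}_{j\cdot}-\bar{Z}_{\cdot\cdot})^2}{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Z_{ji}-\bar{Z}_{j\cdot})^2}, where Z_{ji} = |Y_{ji}-\tilde{Y}_j| is the absolute deviation of observation i from the median of group j, \bar{Z}_{j\cdot} is the mean of Z in group j, and \bar{Z}_{\cdot\cdot} is the mean of all Z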
- Shapiro–Wilk statistic W compares ordered residuals to expected normal order statistics
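The median-centred Levene statistic listed above is a one-way ANOVA F computed on absolute deviations from the group medians, which the sketch below checks by hand. The data and all variable names are hypothetical. To my knowledge `leveneTest()` in the car package also centres on the median by default, but that package is not used here.

```r
# Hypothetical toy data: three groups with visibly different spread
y <- c(4, 5, 9,   6, 7, 8,   2, 10, 12)
g <- factor(rep(c("A", "B", "C"), each = 3))

k <- nlevels(g)
N <- length(y)

# Absolute deviation of each observation from its own group median
z <- abs(y - ave(y, g, FUN = median))

z_bar_group <- tapply(z, g, mean)      # mean deviation per group
n_group     <- tapply(z, g, length)    # group sample sizes

num <- sum(n_group * (z_bar_group - mean(z))^2)  # among-group part
den <- sum((z - ave(z, g))^2)                    # within-group part

levene_F <- (N - k) / (k - 1) * num / den
levene_p <- pf(levene_F, k - 1, N - k, lower.tail = FALSE)

# Cross-check: the same value is the F of a one-way ANOVA on z
check_F <- anova(lm(z ~ g))[1, "F value"]
```

A large `levene_p` means the equal-variance assumption is not rejected, matching the reading of the calcium example.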
Additional Applied Scenarios (Slides 15-18)
- Surgical Closure Recovery Time (150 appendectomy patients)
- Treatments: Staples, Sutures, Glue, Strips, Intradermal sutures
- Objective: detect differences in days to recovery across the 5 methods (naive pairwise testing would need 10 comparisons)
- Evolution of Severe Skin Lesions (Staphylococcus aureus strains)
- 5 annual strains; dependent: % lesion coverage. Identify first mutation year causing increased severity.
- Study Technique Effectiveness
- Modes: Reading, Writing, Listening, Watching. DV: test grade.
- Real-world note: DISC learning-style claims largely unsupported; test with ANOVA.
- Zone of Inhibition (Old-school Microbiology)
- Antibiotics: Penicillin, Tetracycline, Ciprofloxacin vs K. pneumoniae. Wider zone ⇒ more susceptible.
- Elephant Stress (African reserves)
- fGCM levels across Kruger (heavy tourism), Serengeti (moderate), Etosha (low)
- Root Growth & Fertilizer (8 treatments)
- Treatments: 1 Compost, 2 Manure, 3 Bone Meal, 4 Fish Emulsion, 5 Ammonium Nitrate, 6 Triple Superphosphate, 7 Potassium Sulfate, 8 Hippie Mix
- DV: root growth rate (cm)
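The number of pairwise comparisons grows quickly with the number of groups, which is why scenarios like the 5 closure methods and 8 fertilizers call for an omnibus test first. A short sketch, assuming a Bonferroni-style correction at \alpha = 0.05; the vector names and the three p-values are hypothetical and not taken from any study above.

```r
# Number of groups in two of the scenarios above
k <- c(surgical_closure = 5, fertilizer = 8)

m <- choose(k, 2)          # pairwise comparisons: 10 and 28
alpha_per_test <- 0.05 / m # Bonferroni per-comparison threshold

# Equivalent view: inflate the p-values instead of shrinking alpha
p_raw <- c(0.001, 0.020, 0.040)   # hypothetical raw p-values
p_adj <- p.adjust(p_raw, method = "bonferroni", n = 10)
p_adj                              # 0.01 0.20 0.40
```

With 10 comparisons only the first hypothetical p-value stays below 0.05 after adjustment.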
Decision Tree for 1–Way Comparison of >2 Groups
- Assumptions satisfied → Classical ANOVA → p<0.05? Tukey HSD
- Variances unequal (normal residuals) → Welch’s ANOVA → p<0.05? Games–Howell
- Residuals non-normal → Kruskal–Wallis → p<0.05? Dunn + Bonferroni
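The decision tree can be written as a small lookup function. This is only a sketch of the three branches above: `choose_procedure` and its arguments are hypothetical names, and the inputs are the analyst's conclusions from the assumption checks rather than anything computed here.

```r
# Map the outcome of the assumption checks to an omnibus test and its post-hoc
# (hypothetical helper mirroring the three branches of the decision tree)
choose_procedure <- function(normal_residuals, equal_variances) {
  if (!normal_residuals) {
    # base R omnibus: kruskal.test(y ~ g); Dunn's test needs an add-on package
    list(omnibus = "Kruskal-Wallis", post_hoc = "Dunn + Bonferroni")
  } else if (!equal_variances) {
    # base R omnibus: oneway.test(y ~ g, var.equal = FALSE)
    # Games-Howell is not in base R
    list(omnibus = "Welch's ANOVA", post_hoc = "Games-Howell")
  } else {
    # base R: fit <- aov(y ~ g); TukeyHSD(fit)
    list(omnibus = "Classical ANOVA", post_hoc = "Tukey HSD")
  }
}

choose_procedure(normal_residuals = TRUE, equal_variances = FALSE)$omnibus
```

In every branch the post-hoc step applies only after a significant omnibus result (p<0.05).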
Conceptual & Ethical Notes
- Sampling integrity & independence are paramount; no statistical fix salvages systematic bias
- Multiple comparison corrections (Tukey, Bonferroni, etc.) ethically reduce false discovery claims
- Biological examples (e.g., tail manipulation) raise welfare considerations; justify with scientific value
- Industry cases (surgical techniques, fertilizers) carry direct patient / consumer impact → robust inference critical
Practical Reminders for Exam & Real-World Use
- State and test assumptions explicitly before running ANOVA
- Report df, test statistic, and p-value: F_{df_1,df_2} format
- Provide means ± SD (or medians + IQR for non-parametric) & group sample sizes
- Always follow significant omnibus F with an appropriate post-hoc analysis
- Tie interpretations back to biological or practical relevance; statistics inform but do not replace domain reasoning