ANOVA & Multiple-Group Comparisons – Detailed Study Sheet

Designed for comparing more than two groups (levels of a categorical factor) on a continuous dependent variable
- Extends the two–sample t–test logic to multiple groups without inflating type I error
Full name: “Analysis of Variance (ANOVA)”
Central idea: partition the total variability in the data set into
- Treatment (among–group) variability
- Background/within–group (error) variability
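The partition can be written out explicitly. As a sketch, with k groups, n_j observations in group j, Y_{ji} the i-th observation in group j, \bar{Y}_j the group mean and \bar{Y} the grand mean (these symbols are introduced here for illustration and are not the sheet's own):

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ji}-\bar{Y})^2}_{\text{SS}_{\text{total}}}
=
\underbrace{\sum_{j=1}^{k} n_j\,(\bar{Y}_j-\bar{Y})^2}_{\text{SS}_{\text{between}}\ (\text{treatment})}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ji}-\bar{Y}_j)^2}_{\text{SS}_{\text{within}}\ (\text{error})}
```

The cross term vanishes because deviations from a group mean sum to zero within each group.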
Key advantage over running many t–tests
- Multiple t–tests raise the family-wise type I error (e.g., 6 tests at 5% each give 1-(1-0.05)^6 ≈ 26.5% chance of at least one false positive, often quoted as ≈ 30%)
- ANOVA + post-hoc tests control this inflation

Null: \text{H}_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k (all k group means equal)
Alternative: \text{H}_A: at least one mean differs
Important: ANOVA tells if a difference exists, not where; post-hoc tests are required for pairwise insight

Ratio of mean squares
F = \dfrac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
Decision rule: compare calculated F to critical F_{\alpha,\,df_1,\,df_2} or use p-value
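A minimal sketch of the F ratio and decision rule computed by hand, assuming a small made-up data set (the vectors y and g are hypothetical, not course data) and the usual df_1 = k - 1, df_2 = N - k:

```r
# Hypothetical toy data: three groups of four observations (not from the sheet)
y <- c(4.1, 5.0, 4.6, 5.3,  5.9, 6.4, 5.5, 6.8,  4.8, 5.2, 5.6, 4.4)
g <- factor(rep(c("A", "B", "C"), each = 4))

N <- length(y)            # total sample size
k <- nlevels(g)           # number of groups
grand_mean <- mean(y)
group_mean <- ave(y, g)   # each observation's own group mean

# Sums of squares: among-group and within-group
ss_between <- sum((group_mean - grand_mean)^2)  # summing over observations weights by n_j
ss_within  <- sum((y - group_mean)^2)

# Mean squares = sum of squares / degrees of freedom
df1 <- k - 1
df2 <- N - k
ms_between <- ss_between / df1
ms_within  <- ss_within / df2

f_stat <- ms_between / ms_within
f_crit <- qf(0.95, df1, df2)                        # critical F at alpha = 0.05
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)  # upper-tail p-value

reject_h0 <- f_stat > f_crit   # same decision as p_val < 0.05

# Cross-check against R's built-in one-way ANOVA table
fit_table <- anova(lm(y ~ g))
```

Comparing f_stat with f_crit and comparing p_val with alpha are two views of the same rule, so they always lead to the same decision.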

Independence: samples are random & independent
Homogeneity of variance (equal variances among groups)
- Tested with Levene’s or Bartlett’s test; in R: leveneTest() (car package) or bartlett.test()
Normality of residuals
- Residuals = observed value – group mean
- Combine residuals from all groups → single vector centred at 0
- Shapiro–Wilk, Kolmogorov–Smirnov, QQ-plot, histogram with density curve

Boxplot for variance impression: boxplot(Calcium ~ Diagnosis, las = 1, col = c(2,4,8))
Levene test: leveneTest(Calcium ~ Diagnosis) → F_{2,15}=1.63,\;p=0.229 (fail to reject equal variances)
Fit model & extract residuals:

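  # Assumes Calcium and Diagnosis are already available in the workspace
  model <- aov(Calcium ~ Diagnosis)   # one-way ANOVA fit
  modelresid <- residuals(model)      # observed value minus group mean
  shapiro.test(modelresid)            # W = 0.971, p = 0.815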

Instructor note: mechanics still matter but conceptual grasp & software execution are now priority

Groups: Normal, Osteopenia, Osteoporosis
Sample sizes not stated explicitly, but df_{\text{within}} = N - k = 15 with k = 3 groups ⇒ N = 18 patients
Assumptions met (random, equal variance, normal residuals)
ANOVA result: F_{2,15}=1.4,\;p>0.27 → Fail to reject \text{H}_0
- Means ± SD: Normal 0.94\pm0.16\,g, Osteopenia 0.80\pm0.22\,g, Osteoporosis 0.72\pm0.30\,g
- Conclusion: No significant difference in calcium intake across bone-density groups
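The reported numbers for this example can be sanity-checked from the F distribution alone. A short sketch using only the values quoted above (the variable names are illustrative):

```r
# Values reported in the calcium example
f_stat     <- 1.4
df_between <- 2
df_within  <- 15

k <- df_between + 1   # number of groups
N <- df_within + k    # total number of patients

# Upper-tail probability of F and the critical value at alpha = 0.05
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
f_crit <- qf(0.95, df_between, df_within)

p_val > 0.27       # consistent with the reported p > 0.27
f_stat < f_crit    # calculated F is below critical F, so fail to reject H0
```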

Biological premise: sexual selection may favor males with longer tails → more nests (mates)
Design (36 males):
- Shortened tails (−14 cm)
- Control / Average tails (cut–glued same length)
- Elongated tails (~25 cm; +14 cm glued)
Assumption tests
- Normality of residuals: W=0.94,\;p>0.07 (fail to reject normality)
- Homogeneity: F_{2,33}=0.38,\;p>0.68 (equal variances)
ANOVA: F_{2,33}=88.8,\;p\approx 0 → reject \text{H}_0 (at least one mean differs)
Post-hoc: Tukey’s HSD (controls family-wise error)
- Shortened vs Control: p>0.09 (not sig.)
- Shortened vs Elongated: p<0.00001 (sig.)
- Control vs Elongated: p<0.00001 (sig.)
Interpretation: Elongated males (3.38\pm0.53 nests) had significantly more nests than either other group; supports sexual-selection hypothesis

Tukey’s Honestly Significant Difference (HSD)
- Uses a critical difference value; pairwise means differing by >HSD are “honestly” significant
- Essentially a set of pairwise t-type comparisons judged against the studentized range distribution, which holds the family-wise error rate at \alpha
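A sketch of Tukey's HSD in base R on simulated data. The group labels echo the tail experiment, but the numbers are made up, and the balanced design of 12 per group is an assumption made here for illustration:

```r
# Simulated data: three groups, 12 per group (assumed balanced design).
# These values are invented for illustration; they are NOT the study's data.
set.seed(1)
treatment <- factor(rep(c("Shortened", "Control", "Elongated"), each = 12),
                    levels = c("Shortened", "Control", "Elongated"))
nests <- c(rnorm(12, 1.0, 0.5), rnorm(12, 1.3, 0.5), rnorm(12, 3.4, 0.5))

fit <- aov(nests ~ treatment)
tukey <- TukeyHSD(fit, conf.level = 0.95)   # all pairwise comparisons
tukey_table <- tukey$treatment              # columns: diff, lwr, upr, p adj

# Critical difference for a balanced design:
# HSD = q(alpha; k, df_within) * sqrt(MS_within / n)
k   <- nlevels(treatment)
n   <- 12
dfw <- df.residual(fit)
msw <- sum(residuals(fit)^2) / dfw
hsd <- qtukey(0.95, nmeans = k, df = dfw) * sqrt(msw / n)

# A pair is "honestly" significant when |mean difference| exceeds HSD
exceeds_hsd <- abs(tukey_table[, "diff"]) > hsd
```

In a balanced design, a pair exceeding hsd corresponds to an adjusted p-value below 0.05 in tukey_table.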
When assumptions fail:
- Welch’s ANOVA (variance heterogeneity)
  - Post-hoc: Games–Howell
- Kruskal–Wallis (non-normal residuals / ordinal data)
  - Rank-based and distribution-free; reads as a test of medians when the groups' distributions have similar shape
  - Post-hoc: Dunn’s test + Bonferroni correction
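Both omnibus alternatives are available in base R. The sketch below uses simulated skewed data with unequal spread (made up for illustration). Games–Howell and Dunn's test usually come from add-on packages and are not shown; the post-hoc step here is a stand-in, pairwise Wilcoxon rank-sum tests with Bonferroni adjustment, which is not Dunn's test.

```r
# Simulated data with unequal spread and skew (illustrative only)
set.seed(42)
group <- factor(rep(c("A", "B", "C"), each = 15))
y <- c(rexp(15, rate = 1), rexp(15, rate = 0.5), rexp(15, rate = 0.25))

# Welch's ANOVA: does not assume equal variances
welch <- oneway.test(y ~ group, var.equal = FALSE)

# Kruskal-Wallis: rank-based, no normality assumption
kw <- kruskal.test(y ~ group)

# Stand-in rank-based post-hoc (NOT Dunn's test):
# pairwise Wilcoxon rank-sum tests with Bonferroni-adjusted p-values
posthoc <- pairwise.wilcox.test(y, group, p.adjust.method = "bonferroni")

welch$p.value
kw$p.value
posthoc$p.value   # lower-triangular matrix of adjusted p-values
```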

Family-wise error rate (FWER): probability of ≥1 type I error across a family of comparisons
- Example: six t–tests at \alpha=0.05 each → 1-(1-0.05)^6 ≈ 0.265 = 26.5% (the slide quotes 30%, which equals the additive Bonferroni upper bound 6 × 0.05)
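The FWER arithmetic is easy to reproduce. A small sketch, assuming independent tests (the helper names fwer and n_pairs are hypothetical):

```r
# Family-wise error rate for m independent tests, each at level alpha
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

# Number of pairwise comparisons among k groups
n_pairs <- function(k) choose(k, 2)

n_pairs(4)   # four groups give the six pairwise tests used in the example
fwer(6)      # chance of at least one false positive across six tests
6 * 0.05     # additive (Bonferroni) upper bound on the same quantity

# Bonferroni: per-test alpha that keeps the family-wise rate at or below 0.05
alpha_bonf <- 0.05 / 6
fwer(6, alpha_bonf)
```

The same helpers show how quickly the problem grows: five groups already need n_pairs(5) comparisons.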
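Levene’s statistic (median-centred version): F = \dfrac{N-k}{k-1}\;\dfrac{\sum_{j} n_j\,(\bar{Z}_{j\cdot}-\bar{Z}_{\cdot\cdot})^2}{\sum_{j,i} (Z_{ji}-\bar{Z}_{j\cdot})^2}, where Z_{ji} = |Y_{ji}-\tilde{Y}_j| is the absolute deviation of observation i in group j from its group median \tilde{Y}_j, \bar{Z}_{j\cdot} is the mean of Z in group j, \bar{Z}_{\cdot\cdot} is the grand mean of Z, N is the total sample size and k the number of groups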
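The median-centred Levene statistic is a one-way ANOVA F computed on absolute deviations from the group medians. A base-R sketch on made-up data (the vectors y and g are hypothetical) that builds the statistic step by step and cross-checks it:

```r
# Hypothetical toy data: three groups of five observations
y <- c(4.1, 5.0, 4.6, 5.3, 4.9,
       5.9, 6.4, 5.5, 6.8, 7.9,
       4.8, 5.2, 5.6, 4.4, 5.1)
g <- factor(rep(c("A", "B", "C"), each = 5))

N <- length(y)
k <- nlevels(g)

# Z = absolute deviation of each observation from its group median
z <- abs(y - ave(y, g, FUN = median))

z_group <- ave(z, g)   # mean of Z within each observation's group
z_grand <- mean(z)     # grand mean of Z

# Summing over observations weights each group mean by its sample size
num <- sum((z_group - z_grand)^2)
den <- sum((z - z_group)^2)

levene_F <- ((N - k) / (k - 1)) * num / den
levene_p <- pf(levene_F, k - 1, N - k, lower.tail = FALSE)

# Same statistic obtained as a one-way ANOVA on Z
check <- anova(lm(z ~ g))
```

A large levene_F (small levene_p) indicates that the spread differs among groups.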
Shapiro–Wilk statistic W compares ordered residuals to expected normal order statistics

Surgical Closure Recovery Time (150 appendectomy patients)
- Treatments: Staples, Sutures, Glue, Strips, Intradermal sutures
- Objective: detect differences in days to recovery across the 5 methods (naïve pairwise t–tests would need \binom{5}{2}=10 comparisons)
Evolution of Severe Skin Lesions (Staphylococcus aureus strains)
- 5 annual strains; dependent: % lesion coverage. Identify first mutation year causing increased severity.
Study Technique Effectiveness
- Modes: Reading, Writing, Listening, Watching. DV: test grade.
- Real-world note: DISC learning-style claims largely unsupported; test with ANOVA.
Zone of Inhibition (Old-school Microbiology)
- Antibiotics: Penicillin, Tetracycline, Ciprofloxacin vs K. pneumoniae. Wider zone ⇒ more susceptible.
Elephant Stress (African reserves)
- fGCM levels across Kruger (heavy tourism), Serengeti (moderate), Etosha (low)
Root Growth & Fertilizer (8 treatments)
- Treatments: Compost, Manure, Bone Meal, Fish Emulsion, Ammonium Nitrate, Triple Superphosphate, Potassium Sulfate, Hippie Mix
- DV: root growth rate (cm)

Sampling integrity & independence are paramount; no statistical fix salvages systematic bias
Multiple comparison corrections (Tukey, Bonferroni, etc.) ethically reduce false discovery claims
Biological examples (e.g., tail manipulation) raise welfare considerations; justify with scientific value
Industry cases (surgical techniques, fertilizers) carry direct patient / consumer impact → robust inference critical

State and test assumptions explicitly before running ANOVA
Report df, test statistic, and p-value: F_{df_1,df_2} format
Provide means ± SD (or medians + IQR for non-parametric) & group sample sizes
Always follow significant omnibus F with an appropriate post-hoc analysis
Tie interpretations back to biological or practical relevance; statistics inform but do not replace domain reasoning
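As a sketch of the reporting items in this checklist, the following assembles per-group summaries and an F_{df_1,df_2} style result string. The data frame dat and its column names are hypothetical placeholders:

```r
# Hypothetical data frame; column names are placeholders
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  y     = c(4.1, 5.0, 4.6, 5.3,  5.9, 6.4, 5.5, 6.8,  4.8, 5.2, 5.6, 4.4)
)

by_group <- split(dat$y, dat$group)

# Per-group n, mean and SD (parametric) plus median and IQR (non-parametric)
summary_tab <- data.frame(
  n      = lengths(by_group),
  mean   = sapply(by_group, mean),
  sd     = sapply(by_group, sd),
  median = sapply(by_group, median),
  iqr    = sapply(by_group, IQR)
)

# Omnibus result with both degrees of freedom, the statistic and the p-value
tab <- anova(lm(y ~ group, data = dat))
report <- sprintf("F(%d, %d) = %.2f, p = %.3f",
                  tab$Df[1], tab$Df[2], tab$`F value`[1], tab$`Pr(>F)`[1])

summary_tab
report
```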