ANOVA & Multiple-Group Comparisons – Detailed Study Sheet

ANOVA: Core Purpose and Context

  • Compares the means of more than two groups (levels of one categorical factor) when the dependent variable is continuous
    • Extends the two–sample t–test logic to multiple groups without inflating type I error
  • Full name: “Analysis of Variance (ANOVA)”
  • Central idea: partition the total variability in the data set into
    • Treatment (among–group) variability
    • Background/within–group (error) variability
  • Key advantage over running many t–tests
    • Multiple t–tests raise the family-wise type I error (e.g., 6 independent tests at 5% each give 1-(0.95)^6 ≈ 26.5% chance of at least one false positive; the additive upper bound is 6 × 5% = 30%)
    • ANOVA + post-hoc tests control this inflation
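
The inflation figure above can be checked with a few lines of R. This is a minimal sketch that assumes the tests are independent; the helper name fwer and the choice of 4 groups (which yields 6 pairwise tests) are illustrative assumptions, not taken from the slides.

```r
# Family-wise error rate for m independent tests, each run at level alpha.
# `fwer` is a hypothetical helper name used only for this illustration.
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

m <- choose(4, 2)   # assumed example: 4 groups -> 6 pairwise t-tests
fwer(m)             # about 0.265
m * 0.05            # additive (Bonferroni) upper bound: 0.30
```

The gap between the two numbers is small for 6 tests, but it widens quickly as the number of comparisons grows.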

Fundamental Hypotheses

  • Null: \text{H}_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k (all k group means equal)
  • Alternative: \text{H}_A: at least one mean differs
  • Important: ANOVA tells if a difference exists, not where; post-hoc tests are required for pairwise insight

Core Test Statistic

  • Ratio of mean squares
    F = \dfrac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
  • Decision rule: reject \text{H}_0 if the calculated F exceeds the critical value F_{\alpha,\,df_1,\,df_2}, or equivalently if the p-value is below \alpha
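
The ratio of mean squares can be built up by hand and compared with R's own output. The sketch below uses made-up toy numbers (not data from the slides) and assumes the usual one-way layout with df_1 = k - 1 and df_2 = N - k.

```r
# Toy data (made up for illustration): 3 groups of 4 observations.
y <- c(4, 5, 6, 5,   7, 8, 6, 7,   9, 10, 8, 9)
g <- factor(rep(c("A", "B", "C"), each = 4))

k <- nlevels(g)                     # number of groups
N <- length(y)                      # total sample size
group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Between-group (treatment) sum of squares and mean square
ss_between <- sum(group_n * (group_means - mean(y))^2)
ms_between <- ss_between / (k - 1)

# Within-group (error) sum of squares and mean square;
# ave(y, g) returns each observation's own group mean
ss_within <- sum((y - ave(y, g))^2)
ms_within <- ss_within / (N - k)

f_stat  <- ms_between / ms_within   # 24 for these toy numbers
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
f_crit  <- qf(0.95, df1 = k - 1, df2 = N - k)   # critical F at alpha = 0.05

# Cross-check against the built-in one-way ANOVA table
summary(aov(y ~ g))
```

Comparing f_stat with f_crit and comparing p_value with 0.05 are two views of the same decision.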

Assumptions Checklist (Classical One-Way ANOVA)

  • Independence: samples are random & independent
  • Homogeneity of variance (equal variances among groups)
    • Tested with Levene’s or Bartlett’s test; in R: leveneTest() from the car package, or bartlett.test() in base R
  • Normality of residuals
    • Residuals = observed value – group mean
    • Combine residuals from all groups → single vector centred at 0
    • Checked with a Shapiro–Wilk or Kolmogorov–Smirnov test, a QQ-plot, or a histogram with a density curve
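
As a minimal sketch of the residual step, the code below pools residuals by hand and confirms they match what a fitted model returns. The data are simulated purely for illustration, and the variable names are assumptions of this example.

```r
# Toy data (simulated for illustration): 3 groups of 5 observations.
set.seed(1)
g <- factor(rep(c("A", "B", "C"), each = 5))
y <- rnorm(15, mean = c(10, 12, 14)[as.integer(g)], sd = 2)

# Residual = observed value - its own group mean
res_by_hand <- y - ave(y, g)        # ave() returns each observation's group mean

# Same residuals from the fitted one-way model
res_model <- residuals(aov(y ~ g))
all.equal(res_by_hand, unname(res_model))

# The pooled residuals are centred at 0 by construction
mean(res_by_hand)

# Normality checks on the single pooled vector
shapiro.test(res_by_hand)
qqnorm(res_by_hand); qqline(res_by_hand)
```

Testing the pooled residuals rather than each group separately gives one test on all N values instead of several low-powered tests on small groups.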

Visual & Code Aids (R snippets)

  • Boxplot for variance impression: boxplot(Calcium ~ Diagnosis, las = 1, col = c(2,4,8))
  • Levene test (needs the car package): library(car); leveneTest(Calcium ~ Diagnosis) → F_{2,15}=1.63,\;p=0.229 (fail to reject equal variances)
  • Fit model & extract residuals:
  model <- aov(Calcium ~ Diagnosis)
  modelresid <- residuals(model)
  shapiro.test(modelresid)   # W = 0.971, p = 0.815
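  • Histogram + density: hist(..., freq = FALSE); lines(density(...)) (freq = FALSE puts the histogram on the density scale so the curve overlays correctly)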

Why Not Just Learn The "Math By Hand"?

  • Instructor note: the mechanics still matter, but conceptual grasp and software execution are now the priority

Example 1: Calcium Intake vs Bone Density Category

  • Groups: Normal, Osteopenia, Osteoporosis
  • Group sample sizes not stated, but df_{\text{within}} = N - k = 15 with k = 3 groups ⇒ N = 18 patients
  • Assumptions met (random, equal variance, normal residuals)
  • ANOVA result: F_{2,15}=1.4,\;p>0.27 → Fail to reject \text{H}_0
    • Means ± SD: Normal 0.94\pm0.16\,g, Osteopenia 0.80\pm0.22\,g, Osteoporosis 0.72\pm0.30\,g
    • Conclusion: No significant difference in calcium intake across bone-density groups

Example 2: Long-Tailed Widowbirds (Andersson 1982)

  • Biological premise: sexual selection may favor males with longer tails → more nests (mates)
  • Design (36 males):
    • Shortened tails (−14 cm)
    • Control / Average tails (cut–glued same length)
    • Elongated tails (~25 cm; +14 cm glued)
  • Assumption tests
    • Normality of residuals: W=0.94,\;p>0.07 (fail to reject normality)
    • Homogeneity: F_{2,33}=0.38,\;p>0.68 (equal variances)
  • ANOVA: F_{2,33}=88.8,\;p\approx 0 (reject \text{H}_0; at least one mean differs)
  • Post-hoc: Tukey’s HSD (controls family-wise error)
    • Shortened vs Control: p>0.09 (not sig.)
    • Shortened vs Elongated: p<0.00001 (sig.)
    • Control vs Elongated: p<0.00001 (sig.)
  • Interpretation: Elongated males (3.38\pm0.53 nests) had significantly more nests than either other group; this supports the sexual-selection hypothesis

Post-Hoc Testing Logic & Options

  • Tukey’s Honestly Significant Difference (HSD)
    • Uses a critical difference value; pairwise means differing by >HSD are “honestly” significant
    • Essentially a pairwise t-type comparison referred to the studentized range distribution, which adjusts for the family-wise error rate
  • When assumptions fail:
    • Welch’s ANOVA (variance heterogeneity)
      • Post-hoc: Games–Howell
    • Kruskal–Wallis (non-normal residuals / ordinal data)
      • Rank-based and distribution-free; reads as a test of medians only when the group distributions have similar shape
      • Post-hoc: Dunn’s test + Bonferroni correction
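
The omnibus options above map onto base-R calls roughly as follows. This is a sketch on simulated toy data, not an analysis from the slides. Games–Howell and Dunn’s test are not part of base R and need an add-on package, so they are not shown; the pairwise Wilcoxon call is a named stand-in, not Dunn’s test.

```r
# Toy data (simulated for illustration): 3 groups of 8 observations.
set.seed(42)
g <- factor(rep(c("A", "B", "C"), each = 8))
y <- rnorm(24, mean = c(1, 1.5, 3.5)[as.integer(g)], sd = 0.6)

# Classical route: one-way ANOVA, then Tukey HSD on the fitted model
fit <- aov(y ~ g)
summary(fit)
tukey <- TukeyHSD(fit)    # pairwise differences with family-wise adjusted p-values
tukey

# Unequal variances: Welch's ANOVA
welch <- oneway.test(y ~ g, var.equal = FALSE)
welch

# Non-normal residuals / ordinal data: Kruskal-Wallis rank test
kw <- kruskal.test(y ~ g)
kw

# Base-R stand-in for a rank-based post-hoc (this is NOT Dunn's test):
# pairwise Wilcoxon rank-sum tests with Bonferroni-adjusted p-values
pairwise.wilcox.test(y, g, p.adjust.method = "bonferroni")
```

With 3 groups, each post-hoc procedure reports 3 pairwise comparisons.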

Formulae & Error-rate Concepts

  • Family-wise error rate (FWER): probability of ≥1 type I error across a family of comparisons
    • Example: six independent t–tests at \alpha=0.05 each → 1-(1-0.05)^6 ≈ 0.265, i.e. 26.5% (the slide rounds this to 30%, which is also the additive upper bound 6 × 0.05)
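  • Levene’s statistic (median-centred version), with Z_{ji} = |Y_{ji}-\tilde{Y}_j| the absolute deviation of observation i in group j from that group's median \tilde{Y}_j:
    F = \dfrac{N-k}{k-1}\;\dfrac{\sum_{j=1}^{k} n_j\,(\bar{Z}_{j\cdot}-\bar{Z}_{\cdot\cdot})^2}{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Z_{ji}-\bar{Z}_{j\cdot})^2}
    • N = total sample size, k = number of groups, n_j = size of group j, \bar{Z}_{j\cdot} = mean of Z in group j, \bar{Z}_{\cdot\cdot} = grand mean of Z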
  • Shapiro–Wilk statistic W compares ordered residuals to expected normal order statistics
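
The median-centred Levene statistic is an ordinary one-way ANOVA F computed on absolute deviations from each group's median. The sketch below shows that equivalence on simulated toy data. All variable names are assumptions of this example, and leveneTest() from car is not called, so the code runs in base R.

```r
# Toy data (simulated for illustration): 3 groups of 6 observations
# with deliberately different spreads.
set.seed(7)
g <- factor(rep(c("A", "B", "C"), each = 6))
y <- rnorm(18, mean = 10, sd = c(1, 1.5, 2)[as.integer(g)])

# Z = absolute deviation of each observation from its own group median
z <- abs(y - ave(y, g, FUN = median))

k <- nlevels(g)
N <- length(y)
z_group <- tapply(z, g, mean)       # group means of Z
n_group <- tapply(z, g, length)     # group sizes

# Between-group and within-group mean squares of Z
num <- sum(n_group * (z_group - mean(z))^2) / (k - 1)
den <- sum((z - ave(z, g))^2) / (N - k)

levene_F <- num / den
levene_p <- pf(levene_F, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Equivalent shortcut: a one-way ANOVA table on Z
anova(lm(z ~ g))
```

A large levene_F means the groups differ in typical distance from their medians, that is, in spread.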

Additional Applied Scenarios (Slides 15-18)

  • Surgical Closure Recovery Time (150 appendectomy patients)
    • Treatments: Staples, Sutures, Glue, Strips, Intradermal sutures
    • Objective: detect differences in days to recovery across the 5 methods (a naïve approach would need \binom{5}{2} = 10 pairwise t–tests)
  • Evolution of Severe Skin Lesions (Staphylococcus aureus strains)
    • 5 annual strains; DV: % lesion coverage. Goal: identify the first year in which a mutation increased severity.
  • Study Technique Effectiveness
    • Modes: Reading, Writing, Listening, Watching. DV: test grade.
    • Real-world note: DISC learning-style claims largely unsupported; test with ANOVA.
  • Zone of Inhibition (Old-school Microbiology)
    • Antibiotics: Penicillin, Tetracycline, Ciprofloxacin vs K. pneumoniae. Wider zone ⇒ more susceptible.
  • Elephant Stress (African reserves)
    • fGCM (faecal glucocorticoid metabolite) levels across Kruger (heavy tourism), Serengeti (moderate), Etosha (low)
  • Root Growth & Fertilizer (8 treatments)
    • Treatments: (1) Compost, (2) Manure, (3) Bone Meal, (4) Fish Emulsion, (5) Ammonium Nitrate, (6) Triple Superphosphate, (7) Potassium Sulfate, (8) Hippie Mix
    • DV: root growth rate (cm)

Decision Tree for 1–Way Comparison of >2 Groups

  • Assumptions satisfied → Classical ANOVA → if p<0.05, follow with Tukey HSD
  • Variances unequal (normal residuals) → Welch’s ANOVA → if p<0.05, follow with Games–Howell
  • Residuals non-normal → Kruskal–Wallis → if p<0.05, follow with Dunn + Bonferroni
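
The tree can be written as a small helper, sketched below under stated assumptions: choose_omnibus is a hypothetical name, both assumption checks use a 0.05 cut-off, and Bartlett’s test (base R) stands in for Levene’s so the code needs no extra package. Hard p-value cut-offs for assumption checks are a simplification, and plots should still inform the choice.

```r
# Hypothetical helper: pick the omnibus test following the decision tree.
# Assumptions of this sketch: alpha = 0.05 for both checks, Shapiro-Wilk on
# the pooled residuals, and Bartlett's test standing in for Levene's.
choose_omnibus <- function(y, g, alpha = 0.05) {
  g <- factor(g)
  res <- residuals(aov(y ~ g))
  normal_ok   <- shapiro.test(res)$p.value > alpha
  variance_ok <- bartlett.test(y, g)$p.value > alpha

  if (!normal_ok) {
    list(test = "Kruskal-Wallis", posthoc = "Dunn + Bonferroni",
         result = kruskal.test(y, g))
  } else if (!variance_ok) {
    list(test = "Welch ANOVA", posthoc = "Games-Howell",
         result = oneway.test(y ~ g, var.equal = FALSE))
  } else {
    list(test = "Classical ANOVA", posthoc = "Tukey HSD",
         result = summary(aov(y ~ g)))
  }
}

# Toy data (simulated for illustration): 3 groups of 10 observations
set.seed(3)
g <- rep(c("A", "B", "C"), each = 10)
y <- rnorm(30, mean = rep(c(5, 5, 7), each = 10), sd = 1)

choice <- choose_omnibus(y, g)
choice$test      # which branch of the tree was taken
choice$posthoc   # the follow-up to run if the omnibus test is significant
```

The helper only selects and runs the omnibus test; the post-hoc step is returned as a label because it should run only after a significant omnibus result.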

Conceptual & Ethical Notes

  • Sampling integrity & independence are paramount; no statistical fix salvages systematic bias
  • Multiple comparison corrections (Tukey, Bonferroni, etc.) ethically reduce false discovery claims
  • Biological examples (e.g., tail manipulation) raise welfare considerations; justify with scientific value
  • Industry cases (surgical techniques, fertilizers) carry direct patient / consumer impact → robust inference critical

Practical Reminders for Exam & Real-World Use

  • State and test assumptions explicitly before running ANOVA
  • Report df, test statistic, and p-value in the format F_{df_1,df_2} = value, p = value
  • Provide means ± SD (or medians + IQR for non-parametric) & group sample sizes
  • Always follow significant omnibus F with an appropriate post-hoc analysis
  • Tie interpretations back to biological or practical relevance; statistics inform but do not replace domain reasoning