Analysis of Variance Lecture Notes Review

EFT Quiz Feedback

  • Due to Canvas limitations, feedback is shown only for incorrect answers.
  • The quiz is worth 1% of the final grade.
  • If students encounter issues or want to retake the test, they can request additional attempts.
  • The questions are designed to assess understanding of simple concepts; struggling indicates a need to review notes.
  • The instructor is willing to provide full marks or allow retakes, emphasizing that it's low stakes.
  • The goal is to identify areas for review rather than stress over the grade.

Introduction to Analysis of Variance (ANOVA)

  • ANOVA is a powerful statistical analysis that can incorporate many predictors and factors.
  • The course will progress from one-way ANOVA to two-way ANOVA, eventually covering multifactorial ANOVA.
  • While large models with many parameters (like large language models) can be built, they may sacrifice understanding for predictive power.

One-Way ANOVA: Comparing Multiple Groups

  • One-way ANOVA will be approached from a t-test perspective initially.
  • Example: Comparing weight gain in chicks fed four different diets.
  • Experimental Design:
    • 20 chicks are randomly assigned to four diet groups.
    • Weight gain is measured after one week.
  • The goal is to determine which diet is most effective.

The Problem with Multiple T-Tests

  • To compare all diets, multiple t-tests would be needed, comparing all possible pairs.
  • However, performing multiple t-tests on the same dataset increases the probability of finding a significant difference by chance.
  • With six t-tests, each run at a 5% significance level (95% confidence level), the probability of at least one false positive rises to about 26% (1 - 0.95^6 ≈ 0.265, treating the tests as independent).
  • This is the fallacy of multiple t-tests: every additional test adds another chance of a false positive, so the overall likelihood of error inflates.
  • The goal is to maintain a 95% confidence level (5% false-positive rate) across the entire analysis.
  • Solution: Use ANOVA, which compares all groups at once.
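
The 26% figure can be checked with a short calculation. This is a minimal sketch that assumes the six tests are independent (an idealization, since pairwise tests on one dataset share observations); the variable names are illustrative.

```python
from math import comb

n_groups = 4   # four diets
alpha = 0.05   # significance level of each individual t-test

# Number of pairwise comparisons among the groups: C(4, 2) = 6
n_tests = comb(n_groups, 2)

# Probability of at least one false positive across all tests,
# assuming the tests are independent
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"{n_tests} tests -> familywise error rate {familywise_error:.3f}")
# prints: 6 tests -> familywise error rate 0.265
```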

How ANOVA Works

  • ANOVA computes two types of differences:
    • Between-group differences (between-effects).
    • Within-group differences.
  • Between-group differences are assessed by comparing the mean values of the groups.
  • Within-group differences are assessed by computing the variance within each group.
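
As a rough illustration of the two kinds of differences, the sketch below computes group means (between-group) and within-group variances. The weight gains are made-up numbers, not the lecture's data, and all names are hypothetical.

```python
from statistics import mean, variance

# Hypothetical weight gains (g) for 5 chicks per diet; not the lecture's data
gains = {
    "diet1": [10, 12, 11, 9, 13],
    "diet2": [14, 15, 13, 16, 12],
    "diet3": [11, 13, 12, 14, 10],
    "diet4": [18, 20, 19, 17, 21],
}

# Between-group differences: compare each group mean with the overall mean
overall_mean = mean([x for xs in gains.values() for x in xs])
group_means = {diet: mean(xs) for diet, xs in gains.items()}

# Within-group differences: sample variance inside each group
group_vars = {diet: variance(xs) for diet, xs in gains.items()}

for diet in gains:
    print(f"{diet}: mean={group_means[diet]}, "
          f"offset from overall={group_means[diet] - overall_mean}, "
          f"variance={group_vars[diet]}")
```

In this toy dataset the group means differ by several grams while every group has the same small variance, which is the pattern ANOVA treats as evidence of a treatment effect.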

Null and Alternative Hypotheses in ANOVA

  • ANOVA determines whether the data can be better explained by an overall mean or by separate group means.
  • Null Hypothesis: The overall mean is sufficient to explain the data.
  • Alternative Hypothesis: Group means provide a better explanation of the data.

ANOVA Terminology

  • Treatment/Factors: Categorical variables used to predict responses (e.g., diet types).
  • Levels: The different categories within a treatment (e.g., four different diets).
  • Samples: Replicates within each group (e.g., the number of chicks per diet).
  • Observations: The measured values (e.g., weight gain of individual chicks).
  • Replicates: Independent repetitions (e.g., each chick is a replicate).
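
One way to keep the terminology straight is to map it onto a data layout. The sketch below is only an illustration with hypothetical numbers and names; it is not how any particular statistics package stores its data.

```python
# Hypothetical layout for the chick example; values are illustrative only
factor = "diet"  # the treatment / factor

# Keys are the levels of the factor; each number is one observation
# (the weight gain of one chick, i.e. one replicate)
data = {
    "diet1": [10, 12, 11, 9, 13],
    "diet2": [14, 15, 13, 16, 12],
    "diet3": [11, 13, 12, 14, 10],
    "diet4": [18, 20, 19, 17, 21],
}

levels = list(data)                                        # four levels
replicates_per_level = {lvl: len(obs) for lvl, obs in data.items()}
n_observations = sum(replicates_per_level.values())        # 20 chicks

print(f"factor={factor}, levels={levels}")
print(f"replicates per level={replicates_per_level}, total={n_observations}")
```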

Model Equation and Assumptions

  • Model equation: Each observation is described as its group mean plus a random error term.
  • Model Assumptions:
    • Normality: Samples within each group are normally distributed.
    • Equal Variances: Each group has equal variance.
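
Written out, the model and its assumptions might look like the following. The notation is an assumption of this sketch rather than the lecture's own: y_{ij} is observation j in group i, \mu_i is the mean of group i, \mu is the overall mean, and \sigma^2 is the variance shared by all groups.

```latex
% Group-means model: each observation is its group mean plus random error.
% A single \sigma^2 for every group encodes the equal-variance assumption;
% the normal distribution of the errors encodes the normality assumption.
y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% Overall-mean model (the null hypothesis): one mean serves every group.
y_{ij} = \mu + \varepsilon_{ij}
```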

Assessing Normality

  • Box plots can be used to visually assess normality.
  • ANOVA is robust against violations of normality.
  • Histograms are problematic for assessing normality in ANOVA unless separate histograms are plotted per group.
  • Shapiro-Wilk tests can be used, but be cautious of their reliability, especially with complex designs.
  • Residuals should be examined for proper normality testing (discussed later).

Assessing Equal Variances

  • Calculate the ratio of the largest group standard deviation to the smallest; as a rule of thumb, a ratio below two means the assumption is considered met.
  • Bartlett's test can also be used, but it is unreliable if the data are not normally distributed.
  • Residual plots will be used for this later in the course.
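
The standard-deviation ratio check is easy to script. This is a minimal sketch on hypothetical weight gains (not the lecture's data), using the rule-of-thumb threshold of two stated above.

```python
from statistics import stdev

# Hypothetical weight gains (g); groups deliberately differ in spread
gains = {
    "diet1": [10, 12, 11, 9, 13],
    "diet2": [14, 17, 11, 16, 12],
    "diet3": [11, 13, 12, 14, 10],
    "diet4": [18, 21, 19, 16, 21],
}

# Sample standard deviation of each group
sds = {diet: stdev(xs) for diet, xs in gains.items()}

# Ratio of the largest to the smallest standard deviation
sd_ratio = max(sds.values()) / min(sds.values())
assumption_ok = sd_ratio < 2  # rule-of-thumb threshold from the notes

print(f"SD ratio = {sd_ratio:.2f}, equal-variance assumption met: {assumption_ok}")
```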

Hypothesis Testing in ANOVA

  • Null Hypothesis: All group means are equal (overall mean represents the data).
  • Alternative Hypothesis: At least two group means are different (not all are equal).
  • ANOVA can only indicate that at least two groups differ; post-hoc tests are needed to determine which groups specifically differ.

Variability in ANOVA

  • ANOVA analyzes variances to determine if differences are due to treatments or random factors.
  • Mathematically, variances are partitioned:
    • Treatment Sums of Squares (SST).
    • Residual Sums of Squares (SSE).
    • Total Sums of Squares (SSTO).
  • Equation: SSTO = SST + SSE
  • Degrees of Freedom are critical for understanding the ANOVA table.
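
The partition can be verified numerically. The sketch below uses hypothetical data and the usual one-way degrees of freedom (k - 1 for treatment, N - k for residual, N - 1 for total, with k groups and N observations); these formulas are standard but are not spelled out in the notes above.

```python
# Hypothetical weight gains (g) for 5 chicks per diet; not the lecture's data
gains = {
    "diet1": [10, 12, 11, 9, 13],
    "diet2": [14, 15, 13, 16, 12],
    "diet3": [11, 13, 12, 14, 10],
    "diet4": [18, 20, 19, 17, 21],
}

all_values = [x for xs in gains.values() for x in xs]
n_total = len(all_values)   # N = 20
k = len(gains)              # k = 4 groups
grand_mean = sum(all_values) / n_total

# Total: every observation against the overall mean
ssto = sum((x - grand_mean) ** 2 for x in all_values)

# Treatment: each group mean against the overall mean, weighted by group size
sst = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
          for xs in gains.values())

# Residual: every observation against its own group mean
sse = sum((x - sum(xs) / len(xs)) ** 2
          for xs in gains.values() for x in xs)

# Degrees of freedom partition the same way as the sums of squares
df_treatment, df_residual, df_total = k - 1, n_total - k, n_total - 1

print(f"SSTO={ssto}, SST={sst}, SSE={sse}")
print(f"df: total={df_total}, treatment={df_treatment}, residual={df_residual}")
```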

The ANOVA Table

  • The ANOVA table summarizes the experiment, including treatment, residual, total, and associated numbers.
  • The goal is to determine if differences are due to treatment or random effects by examining the ratio of treatment effect to random effect.
  • Each source of variation is represented as a mean square (MS): its sum of squares divided by its degrees of freedom.
  • The F-statistic (F stat) is the ratio of treatment mean squares to residual mean squares: F = \frac{MST}{MSE}
  • The F-statistic is related to the t-test statistic: with exactly two groups, F equals the square of the pooled-variance t statistic (F = t^2).
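
A sketch of how the table entries fit together, on hypothetical data. The helper `one_way_anova` is an illustrative name invented here, not a library function. The last few lines check one concrete form of the link to the t-test: with two groups, F matches the squared pooled-variance t statistic.

```python
from math import sqrt

def one_way_anova(groups):
    """Return one-way ANOVA table entries for a list of groups (lists of numbers)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_t, df_e = k - 1, n - k
    mst, mse = sst / df_t, sse / df_e
    return {"df_t": df_t, "df_e": df_e, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}

# Hypothetical weight gains (g) for four diets; not the lecture's data
diets = [
    [10, 12, 11, 9, 13],
    [14, 15, 13, 16, 12],
    [11, 13, 12, 14, 10],
    [18, 20, 19, 17, 21],
]

table = one_way_anova(diets)
print(f"{'Source':<10}{'df':>4}{'SS':>8}{'MS':>8}{'F':>8}")
print(f"{'Treatment':<10}{table['df_t']:>4}{table['SST']:>8.2f}"
      f"{table['MST']:>8.2f}{table['F']:>8.2f}")
print(f"{'Residual':<10}{table['df_e']:>4}{table['SSE']:>8.2f}{table['MSE']:>8.2f}")
print(f"{'Total':<10}{table['df_t'] + table['df_e']:>4}"
      f"{table['SST'] + table['SSE']:>8.2f}")

# Two-group case: compare F with the squared pooled-variance t statistic
a, b = diets[0], diets[3]
two = one_way_anova([a, b])
pooled_var = two["MSE"]  # with two groups, MSE is the pooled variance
t_stat = (sum(a) / len(a) - sum(b) / len(b)) / sqrt(
    pooled_var * (1 / len(a) + 1 / len(b)))
f_two = two["F"]
print(f"t^2 = {t_stat ** 2:.2f}, F = {f_two:.2f}")
```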

Calculations and Formulas

  • Total Sum of Squares (SSTO): Measures total variation.
    • SSTO = \sum (x_i - \bar{x})^2, where x_i are the data points and \bar{x} is the overall mean.
  • Treatment Sum of Squares (SST): Measures variation between treatments.
  • Residual Sum of Squares (SSE): Measures random variation within treatments.
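
For reference, the three sums of squares can be written in matching notation. The indexing is an assumption of this sketch: i runs over the k groups, j over the n_i observations in group i, \bar{x}_i is the mean of group i, and \bar{x} is the overall mean.

```latex
% Total: every observation against the overall mean
SSTO = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2

% Treatment: each group mean against the overall mean, weighted by group size
SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2

% Residual: every observation against its own group mean
SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
```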

Understanding the F-Statistic (F stat)

  • The F-statistic is a ratio that explains the entire analysis.
  • It indicates how much greater the variance attributed to treatment is compared to the residual variance.
  • Example: An F-statistic of 6.65 means the treatment mean square is 6.65 times the residual mean square.
  • The F-statistic is used to calculate a p-value, which determines if the differences are statistically significant.
  • A significant result indicates that at least two levels of the treatment differ.

Interpreting Variability

  • Sums of squares indicate variability.
  • Mean squares standardize the sums of squares by their degrees of freedom (which depend on the number of groups and samples), so treatment and residual variability can be compared on the same scale.

Post-Hoc Tests

  • If ANOVA shows a significant difference, post-hoc tests are used to determine where the differences lie.
  • 95% confidence intervals are computed for each group. Non-overlapping intervals indicate a significant difference; in this visual approach, overlapping intervals are read as no significant difference.
  • Estimated marginal means are used to calculate the confidence intervals.
  • Visual techniques, such as plotting confidence intervals, can aid in comparison.
  • Example Interpretation: If the confidence interval for diet 4 does not overlap with those of diets 1, 2, and 3, diet 4 is significantly different.
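
The interval comparison can be sketched as follows, under several assumptions: the data are hypothetical, each interval uses the pooled residual mean square (MSE) from the ANOVA, and the critical value 2.120 (two-sided 95%, 16 residual degrees of freedom) is hard-coded from a standard t table. The intervals are not adjusted for multiple comparisons, so statistical software may report somewhat different intervals.

```python
from math import sqrt

# Hypothetical weight gains (g) for 5 chicks per diet; not the lecture's data
gains = {
    "diet1": [10, 12, 11, 9, 13],
    "diet2": [14, 15, 13, 16, 12],
    "diet3": [11, 13, 12, 14, 10],
    "diet4": [18, 20, 19, 17, 21],
}

T_CRIT = 2.120  # t critical value for 95% CI with 16 df (table value, assumption)

n_total = sum(len(xs) for xs in gains.values())
k = len(gains)
df_residual = n_total - k  # 16 here, matching T_CRIT

means = {d: sum(xs) / len(xs) for d, xs in gains.items()}
sse = sum((x - means[d]) ** 2 for d, xs in gains.items() for x in xs)
mse = sse / df_residual  # pooled estimate of the within-group variance

# 95% confidence interval for each group mean
intervals = {}
for d, xs in gains.items():
    half_width = T_CRIT * sqrt(mse / len(xs))
    intervals[d] = (means[d] - half_width, means[d] + half_width)

def overlaps(a, b):
    """True if intervals a and b (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Does diet 4's interval avoid overlapping with every other diet?
diet4_distinct = all(
    not overlaps(intervals["diet4"], intervals[d])
    for d in ("diet1", "diet2", "diet3")
)

for d, (lo, hi) in intervals.items():
    print(f"{d}: [{lo:.2f}, {hi:.2f}]")
print("diet4 differs from all others:", diet4_distinct)
```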

Review

  • ANOVA is a method for determining if differences exist, while post-hoc testing drills into the specifics.
  • Concepts will be repeated and expanded upon in future lectures on two-way and multifactorial ANOVA.

Key Takeaways

  • ANOVA moves beyond t-tests to compare multiple groups simultaneously.
  • Focus shifts towards understanding and communicating results rather than manual calculations.