In-depth Notes on Analysis of Variance (ANOVA)

What is Analysis of Variance (ANOVA)?

  • ANOVA is a statistical method used to test if there are any statistically significant differences between the means of three or more independent groups.
  • Depending on the hypothesis being tested, the null hypothesis can be represented as:
    • $H_0 : \mu_1 = \mu_2 = \dots = \mu_k$ (all population means are equal)
    • The alternative hypothesis is $H_a : \mu_i \neq \mu_j$ for at least one pair $i \neq j$.

When is ANOVA Used?

  • Experiment Scenario: Participants are divided based on breakfast consumed and tested on aptitude.
    • Groups:
      • No breakfast
      • Partial breakfast
      • Full breakfast
  • Factors and Levels:
    • Experimental units: Participants
    • Factor: Breakfast (with 3 levels)
    • Response variable: Test scores

Extending Factors: Two-Way ANOVA

  • If participants are also grouped by age in the breakfast example, this introduces a second factor:
    • Age factor (young and old)
  • Specific combinations of factors are referred to as treatments (e.g., young with full breakfast)
  • One-way ANOVA examines a single factor, while two-way ANOVA examines two factors at once.
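
As a small illustration of how treatments arise from factor levels, the sketch below lists every age and breakfast combination from the example. The variable names are illustrative only and are not part of the original notes.

```python
from itertools import product

# Factor levels taken from the breakfast example in these notes
breakfast_levels = ["none", "partial", "full"]
age_levels = ["young", "old"]

# Each treatment is one combination of an age level and a breakfast level
treatments = list(product(age_levels, breakfast_levels))

for age, breakfast in treatments:
    print(f"{age} / {breakfast} breakfast")

print(len(treatments))  # 2 age levels x 3 breakfast levels = 6 treatments
```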

How to Test Multiple Means?

  • Comparing many means with separate two-sample tests inflates the overall Type I error rate, because each test carries its own chance of a false rejection.
  • Example: 50 populations require $\binom{50}{2} = 1225$ pairwise tests, so the chance of incorrectly rejecting at least one true null hypothesis is very high.
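
The count of 1225 tests, and a rough sense of how quickly false rejections accumulate, can be checked with a short sketch. The significance level of 0.05 and the independence of the tests are assumptions made here for illustration; neither is stated in the notes, and pairwise tests on shared samples are not truly independent.

```python
from math import comb

k = 50        # number of populations, from the example above
alpha = 0.05  # assumed per-test significance level (not given in the notes)

# Number of distinct pairs of populations
n_tests = comb(k, 2)
print(n_tests)  # 1225

# Expected number of false rejections if every null hypothesis is true
expected_false_rejections = alpha * n_tests

# Chance of at least one false rejection, assuming independent tests
familywise_error = 1 - (1 - alpha) ** n_tests

print(round(expected_false_rejections, 2))
print(familywise_error)
```

Under these assumptions the chance of at least one false rejection is numerically indistinguishable from 1, which is the motivation for a single overall test.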

The F Statistic

  • To evaluate means across multiple populations, the F statistic is computed as:
    $F = \frac{\text{variation between sample means}}{\text{variation within the samples}}$
  • This ratio tests the null hypothesis (means are equal) against the alternative that at least one mean is different.

Computing the F Statistic

  1. Calculate sample means and variances:
    • For each group, compute the mean $\bar{x}_i$ and the variance $s_i^2$.
  2. Overall mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^k n_i \bar{x}_i$, where $N$ is the total sample size.
  3. Sum of Squares Between Treatments (SST):
    $SST = \sum_{i=1}^k n_i(\bar{x}_i - \bar{x})^2$
  4. Sum of Squares Within Treatments (SSE):
    $SSE = \sum_{i=1}^k (n_i - 1)s_i^2$
  5. Total Sum of Squares (Total SS):
    $\text{Total SS} = SST + SSE$
  6. ANOVA Table: Fill in degrees of freedom (df), sums of squares (SS), mean squares (MS), and F statistic.
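
The steps above can be sketched in plain Python. The data are made up for illustration, and the function name `one_way_anova` is hypothetical. The sketch uses the standard one-way table entries: treatments have $k - 1$ degrees of freedom, error has $N - k$, each mean square is SS divided by its df, and $F = MST / MSE$.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Build the one-way ANOVA table entries from a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)

    # Step 1: per-group means and sample variances (n - 1 denominator)
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]

    # Step 2: overall mean, weighted by group size
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Steps 3-5: sums of squares
    sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * v for n, v in zip(sizes, variances))
    total_ss = sst + sse

    # Step 6: degrees of freedom, mean squares, F statistic
    df_treat, df_error = k - 1, n_total - k
    mst, mse = sst / df_treat, sse / df_error
    return {
        "df_treat": df_treat, "df_error": df_error,
        "SST": sst, "SSE": sse, "TotalSS": total_ss,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }

# Hypothetical data: three groups of three observations each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

For these made-up data, SST is about 26, SSE is 6, and F is about 13 on 2 and 6 degrees of freedom.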

Example 1 (Two Scenarios)

  • Scenario A: samples from different normal populations show clearer differences between means than in Scenario B.
  • ANOVA Results:
    • Scenario A: SST = 720.82, SSE = 43.75, F = 345.97
    • Scenario B: SST = 849.18, SSE = 1190.92, F = 14.97
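
As a consistency check on the reported values: since $F = \frac{SST/(k-1)}{SSE/(N-k)}$, the ratio of degrees of freedom $(N-k)/(k-1)$ equals $F \cdot SSE / SST$. The sketch below recovers that ratio for both scenarios. The degrees of freedom themselves are not stated in these notes, so the result is only an implied value.

```python
# Values as reported in the example (error sum of squares stored as "sse")
scenarios = {
    "A": {"sst": 720.82, "sse": 43.75, "f": 345.97},
    "B": {"sst": 849.18, "sse": 1190.92, "f": 14.97},
}

# Implied ratio (N - k) / (k - 1) = F * SSE / SST
df_ratio = {
    name: s["f"] * s["sse"] / s["sst"] for name, s in scenarios.items()
}

for name, ratio in df_ratio.items():
    print(f"Scenario {name}: implied df ratio = {ratio:.2f}")
```

Both scenarios give a ratio of about 21, so they appear to share the same design. The between-group sums of squares are similar, and the much larger within-group variation in Scenario B is what pulls its F statistic down.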

Assumptions in ANOVA

  • Random sampling from normal populations with equal variances ($\sigma^2$).
  • ANOVA is fairly robust to moderate departures from normality (the distributions need only be approximately normal), and to moderate differences in variance when the sample sizes are similar.

Example 2: Completing ANOVA Table

  1. Fill in degrees of freedom for error using Mean Square Error (MSE).
  2. Calculate sums of squares between treatments and error based on total sum of squares.
  3. Obtain F statistic.
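
A minimal sketch of this kind of table completion follows. The numbers are entirely hypothetical, and which entries are given in the original exercise is not stated in these notes; the sketch simply assumes Total SS, SSE, MSE, and the treatment df are known.

```python
# Hypothetical known entries of a partially filled ANOVA table
total_ss = 100.0
sse = 60.0
mse = 5.0
df_treat = 2

# Step 1: MSE = SSE / df_error, so df_error = SSE / MSE
df_error = sse / mse

# Step 2: Total SS = SST + SSE, so SST = Total SS - SSE
sst = total_ss - sse

# Step 3: F = MST / MSE, with MST = SST / df_treat
mst = sst / df_treat
f_stat = mst / mse

print(df_error, sst, mst, f_stat)  # 12.0 40.0 20.0 4.0
```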

Example 3: One-Way ANOVA Hypothesis Test

  • Hypothesis: Test differences in calcium intake among groups (Normal, Osteopenia, Osteoporosis).
  • Compute sample means, variances, overall mean, SST and SSE to complete ANOVA table.

Estimating Differences in Treatments

  • After ANOVA indicates that the means are not all equal, use confidence intervals based on Student's t-distribution to determine which pairs of means differ significantly.
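
One common form of such an interval for the difference $\mu_i - \mu_j$ is shown below. It assumes the pooled MSE from the ANOVA table is used as the variance estimate, with the error degrees of freedom $N - k$; the notes do not spell out the formula, so treat this as a standard textbook version and not necessarily the exact one intended.

```latex
(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)},
\qquad \text{df} = N - k
```

If the interval does not contain 0, the two means are judged to differ at the chosen confidence level.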

Two-Way Classification (Blocking)

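  • Randomized block designs group experimental units into blocks that share similar characteristics, with every treatment applied within each block. This removes block-to-block variability from the error term and makes treatment differences easier to detect.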
  • ANOVA Table for Block Design:
    • The table has separate rows for treatments, blocks, and error, each with its own sum of squares, so that $\text{Total SS} = SST + SSB + SSE$, where SSB is the sum of squares for blocks.
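
The block-design table can be sketched as follows, assuming one observation per treatment in each block. The data and the function name `randomized_block_anova` are hypothetical. The degrees of freedom used are the standard ones: $k - 1$ for treatments, $b - 1$ for blocks, and $(k - 1)(b - 1)$ for error.

```python
def randomized_block_anova(data):
    """data[j][i] is the response for block j under treatment i."""
    b = len(data)     # number of blocks
    k = len(data[0])  # number of treatments
    n = b * k

    grand_mean = sum(sum(row) for row in data) / n
    treat_means = [sum(row[i] for row in data) / b for i in range(k)]
    block_means = [sum(row) / k for row in data]

    # Sums of squares: total, treatments, blocks, and error by subtraction
    total_ss = sum((x - grand_mean) ** 2 for row in data for x in row)
    sst = b * sum((m - grand_mean) ** 2 for m in treat_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = total_ss - sst - ssb

    df_treat, df_block = k - 1, b - 1
    df_error = df_treat * df_block
    mse = sse / df_error
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "TotalSS": total_ss,
        "F_treat": (sst / df_treat) / mse,
        "F_block": (ssb / df_block) / mse,
    }

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns)
data = [
    [1.0, 2.0, 3.0],
    [5.0, 4.0, 6.0],
    [7.0, 8.0, 9.0],
]
result = randomized_block_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In these made-up data the block F statistic (about 81) is far larger than the treatment F statistic (about 7), showing how blocking can account for most of the variation in the response.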

Example 4: Consumer Study for Cell Phone Costs

  • Hypothesis tests show that the differences are explained mainly by usage level rather than by service provider. The F statistics indicate that the block factor (usage) is significant and overshadows the differences among treatments (companies).

Cautionary Comments

  • Make sure a block design suits the experiment; it applies only when the treatment and block factors do not interact, that is, when the effect of a treatment is the same in every block.

Closing Remarks

  • Even for complex ANOVA designs, software can streamline the calculations. The principles of hypothesis testing stay the same, but make sure you understand all the assumptions, since the conclusions are valid only when they hold.