In-depth Notes on Analysis of Variance (ANOVA)
What is Analysis of Variance (ANOVA)?
- ANOVA is a statistical method used to test if there are any statistically significant differences between the means of three or more independent groups.
- For k populations, the null hypothesis is:
- $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ (all population means are equal)
- The alternative hypothesis is $H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$.
When is ANOVA Used?
- Experiment Scenario: Participants are divided based on breakfast consumed and tested on aptitude.
- Groups:
- No breakfast
- Partial breakfast
- Full breakfast
- Factors and Levels:
- Experimental units: Participants
- Factor: Breakfast (with 3 levels)
- Response variable: Test scores
Extending Factors: Two-Way ANOVA
- If participants are also grouped by age in the breakfast example, this introduces a second factor:
- Age factor (young and old)
- Specific combinations of factors are referred to as treatments (e.g., young with full breakfast)
- One-way ANOVA focuses on a single factor, while two-way ANOVA examines two factors.
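As a small illustration of how treatments arise from factor combinations, the sketch below enumerates every pairing of the two factors in the breakfast example. The level labels are shorthand chosen here for illustration, not names from a dataset.

```python
from itertools import product

# Factor levels from the breakfast example (labels are illustrative shorthand).
breakfast_levels = ["none", "partial", "full"]  # factor 1: 3 levels
age_levels = ["young", "old"]                   # factor 2: 2 levels

# Each treatment is one (age, breakfast) combination.
treatments = list(product(age_levels, breakfast_levels))

for age, breakfast in treatments:
    print(f"{age} / {breakfast} breakfast")

# Number of treatments = product of the numbers of levels (2 * 3).
print(len(treatments))
```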
How to Test Multiple Means?
- Running a separate two-sample test for every pair of means inflates the overall Type I error rate.
- Example: 50 populations require $\binom{50}{2} = 1225$ pairwise tests, so the chance of incorrectly rejecting at least one true null hypothesis purely by chance becomes very high.
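The count of 1225 tests can be checked directly, and a rough sense of the error inflation follows from a simplifying assumption. The sketch below assumes a significance level of 0.05 per test and treats the tests as independent; pairwise tests that share samples are not truly independent, so the second number is only an approximation.

```python
from math import comb

def pairwise_tests(k: int) -> int:
    """Number of two-sample tests needed to compare every pair of k means."""
    return comb(k, 2)

def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection in m tests.

    Assumes the m tests are independent and each uses level alpha
    (both are simplifying assumptions, not part of the notes).
    """
    return 1 - (1 - alpha) ** m

m = pairwise_tests(50)
print(m)                    # 1225 pairwise comparisons
print(familywise_error(m))  # very close to 1 under the independence assumption
```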
The F Statistic
- To evaluate means across multiple populations, the F statistic is computed as:
$F = \dfrac{\text{variation between sample means}}{\text{variation within the samples}}$
- This ratio tests the null hypothesis (all means are equal) against the alternative that at least one mean is different.
Computing the F Statistic
- Calculate sample means and variances:
- For each group, compute the mean $\bar{x}_i$ and variance $s_i^2$.
- Overall mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{k} n_i \bar{x}_i$, where $N$ is the total sample size.
- Sum of Squares Between Treatments (SST):
$SST = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$
- Sum of Squares Within Treatments (SSE):
$SSE = \sum_{i=1}^{k} (n_i - 1)s_i^2$
- Total Sum of Squares (Total SS):
$\text{Total SS} = SST + SSE$
- ANOVA Table: Fill in degrees of freedom (df: $k - 1$ for treatments, $N - k$ for error), sums of squares (SS), mean squares ($MS = SS/df$), and the F statistic ($F = MST/MSE$).
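The steps above can be sketched in a few lines of Python using only the standard library. The function name `one_way_anova` and the three small groups are hypothetical, chosen so the arithmetic is easy to follow by hand; they are not data from these notes.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Build the one-way ANOVA quantities from a list of samples."""
    k = len(groups)                              # number of treatments
    n = [len(g) for g in groups]                 # group sizes n_i
    N = sum(n)                                   # total sample size
    means = [mean(g) for g in groups]            # x-bar_i
    variances = [variance(g) for g in groups]    # s_i^2 (n_i - 1 denominator)

    grand_mean = sum(ni * xi for ni, xi in zip(n, means)) / N
    sst = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(n, means))
    sse = sum((ni - 1) * s2 for ni, s2 in zip(n, variances))

    df_treat, df_error = k - 1, N - k
    mst, mse = sst / df_treat, sse / df_error
    return {
        "SST": sst, "SSE": sse, "TotalSS": sst + sse,
        "df_treat": df_treat, "df_error": df_error,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }

# Hypothetical data: three groups of three observations each.
result = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(result)
```

For these made-up groups the means are 5, 7, and 9 with overall mean 7, so SST = 24, SSE = 6, and F = 12 / 1 = 12. The observed F would then be compared with the F distribution on (k − 1, N − k) degrees of freedom.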
Example 1 (Two Scenarios)
- Scenario A: Samples drawn from different normal populations show clearer differences between means than in Scenario B, because the variation within the samples is much smaller.
- ANOVA Results:
* Scenario A: SST = 720.82, Error = 43.75, F = 345.97
* Scenario B: SST = 849.18, Error = 1190.92, F = 14.97
Assumptions in ANOVA
- Samples are drawn randomly from normal populations with equal variances ($\sigma^2$).
- ANOVA is robust to moderate departures from the normality assumption, as long as sample sizes are similar and the distributions are approximately normal.
Example 2: Completing ANOVA Table
- Fill in the degrees of freedom for error using the Mean Square Error (MSE = SSE / error df).
- Calculate the missing sums of squares for treatments and error from the total sum of squares (Total SS = SST + SSE).
- Obtain the F statistic as F = MST / MSE.
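A minimal sketch of this kind of table completion follows. It assumes the known entries are the treatment df, SST, Total SS, and MSE; the notes do not say which entries Example 2 actually provides, so both that choice and the numbers used are hypothetical.

```python
def complete_anova_table(df_treat, sst, total_ss, mse):
    """Fill in the missing ANOVA table entries from the known ones.

    Assumes df_treat, SST, Total SS and MSE are given (a hypothetical setup).
    """
    sse = total_ss - sst            # Total SS = SST + SSE
    df_error = round(sse / mse)     # MSE = SSE / df_error
    df_total = df_treat + df_error
    mst = sst / df_treat            # MST = SST / df_treat
    f_stat = mst / mse              # F = MST / MSE
    return {
        "SSE": sse, "df_error": df_error, "df_total": df_total,
        "MST": mst, "F": f_stat,
    }

# Hypothetical known entries, not taken from Example 2.
table = complete_anova_table(df_treat=2, sst=24.0, total_ss=30.0, mse=1.0)
print(table)
```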
Example 3: One-Way ANOVA Hypothesis Test
- Hypothesis: Test differences in calcium intake among groups (Normal, Osteopenia, Osteoporosis).
- Compute sample means, variances, overall mean, SST and SSE to complete ANOVA table.
Estimating Differences in Treatments
- After the F test indicates that the means are not all equal, use confidence intervals based on the Student’s t-distribution to determine which means differ significantly.
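One common form of such an interval for the difference between two treatment means is $(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2}\sqrt{MSE\,(1/n_i + 1/n_j)}$, with the t value taken on the error degrees of freedom. The sketch below assumes that form; the critical value is passed in as looked up from a t table, and all numbers are hypothetical.

```python
from math import sqrt

def diff_confidence_interval(mean_i, mean_j, n_i, n_j, mse, t_crit):
    """Confidence interval for mu_i - mu_j using the pooled MSE from ANOVA.

    t_crit is the t critical value on the error degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    diff = mean_i - mean_j
    margin = t_crit * sqrt(mse * (1 / n_i + 1 / n_j))
    return diff - margin, diff + margin

# Hypothetical inputs: two group means with n = 3 each, MSE = 1 on 6 error df.
# 2.447 is the tabulated t value for a 95% interval with 6 degrees of freedom.
lower, upper = diff_confidence_interval(5, 9, 3, 3, mse=1.0, t_crit=2.447)
print(lower, upper)
```

If the interval excludes zero, as it does for these made-up numbers, the two means are judged to differ at the chosen confidence level.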
Two-Way Classification (Blocking)
- Randomized block designs group experimental units into blocks that share similar characteristics and apply every treatment within each block, reducing unexplained variability in the response.
- ANOVA Table for Block Design:
- The table has separate rows for treatments, blocks, and error, each with its own sum of squares, so that Total SS = SST + SSB + SSE, where SSB is the sum of squares for blocks.
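A minimal sketch of the block-design calculation follows, assuming one observation for each block and treatment combination. Rows are blocks and columns are treatments; the function name and the 3 x 3 table of values are hypothetical.

```python
from statistics import mean

def block_anova(table):
    """Randomized block ANOVA with one observation per (block, treatment) cell.

    table[j][i] is the response for block j under treatment i.
    """
    b, k = len(table), len(table[0])             # blocks, treatments
    grand = mean(x for row in table for x in row)
    treat_means = [mean(row[i] for row in table) for i in range(k)]
    block_means = [mean(row) for row in table]

    sst = b * sum((m - grand) ** 2 for m in treat_means)   # treatments
    ssb = k * sum((m - grand) ** 2 for m in block_means)   # blocks
    total = sum((x - grand) ** 2 for row in table for x in row)
    sse = total - sst - ssb                                # error

    df_t, df_b = k - 1, b - 1
    df_e = df_t * df_b
    mse = sse / df_e
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "TotalSS": total,
        "F_treat": (sst / df_t) / mse,
        "F_block": (ssb / df_b) / mse,
    }

# Hypothetical responses: 3 blocks (rows) by 3 treatments (columns).
res = block_anova([[10, 12, 14], [20, 23, 23], [30, 31, 35]])
print(res)
```

In this made-up table the block sum of squares (600) dwarfs the treatment sum of squares (24), which is the situation where blocking removes a large share of variability that would otherwise inflate the error term.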
Example 4: Consumer Study for Cell Phone Costs
- Hypothesis testing shows that differences in cost are explained primarily by usage level rather than by service provider: the F statistic for the block factor (usage) is significant and overshadows the treatment differences (companies).
- Check that the treatment and block structure suits the study; a block design applies only when the treatment and block factors do not interact.
- Software can streamline the calculations, even for more complex ANOVA designs. The principles of hypothesis testing stay the same, but make sure you understand all the assumptions required for the results to be valid.