How to Understand and Use ANOVA
What We're Learning About Today
Review One-Way ANOVA: Go over the basics of how ANOVA works.
Why and When to Use Post Hoc Tests: Learn about extra tests needed after ANOVA.
Understanding ANOVA Effect Sizes (η²): See how important the results are in real life.
How to Read a Full ANOVA Report: Learn to understand all the numbers from the test.
ANOVA for More Complex Studies: See how ANOVA can be used in different situations (like repeated measurements or multiple factors).
Breaking Down Differences (Variance Partitioning)
Total Variance: All the differences among all the scores.
Between-Groups Variance: Differences caused by the different groups or treatments being studied.
Within-Groups Variance: Differences among people within the same group; this is usually random or unexplained variation.
How ANOVA Works: It compares these kinds of differences using "sums of squares." Each sum of squares is divided by its degrees of freedom to give a "mean square," and the test asks whether the differences between groups are bigger than the random differences within groups.
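The variance partitioning above can be sketched in a few lines of Python (the three groups and their scores below are made up purely for illustration):

```python
import numpy as np

# Hypothetical scores for three groups of three people each.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Total: squared deviations of every score from the grand mean.
ss_total = ((all_scores - grand_mean) ** 2).sum()

# Between-groups: squared deviations of each group mean from the grand
# mean, weighted by group size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-groups: squared deviations of each score from its own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(ss_between, ss_within, ss_total)  # the first two always sum to the third
```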
The F-Ratio and What It Means
How it's Calculated: F = \frac{\text{Mean Square BETWEEN groups}}{\text{Mean Square WITHIN groups}}, where each mean square is a sum of squares divided by its degrees of freedom
What the F-ratio Tells You:
Big F-value: The differences between groups are large compared to the random differences within groups, suggesting the treatment or grouping had a real effect.
Small F-value: Most of the differences are just random variation within groups, not due to your treatment.
Significance (p-value):
If the p-value is less than .05 (p < .05), the result is statistically significant: if the groups truly didn't differ, differences this large would rarely occur by chance.
If p > .05, the results are not significant, meaning the groups are not different enough to rule out chance.
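With made-up data like the three small groups below, SciPy's `f_oneway` computes the F-ratio and its p-value in one call:

```python
from scipy import stats

# Hypothetical scores for three groups.
g1 = [4.0, 5.0, 6.0]
g2 = [7.0, 8.0, 9.0]
g3 = [5.0, 6.0, 7.0]

f_value, p_value = stats.f_oneway(g1, g2, g3)
print(f_value, p_value)  # here F = 7.0 and p < .05, so the result is significant
```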
What ANOVA Doesn't Tell You on its Own
A significant F-test only tells you that at least one group is different from another. It doesn't tell you which specific groups are different.
Post Hoc Tests: These are additional tests you run after a significant ANOVA to find out exactly which groups are different from each other.
What Post Hoc Tests Do
They compare pairs of groups (e.g., Group A vs. Group B, Group A vs. Group C, etc.) after you've found a significant F-value.
They adjust the way they test to avoid making too many false alarms (Type I errors) because you're doing many comparisons.
Common Types:
Tukey's Test
Bonferroni Correction
Scheffé's Method
Controlling for Errors
Familywise α: This is the overall chance of making at least one Type I error (a false alarm) when you do many tests. Post hoc tests are stricter to keep this chance low.
Example: Tukey's HSD test helps keep the total error rate across all comparisons at .05.
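The familywise error rate grows quickly with the number of comparisons. If each of k independent comparisons is tested at α = .05, the chance of at least one false alarm is 1 − (1 − .05)^k:

```python
alpha = 0.05
for k in (1, 3, 6):  # e.g., 4 groups produce 6 pairwise comparisons
    familywise_alpha = 1 - (1 - alpha) ** k
    print(k, round(familywise_alpha, 3))  # rises from .05 to about .14 to about .26
```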
Common Post Hoc Tests Simplified
Tukey's HSD: Good to use when your groups are roughly the same size. It's good at controlling errors.
Bonferroni: Simple and cautious. Useful when your groups are of different sizes or when you have specific comparisons you planned beforehand.
Scheffé: More flexible for complex or unplanned comparisons that you didn't think of initially.
When to Use Each Post Hoc Test
Tukey's HSD: Use it often for comparing all possible pairs of group means, especially with equal group sizes.
Bonferroni: Use for a small number of planned comparisons.
Scheffé: Use when you need to explore many different, complex, or unplanned comparisons.
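As a sketch of how these pairwise comparisons look in practice, SciPy (version 1.8 or later) provides `tukey_hsd`; the three equal-sized groups below are hypothetical:

```python
from scipy import stats

# Hypothetical equal-sized groups.
a = [4.0, 5.0, 6.0, 5.0]
b = [7.0, 8.0, 9.0, 8.0]
c = [5.0, 6.0, 7.0, 6.0]

result = stats.tukey_hsd(a, b, c)
print(result.statistic)  # matrix of pairwise mean differences
print(result.pvalue)     # matrix of adjusted p-values for each pair
```

Here the a-vs-b difference (3 points) comes out significant while the a-vs-c difference (1 point) does not, even though both pairs are tested with the familywise rate held at .05.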
How to Read SPSS Output (Simplified)
Descriptives Output: Shows basic information for each group:
N (Number of people in the group)
Mean (Average score)
Std. Deviation (How spread out the scores are)
Confidence Interval (A range where the true average likely falls).
Multiple Comparisons (Tukey HSD) Output: This table shows specific comparisons between pairs of groups.
Mean Difference: How much the averages of two groups differ.
p-value: Tells you if that specific pair difference is significant (p < .05).
Reading Results: Look for the p-values. If p < .05 for a specific pair (e.g., Cohort A vs. Cohort B with p = .003), those two groups are significantly different.
Why Effect Size is Important
Significance vs. Real-World Importance: A statistically significant result (p < .05) means it's unlikely due to chance, but it doesn't mean the difference is big or important in the real world.
Effect Size: Tells you how big the difference is between groups or how much of the total variation in scores is explained by your independent variable (the thing you changed between groups).
η² (Eta-squared) vs. Cohen’s d
η² (Eta-squared): Used in ANOVA. It's a percentage that tells you how much of the total differences in scores is explained by the groups you're comparing.
Cohen's d: Used in t-tests. It tells you the difference between two group means in terms of standard deviation units.
Both tell you about the strength of the relationship or the size of the effect.
How to Calculate and Interpret Effect Size (η²)
Formula: η² = \frac{\text{Differences BETWEEN groups (Sum of Squares)}}{\text{Total Differences (Sum of Squares)}}
Where SS_{\text{Total}} = SS_{\text{Between}} + SS_{\text{Within}}
What the Numbers Mean:
0.01 = Small effect (1% of variance explained)
0.06 = Medium effect (6% of variance explained)
0.14 = Large effect (14% of variance explained)
Example: If η² = 0.25, it means about 25% of the differences in scores can be explained by the differences between your groups (due to the treatment).
Example Calculation
If SS_{\text{Between}} = 240 and SS_{\text{Within}} = 720, then:
SS_{\text{Total}} = 240 + 720 = 960
η² = \frac{240}{960} = 0.25
Interpretation: This means 25% of the total variability in the data is due to the differences across the groups.
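The worked example translates directly into code:

```python
ss_between = 240.0
ss_within = 720.0

ss_total = ss_between + ss_within    # 960.0
eta_squared = ss_between / ss_total  # 0.25
print(eta_squared)  # 25% of variability explained -> a large effect
```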
The Full Story from ANOVA (3 Key Questions)
Is the F statistic significant? (Are there any differences between groups, beyond chance?)
How big is the effect size (η²)? (How important are these differences in the real world?)
Which specific groups differ? (Which exact pairs of groups are significantly different, identified by post hoc tests?)
Interpreting F and p Values
Example: F(2, 27) = 4.63, p = .019
Since p = .019 is less than .05, we conclude that there is a significant difference among the group means. We reject the idea that all group means are the same.
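The p-value reported for an F statistic is just the upper-tail probability of the F distribution with those degrees of freedom, which can be checked with SciPy:

```python
from scipy import stats

f_value, df_between, df_within = 4.63, 2, 27
p_value = stats.f.sf(f_value, df_between, df_within)  # upper-tail probability
print(round(p_value, 3))  # 0.019, matching the reported p
```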
Interpreting η² and Post Hoc Results (Together)
If η² = .25, this suggests a large, practically important effect.
Post hoc tests might then show, for example, that Group A performed much better than Group C (p = .01). This gives a complete picture of both significance and the size/location of the effect.
SPSS Output for "Satisfaction" Example
ANOVA Summary
This table shows the 'Sum of Squares' for different sources of variance (Between Groups, Within Groups, Total).
It gives an F-value (e.g., 4.361) and a 'Sig.' (p-value, e.g., .002).
Since Sig. = .002 is less than .05, satisfaction differs significantly across the groups (at least one group differs from another).
ANOVA Effect Sizes (More Detail)
This section provides different measures of effect size, like Eta-squared (η² = 0.110).
The 95% Confidence Interval (e.g., (0.017, 0.192)) gives a range where the true effect size likely falls.
Multiple Comparisons (Tukey HSD) for Satisfaction
This table compares specific industry pairs (e.g., Academic vs. Community).
It shows the Mean Difference and how significant it is.
If the output says "Not significant," it means there's no statistically reliable difference between those two specific groups in terms of satisfaction.
Repeated Measures Design
What it is: You measure the same people multiple times under different conditions or over time (e.g., before treatment, after treatment, at follow-up).
Benefit: It helps control for individual differences because each person acts as their own comparison, making the results stronger.
Connection: It's like a paired t-test but for more than two measurement points.
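The paired t-test connection can be illustrated with two measurement points (the before/after scores below are hypothetical):

```python
from scipy import stats

# Hypothetical scores for the same five people, measured twice.
before = [10.0, 12.0, 9.0, 11.0, 10.0]
after = [12.0, 14.0, 11.0, 14.0, 12.0]

# Each person serves as their own comparison: the test works on the
# per-person differences, removing stable individual differences.
t_value, p_value = stats.ttest_rel(before, after)
print(t_value, p_value)
```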
Factorial ANOVA
What it is: Allows you to study more than one independent variable (factor) at the same time (e.g., Gender and Training together).
What it can show: It can look at:
Main effects: The effect of each factor by itself (e.g., the effect of Training regardless of Gender).
Interaction effects: When the effect of one factor depends on the level of another factor (e.g., Training might affect men and women differently).
2-Way ANOVA Main Effects Explained (Example)
Imagine you have Men and Women, some get Training, some get No Training.
Example Data: Men with Training average 82, Men with No Training average 75. Women with Training average 90, Women with No Training average 85.
Main effect of Gender: Is there a general difference between Men and Women?
Main effect of Training: Is there a general effect of Training?
Interaction effect: Does the effect of Training change depending on whether someone is Male or Female?
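The main effects and the interaction can be read straight off the cell means, using the example numbers above:

```python
import numpy as np

# Cell means: rows = Men, Women; columns = Training, No Training.
cells = np.array([[82.0, 75.0],
                  [90.0, 85.0]])

gender_means = cells.mean(axis=1)    # Men vs. Women, averaged over Training
training_means = cells.mean(axis=0)  # Training vs. No Training, averaged over Gender
training_effect = cells[:, 0] - cells[:, 1]  # Training benefit within each row

print(gender_means)     # [78.5 87.5] -> Women score higher overall
print(training_means)   # [86. 80.]   -> Training raises scores overall
print(training_effect)  # [7. 5.]     -> Training helps Men slightly more than Women
```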
SPSS Output for 2-Way ANOVA
Tests of Between-Subjects Effects: This is the main table that shows:
F-values and p-values for each main effect (e.g., Gender, Training).
F-value and p-value for the interaction effect (e.g., Gender * Training).
If a p-value is less than .05, that effect (main or interaction) is significant.
Estimated Marginal Means Output
This usually includes graphs that visually show the average scores for different groups and conditions.
Error bars: These lines on the graph show the 95% Confidence Intervals, helping you see how precise the averages are and if groups truly differ.
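A 95% error bar for a group mean is just the mean plus or minus t × SE; as a minimal sketch with hypothetical scores:

```python
import numpy as np
from scipy import stats

scores = np.array([78.0, 82.0, 75.0, 90.0, 85.0])  # hypothetical group scores

mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(scores) - 1)  # two-sided 95% critical value

ci_low, ci_high = mean - t_crit * se, mean + t_crit * se
print(mean, (ci_low, ci_high))  # the error bar spans roughly 74.7 to 89.3
```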