Planned Comparisons and Post Hoc Tests & Effect Size
Planned Comparisons and Post Hoc Tests
Learning Goals
- Describe major approaches to following up mean differences in an ANOVA.
- Set up and analyze planned comparisons between group means.
- Set up and analyze post-hoc comparisons between group means.
- Report ANOVA comparisons and effect size.
Revisiting Sleep Deprivation Example
- Target identification accuracy as a function of sleep deprivation:
- Table includes sleep deprivation in hours (28, 20, 12, 8), with multiple data points for each.
- Shows sum (Σ), mean (M), and standard deviation (S) for each sleep deprivation level.
- ANOVA Summary:
- Sleep deprivation had a significant effect on the number of errors made, F(3,12) = 7.34, p = 0.005.
- This indicates a significant effect, but doesn't specify which groups differ.
The Need for Further Analysis
- The F ratio only indicates that there is a difference somewhere between the means.
- Further analysis is needed to determine where the specific differences lie.
Approaches to Comparisons
A Priori (Planned) Comparisons
- Used when there is a strong theoretical interest in certain groups and specific hypotheses based on evidence.
- Focuses on comparing only groups of interest.
- Overall ANOVA is done first, progressing to planned comparisons.
Post Hoc Comparisons
- Used when you cannot predict exactly which means will differ.
- Overall ANOVA is done first to check if the independent variable (IV) has an effect.
- Compares all groups to each other to explore differences.
- More exploratory and less refined than planned comparisons.
A Priori/Planned Comparisons
- Can be simple or complex:
- Simple: Comparing one group to one other group.
- Complex: Comparing a set of groups to another set of groups.
- In SPSS, complex comparisons are created by assigning weights to different groups.
Complex Comparison Example
- Comparing groups with 8 or 12 hours of sleep deprivation against the group with 20 hours.
- Create two sets of weights:
- One for the first set of means.
- One for the second set of means.
- Assign a weight of zero to any remaining groups.
Weight Assignment
- Set 1 (e.g., 8h, 12h) gets positive weights.
- Set 2 (e.g., 20h) gets negative weights.
- The weights must sum to 0.
More Complex Example with 5 Groups
- Comparing the first 3 groups with the last 2 groups.
- A simple rule that always works: The weight for each group is equal to the number of groups in the other set.
- If comparing Group 1, 2, 3 vs. Group 4, 5:
- Groups 1, 2, 3 get a weight of 2 (number of groups in Set 2).
- Groups 4, 5 get a weight of -3 (negative of the number of groups in Set 1).
- Total weights must sum to zero (e.g., 2 + 2 + 2 - 3 - 3 = 0).
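The weighting rule above can be sketched as a small Python helper (`contrast_weights` is our name for illustration, not SPSS syntax):

```python
def contrast_weights(k1, k2):
    # Each group in set 1 gets +k2 (the size of the other set);
    # each group in set 2 gets -k1, so the weights always sum to zero.
    return [k2] * k1 + [-k1] * k2

weights = contrast_weights(3, 2)  # groups 1-3 vs groups 4-5
print(weights, sum(weights))      # [2, 2, 2, -3, -3] 0
```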
Planned Comparisons Calculation
- Apply the weight to each mean to form the contrast value:
- L = \sum a_j \bar{X}_j
- Calculate a sum of squares for the contrast:
- SS_{contrast} = \frac{nL^2}{\sum a_j^2}
- MS_{contrast} = SS_{contrast} (because df = 1).
- Perform the F-test using the MS_{error} from the omnibus ANOVA:
- F = \frac{MS_{contrast}}{MS_{error}}
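Under the equal-n assumption, these formulas can be sketched in Python (`contrast_F` and the numbers below are illustrative, not the lecture's data set):

```python
def contrast_F(means, weights, n, ms_error):
    """F-test for a planned contrast.
    means    : group means
    weights  : contrast weights a_j (must sum to 0)
    n        : per-group sample size (equal-n design assumed)
    ms_error : MS_error from the omnibus ANOVA
    """
    L = sum(a * m for a, m in zip(weights, means))           # L = sum a_j * mean_j
    ss_contrast = n * L ** 2 / sum(a ** 2 for a in weights)  # SS_contrast
    ms_contrast = ss_contrast                                # df = 1, so MS = SS
    return ms_contrast / ms_error

# Illustrative numbers only
F = contrast_F(means=[10, 20], weights=[1, -1], n=5, ms_error=10)
print(F)  # 25.0
```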
T-test or F-test for Planned Comparisons
- Because you are always comparing two sets of means, there is only 1 df for Treatment.
- Can be tested using an F-test or a t-test.
- SPSS reports this as a t-test.
- With 1 df for Treatment, F = t^2 or t = \sqrt{F}.
Assumptions of Planned Comparisons
- Subject to the same assumptions as the overall ANOVA, particularly homogeneity of variance.
- SPSS provides output for homogeneity assumed and homogeneity not assumed.
- If homogeneity is not assumed, SPSS adjusts the degrees of freedom of the test to control for any inflation of the Type I error rate.
Orthogonal Contrasts
- Each contrast tests something completely different from the other contrasts.
- Principle: Once you have compared one group (e.g., A) with another (e.g., B), you don’t compare them again.
- Example: Groups 1, 2, 3, 4
- Contrast 1 = 1, 2 vs 3, 4
- Contrast 2 = 1 vs 2
- Contrast 3 = 3 vs 4
Checking for Orthogonality
- Can be checked by drawing a tree diagram.
Cool Things About Orthogonal Contrasts
- A set of k-1 orthogonal contrasts (where k is the number of groups) accounts for all of the differences between groups.
- A set of k-1 planned contrasts can be performed without adjusting for type-I error rate (according to some authors).
Post Hoc Comparisons
- Used when there is a belief that the IV will impact performance but no specific hypothesis about which conditions will differ.
- Planned comparisons would not be appropriate in this case.
- Perform the overall F analysis first.
- If overall F is non-significant, stop.
- If overall F is significant, perform post-hoc tests to determine where the differences are.
Post Hoc Tests
- Seek to compare all possible combinations of means, leading to many pair-wise comparisons.
- e.g., With 4 groups, 6 comparisons: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4.
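The number of pairwise comparisons for k groups is k(k-1)/2; a quick check with Python's itertools:

```python
from itertools import combinations

groups = ["8h", "12h", "20h", "28h"]
pairs = list(combinations(groups, 2))
print(len(pairs))  # 6 pairwise comparisons for 4 groups
```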
Type I Error Rates
- When we find a significant difference, there is an 𝛼 chance that we have made a Type I error.
- The more tests we conduct, the greater the Type I error rate.
- The error rate per experiment (PE) is the total number of Type I errors we are likely to make in conducting all the tests required in our experiment.
- PE \le \alpha \times \text{number of tests}
Bonferroni Adjusted α Level
- Divide 𝛼 by the number of tests to be conducted (e.g., 0.05/2 = 0.025 if 2 tests are to be conducted).
- Assess each follow-up test using this new 𝛼 level (i.e., 0.025).
- Maintains PE error at 0.05, but reduces the power of your comparisons.
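A minimal sketch of the per-experiment bound and the Bonferroni adjustment, assuming 6 tests:

```python
alpha = 0.05
n_tests = 6                   # e.g., all pairwise tests for 4 groups
pe_bound = alpha * n_tests    # upper bound on per-experiment (PE) error: 0.30
alpha_adj = alpha / n_tests   # Bonferroni-adjusted per-test alpha
print(round(alpha_adj, 4))    # 0.0083
```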
Other (Less Conservative Corrections)
- Several statistical tests systematically compare all means while controlling for Type 1 error.
- LSD - least significant difference (actually no adjustment).
- Tukey’s HSD - Honestly Significant Difference: popular as the best balance between control of the experimentwise (EW) error rate and power (i.e., Type I vs. Type II error).
- Newman-Keuls: gives more power but less stringent control of the EW error rate.
- Scheffé Test: most stringent control of the EW error rate, as it controls for all possible simple and complex contrasts.
- Tukey’s test is very common and recommended.
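As a sketch of this workflow outside SPSS (invented data; SciPy's `f_oneway` and `tukey_hsd`, the latter available in SciPy 1.8+):

```python
from scipy import stats

# Invented accuracy scores for four sleep-deprivation groups (not the lecture's data)
g8  = [88, 90, 85, 91]
g12 = [80, 84, 79, 83]
g20 = [70, 72, 68, 74]
g28 = [60, 58, 63, 61]

F, p = stats.f_oneway(g8, g12, g20, g28)    # omnibus ANOVA first
if p < 0.05:                                # only follow up a significant F
    res = stats.tukey_hsd(g8, g12, g20, g28)
    print(res.pvalue.shape)                 # 4x4 matrix of pairwise p-values
```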
SPSS & Post Hoc Tests
- Select ONE-WAY ANOVA from the COMPARE MEANS option in the ANALYZE MENU.
- Specify the IV and DV in the usual way.
- Select the POST HOC button.
- Select the desired post hoc test by clicking on it.
- Press the CONTINUE button when all required tests are selected.
- Press the OK button to run the analysis.
Interpreting SPSS Output
- Mean differences marked with an asterisk are significant at α = 0.05.
- Type I error rates are already accounted for by the post hoc procedure, so each comparison can be assessed at 0.05.
Summary: Choosing the Right Approach
- If your hypothesis predicts specific differences between means:
- Assess assumptions
- Perform ANOVA
- Consider what comparisons will test your specific hypotheses
- Perform planned comparisons needed to test these predictions
- If your hypothesis does not predict specific differences between means:
- Assess assumptions
- Perform ANOVA
- If ANOVA is significant, then perform post-hoc tests
- If ANOVA is not significant, then don’t do post-hoc tests
Effect Size
- A significant F simply tells us that there is a difference between means (i.e., that the IV has had some effect on the DV).
- It does not tell us how big this difference is or how important this effect is.
- An F significant at 0.01 does not necessarily imply a bigger or more important effect than an F significant at 0.05.
- Significance of F is dependent on the sample size and the number of conditions which determines the F comparison distribution.
- We need a statistic which summarizes the strength of the treatment effect:
- Eta squared (\eta^2)
- Indicates the proportion of the total variability in the data accounted for by the effect of the IV.
Eta Squared (\eta^2) for ANOVA
- This result says that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
- While it is the measure of effect size given by SPSS, there are some limitations of \eta^2 for use in ANOVA to be aware of:
- It is a descriptive statistic, not an inferential statistic, so it is not the best indicator of the effect size in the population.
- It tends to be an overestimate of the effect size in the population.
\eta^2 = \frac{SS_{between}}{SS_{total}} = \frac{3314.25}{5119.75} = .65
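Worked directly from the numbers in the formula above:

```python
ss_between = 3314.25
ss_total = 5119.75
eta_sq = ss_between / ss_total  # proportion of total variability due to the IV
print(round(eta_sq, 2))         # 0.65
```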
Criteria for Assessing \eta^2
- \eta^2 may range from 0 to 1
- Cohen (1977) proposed the following scale for effect size:
- 0.01 = small effect
- 0.06 = medium effect
- 0.14 = large effect
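Cohen's cut-offs can be expressed as a small classifier (`cohen_label` is a hypothetical helper for illustration):

```python
def cohen_label(eta_sq):
    # Thresholds from Cohen (1977): 0.01 small, 0.06 medium, 0.14 large
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

print(cohen_label(0.65))  # large
```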
Interpreting Effect Size
- The effect sizes typically observed in psychology may vary from area to area.
- The levels of the IV used are important in determining the observed effect size.
- A theoretically important IV may still only account for a small proportion of the variability in the data.
- A theoretically unimportant IV may account for a large proportion of variability in the data.
Reporting Effect Size
- As \eta^2 is the effect size given in SPSS, it is the most commonly reported measure.
- But as noted it is only a descriptive statistic and tends to overestimate the effect size.
- You can report Eta squared with your ANOVA e.g., F(3, 12) = 7.34, p = .005, \eta^2 = .65.
- For the full picture (effect size, sample size, error, and alpha) we should also report the MS_{error} somewhere in the results. In APA style this is called MSE. F(3, 12) = 7.34, p = .005, \eta^2 = .65, MSE = 150.46.
Examples of Reporting Results
- A one-way independent-samples ANOVA was conducted on the target identification accuracy (number of correct responses) as a function of the amount of sleep deprivation the participant experienced with 4 levels (8, 12, 20, and 28 hours). The analysis showed a significant effect of sleep deprivation, F(3, 12) = 7.34, p =.005, \eta^2 = .65, MSE = 150.46. This indicates that approximately 65% of the variation in accuracy scores can be attributed to changes in sleep deprivation; this is a large effect size.
Examples of Reporting Planned Contrasts
- To further explore mean differences, a series of a priori contrasts was formed. Contrast 1 found that accuracy scores in the 12hr condition were not significantly different from the 8hr condition, t(12) = 1.30, p = .219. Contrast 2, however, found that accuracy scores in the 28 and 20hr conditions were significantly greater than in the 12 and 8hr conditions, t(12) = 4.48, p < .001.
- The number of planned contrasts should match the number of a priori comparisons hypothesised.
Examples of Reporting Post Hoc Tests
- To further explore mean differences, a series of Tukey adjusted post hoc pairwise comparisons were performed. As shown in Table 1, post hoc comparisons revealed significant mean differences between the 28hr and 8hr condition (p
- You would only do one or the other – either planned, or post-hoc tests. Not both.