When we find a significant difference, there is an 𝛼 chance that we have made a Type I error.
The more tests we conduct, the greater the overall Type I error rate becomes.
The error rate per experiment (PE) is the total number of Type I errors we are likely to make in conducting all the tests required in our experiment.
PE ≤ α × number of tests
For example, conducting 5 tests at α = 0.05 gives PE ≤ 0.25.
Bonferroni Adjusted α Level
Divide 𝛼 by the number of tests to be conducted (e.g., 0.05/2 = 0.025 if 2 tests are to be conducted).
Assess each follow-up test using this new 𝛼 level (i.e., 0.025).
Maintains the PE error rate at 0.05, but reduces the power of your comparisons.
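As a quick illustration, the PE bound and the Bonferroni-adjusted α can be computed directly (a minimal sketch in Python; the number of tests is hypothetical):

```python
# Sketch: per-experiment (PE) error bound and Bonferroni adjustment.
# The number of tests here is hypothetical.
alpha = 0.05
n_tests = 4

pe_upper_bound = alpha * n_tests    # PE <= alpha x number of tests
bonferroni_alpha = alpha / n_tests  # assess each follow-up test at this level

print(f"PE upper bound: {pe_upper_bound:.3f}")                # 0.200
print(f"Bonferroni-adjusted alpha: {bonferroni_alpha:.4f}")   # 0.0125
```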
Other (Less Conservative) Corrections
Several statistical tests systematically compare all means while controlling for Type I error.
LSD - Least Significant Difference (actually no adjustment).
Tukey's HSD - Honestly Significant Difference; popular as the best balance between control of the experimentwise (EW) error rate and power (i.e., Type I vs. Type II error).
Newman-Keuls: gives more power but less stringent control of the EW error rate.
Scheffé Test: most stringent control of the EW error rate, as it controls for all possible simple and complex contrasts.
Tukey’s test is very common and recommended.
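Outside SPSS, Tukey's HSD is also available in SciPy; a minimal sketch with hypothetical accuracy scores for three groups:

```python
# Sketch: Tukey's HSD pairwise comparisons (hypothetical data).
from scipy.stats import tukey_hsd

# Hypothetical scores for three conditions.
group_a = [24, 28, 21, 26, 30]
group_b = [18, 22, 19, 24, 20]
group_c = [12, 15, 11, 14, 16]

res = tukey_hsd(group_a, group_b, group_c)
print(res)         # table of pairwise mean differences with adjusted CIs
print(res.pvalue)  # Tukey-adjusted p-value for each pair
```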
SPSS & Post Hoc Tests
Select ONE-WAY ANOVA from the COMPARE MEANS option in the ANALYZE MENU.
Specify the IV and DV in the usual way.
Select the POST HOC button.
Select the desired post hoc test by clicking on it.
Press the CONTINUE button when all required tests are selected.
Press the OK button to run the analysis.
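For readers working outside SPSS, a rough Python equivalent of the omnibus test in this workflow (hypothetical data):

```python
# Sketch: one-way independent-samples ANOVA outside SPSS (hypothetical data).
from scipy.stats import f_oneway

group_a = [24, 28, 21, 26, 30]
group_b = [18, 22, 19, 24, 20]
group_c = [12, 15, 11, 14, 16]

F, p = f_oneway(group_a, group_b, group_c)
print(f"F = {F:.2f}, p = {p:.4f}")
```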
Interpreting SPSS Output
Asterisks next to mean differences indicate significance at α = 0.05.
Type I error rates are already accounted for in these adjusted p-values, so each comparison can be assessed at 0.05.
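To mirror SPSS's asterisks programmatically, one could flag Tukey-adjusted p-values below .05 (a sketch; the data and condition labels are hypothetical):

```python
# Sketch: flag significant Tukey-adjusted differences, like SPSS's asterisks.
from itertools import combinations
from scipy.stats import tukey_hsd

groups = {"8hr": [24, 28, 21, 26, 30],
          "12hr": [18, 22, 19, 24, 20],
          "28hr": [12, 15, 11, 14, 16]}

res = tukey_hsd(*groups.values())
names = list(groups)
for i, j in combinations(range(len(names)), 2):
    star = "*" if res.pvalue[i, j] < 0.05 else ""
    print(f"{names[i]} vs {names[j]}: p = {res.pvalue[i, j]:.4f}{star}")
```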
Summary: Choosing the Right Approach
If your hypothesis predicts specific differences between means:
Assess assumptions
Perform ANOVA
Consider what comparisons will test your specific hypotheses
Perform planned comparisons needed to test these predictions
If your hypothesis does not predict specific differences between means:
Assess assumptions
Perform ANOVA
If the ANOVA is significant, then perform post hoc tests (see the sketch below)
If the ANOVA is not significant, then don't do post hoc tests
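This decision rule is straightforward to express in code; a minimal sketch with hypothetical data, gating post hoc tests on a significant omnibus F:

```python
# Sketch: only follow up a significant omnibus F with post hoc tests.
from scipy.stats import f_oneway, tukey_hsd

group_a = [24, 28, 21, 26, 30]
group_b = [18, 22, 19, 24, 20]
group_c = [12, 15, 11, 14, 16]

F, p = f_oneway(group_a, group_b, group_c)
if p < 0.05:
    print(tukey_hsd(group_a, group_b, group_c))  # follow up pairwise
else:
    print("Omnibus F not significant; no post hoc tests.")
```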
Effect Size
A significant F simply tells us that there is a difference between means (i.e., that the IV has had some effect on the DV).
It does not tell us how big this difference is or how important this effect is.
An F significant at 0.01 does not necessarily imply a bigger or more important effect than an F significant at 0.05.
Significance of F depends on the sample size and the number of conditions, which together determine the degrees of freedom of the F comparison distribution.
We need a statistic which summarizes the strength of the treatment effect:
Eta squared (η²)
Indicates the proportion of the total variability in the data accounted for by the effect of the IV.
Eta Squared (η²) for ANOVA
η² = SS_between / SS_total = 3314.25 / 5119.75 = .65
This result says that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
While η² is the measure of effect size given by SPSS, there are some limitations with its use in ANOVA to be aware of:
It is a descriptive statistic, not an inferential statistic, so it is not the best indicator of the effect size in the population.
It tends to be an overestimate of the effect size in the population.
Criteria for Assessing η²
η² may range from 0 to 1.
Cohen (1977) proposed the following scale for effect size:
0.01 = small effect
0.06 = medium effect
0.14 = large effect
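A tiny helper makes the scale concrete (a sketch; the function name and the label for values under .01 are ours):

```python
# Sketch: label an eta squared value using Cohen's (1977) benchmarks.
def eta_squared_label(eta_sq: float) -> str:
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "below small"  # under Cohen's lowest listed benchmark

print(eta_squared_label(0.65))  # "large"
```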
Interpreting Effect Size
The effect sizes typically observed in psychology may vary from area to area.
The levels of the IV used are important in determining the observed effect size.
A theoretically important IV may still only account for a small proportion of the variability in the data.
A theoretically unimportant IV may account for a large proportion of variability in the data.
Reporting Effect Size
As η² is the effect size given by SPSS, it is the most commonly reported measure.
But as noted, it is only a descriptive statistic and tends to overestimate the effect size.
You can report eta squared with your ANOVA, e.g., F(3, 12) = 7.34, p = .005, η² = .65.
For the full picture (effect size, sample size, error, and alpha) we should also report the MS_error somewhere in the results. In APA style this is called MSE: F(3, 12) = 7.34, p = .005, η² = .65, MSE = 150.46.
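All of these reportable quantities can be read off a standard ANOVA table; a sketch using statsmodels on hypothetical data (the column names score and group are ours):

```python
# Sketch: pull F, p, eta squared, and MSE from an ANOVA table (hypothetical data).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "score": [24, 28, 21, 26, 18, 22, 19, 24, 12, 15, 11, 14],
    "group": ["8hr"] * 4 + ["12hr"] * 4 + ["28hr"] * 4,
})

model = ols("score ~ C(group)", data=df).fit()
table = sm.stats.anova_lm(model, typ=1)  # columns: df, sum_sq, mean_sq, F, PR(>F)

ss_between = table.loc["C(group)", "sum_sq"]
ss_total = table["sum_sq"].sum()
eta_sq = ss_between / ss_total          # eta squared = SS_between / SS_total
mse = table.loc["Residual", "mean_sq"]  # MS_error, reported as MSE in APA style

print(table)
print(f"eta squared = {eta_sq:.2f}, MSE = {mse:.2f}")
```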
Examples of Reporting Results
A one-way independent-samples ANOVA was conducted on target identification accuracy (number of correct responses) as a function of the amount of sleep deprivation the participant experienced, with 4 levels (8, 12, 20, and 28 hours). The analysis showed a significant effect of sleep deprivation, F(3, 12) = 7.34, p = .005, η² = .65, MSE = 150.46. This indicates that approximately 65% of the variation in accuracy scores can be attributed to changes in sleep deprivation; this is a large effect size.
Examples of Reporting Planned Contrasts
To further explore mean differences, a series of a priori contrasts was formed. Contrast 1 found that accuracy scores in the 12hr condition were not significantly different from the 8hr condition, t(12) = 1.30, p = .219. Contrast 2, however, found that accuracy scores in the 28 and 20hr conditions were significantly greater than those in the 12 and 8hr conditions, t(12) = 4.48, p < .001.
The number of planned contrasts should match the number of a priori comparisons hypothesised.
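The contrast t statistic itself comes from the group means, the contrast weights, and MS_error; a minimal sketch with hypothetical means and equal group sizes (the weights contrast the 28 and 20hr conditions against the 12 and 8hr conditions):

```python
# Sketch: t statistic for a planned contrast (hypothetical group means;
# MS_error and df_error taken from the worked ANOVA example above).
from math import sqrt
from scipy.stats import t as t_dist

means = [12.0, 15.0, 22.0, 28.0]  # hypothetical 8hr, 12hr, 20hr, 28hr means
weights = [-1, -1, 1, 1]          # 28 & 20hr vs. 12 & 8hr
n_per_group = 4                   # equal group sizes assumed
ms_error = 150.46                 # MS_error from the ANOVA
df_error = 12

psi = sum(w * m for w, m in zip(weights, means))  # contrast value
se = sqrt(ms_error * sum(w ** 2 for w in weights) / n_per_group)
t_stat = psi / se
p = 2 * t_dist.sf(abs(t_stat), df_error)          # two-tailed p

print(f"t({df_error}) = {t_stat:.2f}, p = {p:.3f}")
```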
Examples of Reporting Post Hoc Tests
To further explore mean differences, a series of Tukey-adjusted post hoc pairwise comparisons was performed. As shown in Table 1, post hoc comparisons revealed significant mean differences between the 28hr and 8hr conditions (p < .05), as well as the 28hr and 12hr conditions (p < .05); however, all other pairwise comparisons were non-significant (p > .05).
You would only do one or the other: either planned comparisons or post hoc tests, not both.