Planned Comparisons and Post Hoc Tests & Effect Size
Planned Comparisons and Post Hoc Tests
Learning Goals
- Describe major approaches to following up mean differences in an ANOVA.
- Set up and analyze planned comparisons between group means.
- Set up and analyze post-hoc comparisons between group means.
- Report ANOVA comparisons and effect size.
Revisiting Sleep Deprivation Example
- Target identification accuracy as a function of sleep deprivation:
- Table includes sleep deprivation in hours (28, 20, 12, 8), with multiple data points for each.
- Shows sum (Σ), mean (M), and standard deviation (S) for each sleep deprivation level.
- ANOVA Summary:
- Sleep deprivation had a significant effect on the number of errors made, F(3,12) = 7.34, p = 0.005.
- This indicates a significant effect, but doesn't specify which groups differ.
The Need for Further Analysis
- The F ratio only indicates that there is a difference somewhere between the means.
- Further analysis is needed to determine where the specific differences lie.
Approaches to Comparisons
A Priori (Planned) Comparisons
- Used when there is a strong theoretical interest in certain groups and specific hypotheses based on evidence.
- Focuses on comparing only groups of interest.
- Overall ANOVA is done first, progressing to planned comparisons.
Post Hoc Comparisons
- Used when you cannot predict exactly which means will differ.
- Overall ANOVA is done first to check if the independent variable (IV) has an effect.
- Compares all groups to each other to explore differences.
- More exploratory and less refined than planned comparisons.
A Priori/Planned Comparisons
- Can be simple or complex:
- Simple: Comparing one group to one other group.
- Complex: Comparing a set of groups to another set of groups.
- In SPSS, complex comparisons are created by assigning weights to different groups.
Complex Comparison Example
- Comparing groups with 8 or 12 hours of sleep deprivation against the group with 20 hours.
- Create two sets of weights:
- One for the first set of means.
- One for the second set of means.
- Assign a weight of zero to any remaining groups.
Weight Assignment
- Set 1 (e.g., 8h, 12h) gets positive weights.
- Set 2 (e.g., 20h) gets negative weights.
- The weights must sum to 0.
More Complex Example with 5 Groups
- Comparing the first 3 groups with the last 2 groups.
- A simple rule that always works: The weight for each group is equal to the number of groups in the other set.
- If comparing Group 1, 2, 3 vs. Group 4, 5:
- Groups 1, 2, 3 get a weight of 2 (number of groups in Set 2).
- Groups 4, 5 get a weight of -3 (negative of the number of groups in Set 1).
- Total weights must sum to zero (e.g., 2 + 2 + 2 - 3 - 3 = 0).
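The weighting rule above can be sketched as a small Python helper (`contrast_weights` is our name for illustration, not SPSS syntax):

```python
def contrast_weights(k1, k2):
    # Each group in set 1 gets +k2 (the size of the other set);
    # each group in set 2 gets -k1, so the weights always sum to zero.
    return [k2] * k1 + [-k1] * k2

weights = contrast_weights(3, 2)  # groups 1-3 vs groups 4-5
print(weights, sum(weights))      # [2, 2, 2, -3, -3] 0
```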
Planned Comparisons Calculation
- Apply the weight to each mean to form the contrast value:
- L = \sum a_j \bar{X}_j
- Calculate a sum of squares for the contrast:
- SS_{contrast} = \frac{nL^2}{\sum a_j^2}
- MS_{contrast} = SS_{contrast} (because df = 1).
- Perform the F-test using the MS_{error} from the omnibus ANOVA:
- F = \frac{MS_{contrast}}{MS_{error}}
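Under the equal-n assumption, these formulas can be sketched in Python (`contrast_F` and the numbers below are illustrative, not the lecture's data set):

```python
def contrast_F(means, weights, n, ms_error):
    """F-test for a planned contrast.
    means    : group means
    weights  : contrast weights a_j (must sum to 0)
    n        : per-group sample size (equal-n design assumed)
    ms_error : MS_error from the omnibus ANOVA
    """
    L = sum(a * m for a, m in zip(weights, means))           # L = sum a_j * mean_j
    ss_contrast = n * L ** 2 / sum(a ** 2 for a in weights)  # SS_contrast
    ms_contrast = ss_contrast                                # df = 1, so MS = SS
    return ms_contrast / ms_error

# Illustrative numbers only
F = contrast_F(means=[10, 20], weights=[1, -1], n=5, ms_error=10)
print(F)  # 25.0
```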
T-test or F-test for Planned Comparisons
- Because you are always comparing two sets of means, there is only 1 df for Treatment.
- Can be tested using an F-test or a t-test.
- SPSS reports this as a t-test.
- With 1 df for Treatment, F = t^2 or t = \sqrt{F}.
Assumptions of Planned Comparisons
- Subject to the same assumptions as the overall ANOVA, particularly homogeneity of variance.
- SPSS provides output for homogeneity assumed and homogeneity not assumed.
- If homogeneity is not assumed, SPSS adjusts the degrees of freedom of the test to control for any inflation of the Type I error rate.
Orthogonal Contrasts
- Each contrast tests something completely different from the other contrasts.
- Principle: Once you have compared one group (e.g., A) with another (e.g., B), you don’t compare them again.
- Example: Groups 1, 2, 3, 4
- Contrast 1 = 1, 2 vs 3, 4
- Contrast 2 = 1 vs 2
- Contrast 3 = 3 vs 4
Checking for Orthogonality
- Can be checked by drawing a tree diagram.
Cool Things About Orthogonal Contrasts
- A set of k-1 orthogonal contrasts (where k is the number of groups) accounts for all of the differences between groups.
- A set of k-1 planned contrasts can be performed without adjusting for type-I error rate (according to some authors).
Post Hoc Comparisons
- Used when there is a belief that the IV will impact performance but no specific hypothesis about which conditions will differ.
- Planned comparisons would not be appropriate in this case.
- Perform the overall F analysis first.
- If overall F is non-significant, stop.
- If overall F is significant, perform post-hoc tests to determine where the differences are.
Post Hoc Tests
- Seek to compare all possible combinations of means, leading to many pair-wise comparisons.
- e.g., With 4 groups, 6 comparisons: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4.
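The number of pairwise comparisons for k groups is k(k-1)/2; a quick check with Python's itertools:

```python
from itertools import combinations

groups = ["8h", "12h", "20h", "28h"]
pairs = list(combinations(groups, 2))
print(len(pairs))  # 6 pairwise comparisons for 4 groups
```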
Type I Error Rates
- When we find a significant difference, there is an 𝛼 chance that we have made a Type I error.
- The more tests we conduct, the greater the Type I error rate.
- The error rate per experiment (PE) is the total number of Type I errors we are likely to make in conducting all the tests required in our experiment.
- PE \le \alpha \times \text{number of tests}
Bonferroni Adjusted α Level
- Divide 𝛼 by the number of tests to be conducted (e.g., 0.05/2 = 0.025 if 2 tests are to be conducted).
- Assess each follow-up test using this new 𝛼 level (i.e., 0.025).
- Maintains PE error at 0.05, but reduces the power of your comparisons.
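A minimal sketch of the per-experiment bound and the Bonferroni adjustment, assuming 6 tests:

```python
alpha = 0.05
n_tests = 6                   # e.g., all pairwise tests for 4 groups
pe_bound = alpha * n_tests    # upper bound on per-experiment (PE) error: 0.30
alpha_adj = alpha / n_tests   # Bonferroni-adjusted per-test alpha
print(round(alpha_adj, 4))    # 0.0083
```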
Other (Less Conservative Corrections)
- Several statistical tests systematically compare all means while controlling for Type 1 error.
- LSD - least significant difference (actually no adjustment).
- Tukey’s HSD - Honestly Significant Difference: popular as the best balance between control of the experimentwise (EW) error rate and power (i.e., Type I vs. Type II error).
- Newman-Keuls: gives more power but less stringent control of the EW error rate.
- Scheffé Test: most stringent control of the EW error rate, as it controls for all possible simple and complex contrasts.
- Tukey’s test is very common and recommended.
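As a sketch of this workflow outside SPSS (invented data; SciPy's `f_oneway` and `tukey_hsd`, the latter available in SciPy 1.8+):

```python
from scipy import stats

# Invented accuracy scores for four sleep-deprivation groups (not the lecture's data)
g8  = [88, 90, 85, 91]
g12 = [80, 84, 79, 83]
g20 = [70, 72, 68, 74]
g28 = [60, 58, 63, 61]

F, p = stats.f_oneway(g8, g12, g20, g28)    # omnibus ANOVA first
if p < 0.05:                                # only follow up a significant F
    res = stats.tukey_hsd(g8, g12, g20, g28)
    print(res.pvalue.shape)                 # 4x4 matrix of pairwise p-values
```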
SPSS & Post Hoc Tests
- Select ONE-WAY ANOVA from the COMPARE MEANS option in the ANALYZE MENU.
- Specify the IV and DV in the usual way.
- Select the POST HOC button.
- Select the desired post hoc test by clicking on it.
- Press the CONTINUE button when all required tests are selected.
- Press the OK button to run the analysis.
Interpreting SPSS Output
- Mean differences marked with an asterisk are significant at α = 0.05.
- Type I error rates are already accounted for by the post hoc procedure, so each comparison can be assessed at 0.05.
Summary: Choosing the Right Approach
- If your hypothesis predicts specific differences between means:
- Assess assumptions
- Perform ANOVA
- Consider what comparisons will test your specific hypotheses
- Perform planned comparisons needed to test these predictions
- If your hypothesis does not predict specific differences between means:
- Assess assumptions
- Perform ANOVA
- If ANOVA is significant, then perform post-hoc tests
- If ANOVA is not significant, then don’t do post-hoc tests
Effect Size
- A significant F simply tells us that there is a difference between means (i.e., that the IV has had some effect on the DV).
- It does not tell us how big this difference is or how important this effect is.
- An F significant at 0.01 does not necessarily imply a bigger or more important effect than an F significant at 0.05.
- Significance of F is dependent on the sample size and the number of conditions which determines the F comparison distribution.
- We need a statistic which summarizes the strength of the treatment effect:
- Eta squared (\eta^2)
- Indicates the proportion of the total variability in the data accounted for by the effect of the IV.
Eta Squared (\eta^2) for ANOVA
- This result says that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
- While it is the measure of effect size given by SPSS, there are some limitations of \eta^2 for use in ANOVA to be aware of:
- It is a descriptive statistic, not an inferential statistic, so it is not the best indicator of the effect size in the population.
- It tends to be an overestimate of the effect size in the population.
\eta^2 = \frac{SS_{between}}{SS_{total}} = \frac{3314.25}{5119.75} = .65
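Worked directly from the numbers in the formula above:

```python
ss_between = 3314.25
ss_total = 5119.75
eta_sq = ss_between / ss_total  # proportion of total variability due to the IV
print(round(eta_sq, 2))         # 0.65
```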
Criteria for Assessing \eta^2
- \eta^2 may range from 0 to 1
- Cohen (1977) proposed the following scale for effect size:
- 0.01 = small effect
- 0.06 = medium effect
- 0.14 = large effect
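Cohen's cut-offs can be expressed as a small classifier (`cohen_label` is a hypothetical helper for illustration):

```python
def cohen_label(eta_sq):
    # Thresholds from Cohen (1977): 0.01 small, 0.06 medium, 0.14 large
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

print(cohen_label(0.65))  # large
```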
Interpreting Effect Size
- The effect sizes typically observed in psychology may vary from area to area.
- The levels of the IV used are important in determining the observed effect size.
- A theoretically important IV may still only account for a small proportion of the variability in the data.
- A theoretically unimportant IV may account for a large proportion of variability in the data.
Reporting Effect Size
- As \eta^2 is the effect size given in SPSS, it is the most commonly reported measure.
- But as noted it is only a descriptive statistic and tends to overestimate the effect size.
- You can report Eta squared with your ANOVA e.g., F(3, 12) = 7.34, p = .005, \eta^2 = .65.
- For the full picture (effect size, sample size, error, and alpha) we should also report the MS_{error} somewhere in the results. In APA style this is called MSE. F(3, 12) = 7.34, p = .005, \eta^2 = .65, MSE = 150.46.
Examples of Reporting Results
- A one-way independent-samples ANOVA was conducted on the target identification accuracy (number of correct responses) as a function of the amount of sleep deprivation the participant experienced with 4 levels (8, 12, 20, and 28 hours). The analysis showed a significant effect of sleep deprivation, F(3, 12) = 7.34, p =.005, \eta^2 = .65, MSE = 150.46. This indicates that approximately 65% of the variation in accuracy scores can be attributed to changes in sleep deprivation; this is a large effect size.
Examples of Reporting Planned Contrasts
- To further explore mean differences, a series of a priori contrasts was formed. Contrast 1 found that accuracy scores in the 12hr condition were not significantly different from the 8hr condition, t(12) = 1.30, p = .219. Contrast 2, however, found that accuracy scores in the 28 and 20hr conditions were significantly greater than in the 12 and 8hr conditions, t(12) = 4.48, p < .001.
- The number of planned contrasts should match the number of a priori comparisons hypothesised.
Examples of Reporting Post Hoc Tests
- To further explore mean differences, a series of Tukey adjusted post hoc pairwise comparisons were performed. As shown in Table 1, post hoc comparisons revealed significant mean differences between the 28hr and 8hr condition (p
- You would only do one or the other – either planned, or post-hoc tests. Not both.