Planned Comparisons and Post Hoc Tests & Effect Size

Planned Comparisons and Post Hoc Tests

Learning Goals

  • Describe major approaches to following up mean differences in an ANOVA.
  • Set up and analyze planned comparisons between group means.
  • Set up and analyze post-hoc comparisons between group means.
  • Report ANOVA comparisons and effect size.

Revisiting Sleep Deprivation Example

  • Target identification accuracy as a function of sleep deprivation:
    • Table includes sleep deprivation in hours (28, 20, 12, 8), with multiple data points for each.
    • Shows sum (Σ), mean (M), and standard deviation (S) for each sleep deprivation level.
  • ANOVA Summary:
    • Sleep deprivation had a significant effect on the number of errors made, F(3,12) = 7.34, p = 0.005.
    • This indicates a significant effect, but doesn't specify which groups differ.
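
As a sketch, an omnibus one-way ANOVA like this can be run outside SPSS with `scipy.stats.f_oneway`. The scores below are hypothetical stand-ins, not the data from the example, so the resulting F and p will differ from F(3, 12) = 7.34, p = .005:

```python
from scipy import stats

# Hypothetical accuracy scores for four sleep-deprivation levels (8, 12, 20, 28 h);
# the actual data from the example are not reproduced here.
g8  = [52, 48, 55, 50]
g12 = [46, 49, 44, 47]
g20 = [38, 35, 40, 33]
g28 = [30, 28, 25, 27]

f_stat, p_val = stats.f_oneway(g8, g12, g20, g28)
print(f"F(3, 12) = {f_stat:.2f}, p = {p_val:.4f}")
```

A significant F here, as in the example, says only that the four means are not all equal; it does not say which pairs differ.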

The Need for Further Analysis

  • The F ratio only indicates that there is a difference somewhere between the means.
  • Further analysis is needed to determine where the specific differences lie.

Approaches to Comparisons

A Priori (Planned) Comparisons

  • Used when there is a strong theoretical interest in certain groups and specific hypotheses based on evidence.
  • Focuses on comparing only groups of interest.
  • Overall ANOVA is done first, progressing to planned comparisons.

Post Hoc Comparisons

  • Used when you cannot predict exactly which means will differ.
  • Overall ANOVA is done first to check if the independent variable (IV) has an effect.
  • Compares all groups to each other to explore differences.
  • More exploratory and less refined than planned comparisons.

A Priori/Planned Comparisons

  • Can be simple or complex:
    • Simple: Comparing one group to one other group.
    • Complex: Comparing a set of groups to another set of groups.
  • In SPSS, complex comparisons are created by assigning weights to different groups.

Complex Comparison Example

  • Comparing groups with 8 or 12 hours of sleep deprivation against the group with 20 hours.
  • Create two sets of weights:
    • One for the first set of means.
    • One for the second set of means.
    • Assign a weight of zero to any remaining groups.

Weight Assignment

  • Set 1 (e.g., 8h, 12h) gets positive weights.
  • Set 2 (e.g., 20h) gets negative weights.
  • The weights must sum to 0.

More Complex Example with 5 Groups

  • Comparing the first 3 groups with the last 2 groups.
  • A simple rule that always works: The weight for each group is equal to the number of groups in the other set.
    • If comparing Group 1, 2, 3 vs. Group 4, 5:
      • Groups 1, 2, 3 get a weight of 2 (number of groups in Set 2).
      • Groups 4, 5 get a weight of -3 (negative of the number of groups in Set 1).
  • Total weights must sum to zero (e.g., 2 + 2 + 2 - 3 - 3 = 0).
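
The weighting rule above can be sketched as a small helper (the function name is made up for illustration):

```python
def contrast_weights(k_set1, k_set2):
    """Weights for comparing the first k_set1 groups against the next k_set2:
    each group's weight equals the number of groups in the *other* set,
    with one set positive and the other negative."""
    return [k_set2] * k_set1 + [-k_set1] * k_set2

w = contrast_weights(3, 2)   # groups 1-3 vs groups 4-5
print(w)                     # [2, 2, 2, -3, -3]
assert sum(w) == 0           # weights must always sum to zero
```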

Planned Comparisons Calculation

  • Apply the weight to each mean.
  • Calculate a sum of squares for the contrast.
  • MS_{contrast} = SS_{contrast} (because df = 1).
  • Perform the F-test using the MS_{error} from the omnibus ANOVA.

L = \sum_j a_j \bar{X}_j \qquad SS_{contrast} = \frac{nL^2}{\sum_j a_j^2} \qquad F = \frac{MS_{contrast}}{MS_{error}}
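
A numerical sketch of the contrast calculation, using hypothetical group means and weights; MS_{error} = 150.46 is the value from the example ANOVA:

```python
import numpy as np

# Hypothetical means for a contrast of (8h, 12h) vs 20h; n is the
# (equal) sample size per group.
means    = np.array([50.0, 45.0, 35.0])   # 8h, 12h, 20h
weights  = np.array([1, 1, -2])           # must sum to zero
n        = 4
ms_error = 150.46                         # MS_error from the example ANOVA

L = np.sum(weights * means)                    # contrast value
ss_contrast = n * L**2 / np.sum(weights**2)    # SS for the contrast (df = 1)
F = ss_contrast / ms_error                     # MS_contrast = SS_contrast
print(f"L = {L:.2f}, SS_contrast = {ss_contrast:.2f}, F = {F:.2f}")
```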

T-test or F-test for Planned Comparisons

  • Because you are always comparing two sets of means, there is only 1 df for Treatment.
  • Can be tested using an F-test or a t-test.
  • SPSS reports this as a t-test.
  • With 1 df for Treatment, F = t^2 or t = \sqrt{F}.

Assumptions of Planned Comparisons

  • Subject to the same assumptions as the overall ANOVA, particularly homogeneity of variance.
  • SPSS provides output for homogeneity assumed and homogeneity not assumed.
  • If homogeneity is not assumed, SPSS adjusts the degrees of freedom of the test to control inflation of the Type I error rate.

Orthogonal Contrasts

  • Each contrast tests something completely different from the other contrasts.
  • Principle: Once you have compared one group (e.g., A) with another (e.g., B), you don’t compare them again.
  • Example: Groups 1, 2, 3, 4
    • Contrast 1 = 1, 2 vs 3, 4
    • Contrast 2 = 1 vs 2
    • Contrast 3 = 3 vs 4

Checking for Orthogonality

  • Can be checked by drawing a tree diagram, or arithmetically: two contrasts are orthogonal when the sum of the products of their weights is zero.
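
A minimal sketch of the arithmetic check: two contrasts are orthogonal when the sum of the products of their weights (the dot product) is zero. Weights follow the four-group example above:

```python
import numpy as np

# Contrast weights for 4 groups (contrasts 1, 2, 3 from the example)
c1 = np.array([ 1,  1, -1, -1])   # groups 1, 2 vs 3, 4
c2 = np.array([ 1, -1,  0,  0])   # group 1 vs 2
c3 = np.array([ 0,  0,  1, -1])   # group 3 vs 4

# Each pair is orthogonal: the sum of products of weights is zero
for a, b in [(c1, c2), (c1, c3), (c2, c3)]:
    print(a, b, "orthogonal:", np.dot(a, b) == 0)
```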

Cool Things About Orthogonal Contrasts

  • A set of k-1 orthogonal contrasts (where k is the number of groups) accounts for all of the differences between the groups.
  • A set of k-1 planned orthogonal contrasts can be performed without adjusting the Type I error rate (according to some authors).

Post Hoc Comparisons

  • Used when there is a belief that the IV will impact performance but no specific hypothesis about which conditions will differ.
  • Planned comparisons would not be appropriate in this case.
  • Perform the overall F analysis first.
    • If overall F is non-significant, stop.
    • If overall F is significant, perform post-hoc tests to determine where the differences are.

Post Hoc Tests

  • Seek to compare all possible combinations of means, leading to many pair-wise comparisons.
  • e.g., With 4 groups, 6 comparisons: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4.
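
The number of pairwise comparisons is the number of ways to choose 2 groups from k, i.e. C(k, 2); a quick sketch:

```python
from itertools import combinations
from math import comb

groups = [1, 2, 3, 4]
pairs = list(combinations(groups, 2))   # all pairwise comparisons
print(pairs)                            # 6 pairs for 4 groups
assert len(pairs) == comb(4, 2) == 6
```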

Type I Error Rates

  • When we find a significant difference, there is an 𝛼 chance that we have made a Type I error.
  • The more tests we conduct, the greater the Type I error rate.
  • The error rate per experiment (PE) is the total number of Type I errors we are likely to make in conducting all the tests required in our experiment.
  • PE ≤ 𝛼 × number of tests

Bonferroni Adjusted α Level

  • Divide 𝛼 by the number of tests to be conducted (e.g., 0.05/2 = 0.025 if 2 tests are to be conducted).
  • Assess each follow-up test using this new 𝛼 level (i.e., 0.025).
  • Maintains PE error at 0.05, but reduces the power of your comparisons.
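
The two rules above (the PE bound and the Bonferroni adjustment) can be expressed directly:

```python
alpha = 0.05
n_tests = 6                    # e.g., all pairwise comparisons of 4 groups

pe_upper = alpha * n_tests     # upper bound on the per-experiment error rate
alpha_bonf = alpha / n_tests   # Bonferroni-adjusted per-test alpha
print(f"PE <= {pe_upper:.2f}; assess each comparison at alpha = {alpha_bonf:.4f}")
```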

Other (Less Conservative Corrections)

  • Several statistical tests systematically compare all means while controlling the Type I error rate:
    • LSD - Least Significant Difference (actually applies no adjustment).
    • Tukey's HSD - Honestly Significant Difference; popular as the best balance between control of the experimentwise (EW) error rate and power (i.e., Type I vs. Type II error).
    • Newman-Keuls: gives more power but less stringent control of the EW error rate.
    • Scheffé test: most stringent control of the EW error rate, as it controls for all possible simple and complex contrasts.
  • Tukey’s test is very common and recommended.
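
As a sketch, Tukey's HSD can also be run outside SPSS; recent versions of SciPy provide `scipy.stats.tukey_hsd`. The scores below are hypothetical, not the original data:

```python
from scipy.stats import tukey_hsd

# Hypothetical scores for the four sleep-deprivation groups
g8  = [52, 48, 55, 50]
g12 = [46, 49, 44, 47]
g20 = [38, 35, 40, 33]
g28 = [30, 28, 25, 27]

res = tukey_hsd(g8, g12, g20, g28)
print(res.pvalue)   # 4x4 matrix of Tukey-adjusted p-values, one per pair
```

The adjusted p-values can be compared directly against 𝛼 = 0.05, just as in the SPSS output.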

SPSS & Post Hoc Tests

  • Select ONE-WAY ANOVA from the COMPARE MEANS option in the ANALYZE MENU.
  • Specify the IV and DV in the usual way.
  • Select the POST HOC button.
  • Select the desired post hoc test by clicking on it.
  • Press the CONTINUE button when all required tests are selected.
  • Press the OK button to run the analysis.

Interpreting SPSS Output

  • Mean differences marked with an asterisk are significant at 𝛼 = 0.05.
  • Type I error rates are already accounted for in this output, so significance can be assessed at 0.05.

Summary: Choosing the Right Approach

  • If your hypothesis predicts specific differences between means:
    • Assess assumptions
    • Perform ANOVA
    • Consider what comparisons will test your specific hypotheses
    • Perform planned comparisons needed to test these predictions
  • If your hypothesis does not predict specific differences between means:
    • Assess assumptions
    • Perform ANOVA
    • If ANOVA is significant, then perform post-hoc tests
    • If ANOVA is not significant, then don’t do post-hoc tests

Effect Size

  • A significant F simply tells us that there is a difference between means (i.e., that the IV has had some effect on the DV).
  • It does not tell us how big this difference is or how important this effect is.
  • An F significant at 0.01 does not necessarily imply a bigger or more important effect than an F significant at 0.05.
  • The significance of F depends on the sample size and the number of conditions, which determine the F comparison distribution.
  • We need a statistic which summarizes the strength of the treatment effect:
    • Eta squared (\eta^2)
    • Indicates the proportion of the total variability in the data accounted for by the effect of the IV.

Eta Squared (\eta^2) for ANOVA

  • \eta^2 = \frac{SS_{between}}{SS_{Total}} = \frac{3314.25}{5119.75} = .65
  • This result says that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
  • While \eta^2 is the measure of effect size given by SPSS, it has some limitations for use in ANOVA to be aware of:
    • It is a descriptive statistic, not an inferential statistic, so it is not the best indicator of the effect size in the population.
    • It tends to overestimate the effect size in the population.
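
The computation, using the SS values from the example:

```python
ss_between = 3314.25   # SS_between from the example ANOVA
ss_total   = 5119.75   # SS_Total from the example ANOVA

eta_sq = ss_between / ss_total   # proportion of variability due to the IV
print(f"eta^2 = {eta_sq:.2f}")   # ~0.65, a large effect by Cohen's criteria
```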

Criteria for Assessing \eta^2

  • \eta^2 may range from 0 to 1
  • Cohen (1977) proposed the following scale for effect size:
    • 0.01 = small effect
    • 0.06 = medium effect
    • 0.14 = large effect

Interpreting Effect Size

  • The effect sizes typically observed in psychology may vary from area to area.
  • The levels of the IV used are important in determining the observed effect size.
  • A theoretically important IV may still only account for a small proportion of the variability in the data.
  • A theoretically unimportant IV may account for a large proportion of variability in the data.

Reporting Effect Size

  • As \eta^2 is the effect size given in SPSS, it is the most commonly reported measure.
  • But as noted it is only a descriptive statistic and tends to overestimate the effect size.
  • You can report Eta squared with your ANOVA e.g., F(3, 12) = 7.34, p = .005, \eta^2 = .65.
  • For the full picture (effect size, sample size, error, and alpha) we should also report the MS_{error} somewhere in the results. In APA style this is called MSE: F(3, 12) = 7.34, p = .005, \eta^2 = .65, MSE = 150.46.

Examples of Reporting Results

  • A one-way independent-samples ANOVA was conducted on target identification accuracy (number of correct responses) as a function of the amount of sleep deprivation the participant experienced, with 4 levels (8, 12, 20, and 28 hours). The analysis showed a significant effect of sleep deprivation, F(3, 12) = 7.34, p = .005, \eta^2 = .65, MSE = 150.46. This indicates that approximately 65% of the variation in accuracy scores can be attributed to changes in sleep deprivation; this is a large effect size.

Examples of Reporting Planned Contrasts

  • To further explore mean differences, a series of a priori contrasts was formed. Contrast 1 found that accuracy scores in the 12hr condition were not significantly different from those in the 8hr condition, t(12) = 1.30, p = .219. Contrast 2, however, found that accuracy scores in the 28 and 20hr conditions were significantly greater than in the 12 and 8hr conditions, t(12) = 4.48, p < .001.
  • The number of planned contrasts should match the number of a priori comparisons hypothesised.

Examples of Reporting Post Hoc Tests

  • To further explore mean differences, a series of Tukey adjusted post hoc pairwise comparisons were performed. As shown in Table 1, post hoc comparisons revealed significant mean differences between the 28hr and 8hr condition (p
  • You would only do one or the other – either planned, or post-hoc tests. Not both.