Assumptions of ANOVA

Introduction

By the end of this lecture, you should be able to:

  • Enumerate the assumptions of the Analysis of Variance.
  • Identify ways to check (or even test) the assumptions.
  • Identify common remedies where there are breaches of assumptions.

Recap from last week:

  • A significant ANOVA F-test implies that the between-groups differences are greater than the within-groups differences. But this conclusion is not certain, because hypothesis testing is based on probability, not certainty.

Errors in Hypothesis Testing

  • The process of hypothesis testing is based on probability, not certainty, thus errors can occur.
  • There is an α chance of obtaining an F larger than F_critical when H₀ is true.
  • With probability α, we will incorrectly reject H₀.
    • This is the Type I error rate, which always equals α.
  • There is a probability of obtaining an F value less than F_critical when H₀ is false, i.e., of failing to reject a false H₀.
    • This is the Type II error rate, denoted β (beta).
  • Lowering the Type I error rate (e.g., by lowering α) increases the Type II error rate (and vice versa).
Errors in Hypothesis Testing: Revision
| Decision | Research hypothesis (H₁) is false, H₀ is true | Research hypothesis (H₁) is true, H₀ is false |
| --- | --- | --- |
| Support H₁; reject H₀ | Incorrect decision: Type I error. Probability = α | Correct decision. Probability = 1 − β = power |
| Study is inconclusive; don't support H₁ | Correct decision. Probability = 1 − α | Incorrect decision: Type II error. Probability = β |

Design issues can lead to Type I and Type II errors. The chances of making those errors are affected by whether our data meet the mathematical assumptions underlying the analysis.
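The claim that the Type I error rate equals α can be checked by simulation. The sketch below (not from the lecture; data and parameters are invented for illustration) runs many one-way ANOVAs on groups drawn from the *same* population, so H₀ is true in every run, and counts how often the test is "significant" anyway.

```python
import numpy as np
from scipy import stats

# Simulate many one-way ANOVAs where H0 is true (all three groups come
# from the same population) and count how often p < alpha.
rng = np.random.default_rng(42)
alpha = 0.05
n_sims, n_per_group = 5000, 20

rejections = 0
for _ in range(n_sims):
    g1, g2, g3 = (rng.normal(50, 10, n_per_group) for _ in range(3))
    _, p = stats.f_oneway(g1, g2, g3)
    if p < alpha:          # a "significant" F even though H0 is true
        rejections += 1

type1_rate = rejections / n_sims
print(f"Empirical Type I error rate: {type1_rate:.3f}")
```

The empirical rejection rate comes out close to .05, matching the α set in advance.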

ANOVA Assumptions

  1. DV should be measured on a metric scale
  2. Independence of observations
  3. Normality of distributions
  4. Homogeneity of variance
The Independence Assumption
  • States that it is not possible to predict one score in the data from any other score.
  • In a between-groups design, this assumption is met by adequate experimental design:
    • Random assignment of participants to groups (levels of IV).
    • Random selection of participants from the population/s of interest (particularly important with some types of IV where random allocation is impossible).
    • Each participant contributes only 1 score to the analysis (this may be the mean of many observations).
    • Each participant’s score is independent – i.e., not influenced by any other participant’s score.
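The design steps above can be sketched in code. This is a hypothetical illustration (participant IDs and condition names are invented): shuffle the participant list, then deal participants into equal-sized groups so each person is randomly assigned to one level of the IV and contributes one score.

```python
import random

# Hypothetical random assignment of 30 participants to 3 conditions.
participants = [f"P{i:02d}" for i in range(1, 31)]
conditions = ["qwerty", "dvorak", "alphabetic"]   # invented IV levels

random.seed(7)                 # seeded only so the sketch is reproducible
random.shuffle(participants)   # randomise order before allocating

# Deal participants into the three groups like cards: equal n per condition.
groups = {c: participants[i::3] for i, c in enumerate(conditions)}
for condition, members in groups.items():
    print(condition, len(members))
```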
The Normality Assumption
  • States that:

    • the samples are drawn from normally distributed populations and
    • the error component is normally distributed within each treatment group (level of IV).
  • ANOVA is robust to breaches of this assumption (e.g., Hsu & Feldt, 1969) provided that:

    • There are a similar number of participants in each condition.
    • There are at least 10-12 participants in each condition.
    • The departure from normality (skewness or kurtosis) is similar in each condition (Kirk, 1982).
  • To see whether this assumption is breached, inspect frequency histograms for each experimental condition.

  • Compute skewness and kurtosis statistics.

    • Skewness
      • 0: normally distributed data (or any symmetrically distributed data)
      • Positive/negative values: positively/negatively skewed distribution
    • Kurtosis
      • 0: normally distributed data (or any distribution that does not have more outliers than a normal distribution)
      • In mathematical terms, kurtosis equals 3 for a normal distribution; SPSS subtracts the 3 to give 0 (i.e., it reports "excess" kurtosis). Be aware of this when moving to other software packages.
  • Each of these statistics comes with a standard error (which represents the sampling variability of the statistic). The statistic divided by its standard error is distributed approximately as a z-score:

    Z_skew = Skewness / SE_skewness

  • Test this against a critical value of Z at a conservative criterion (α = .01 or .001).

  • Alternatively, use a cutoff like ±2 or ±3 as a guide.

  • For larger sample sizes, it is better to use visual inspection, because with large n even trivial departures from normality produce large z values.

  • Sometimes transformation of data can fix problems with normality, discussed later when we look at the homogeneity assumption. But remember in most cases, ANOVA is robust to breaches of normality.
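The z-test for skewness and kurtosis described above can be sketched outside SPSS. The data are invented, and the standard errors use the common large-sample approximations SE_skew ≈ √(6/n) and SE_kurt ≈ √(24/n); SPSS uses slightly different exact formulas, so its values will differ a little.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.exponential(scale=2.0, size=40)  # hypothetical positively skewed DV

n = len(scores)
skew = stats.skew(scores)
kurt = stats.kurtosis(scores)   # "excess" kurtosis: 0 for a normal, as in SPSS

# Large-sample approximations to the standard errors.
z_skew = skew / np.sqrt(6 / n)
z_kurt = kurt / np.sqrt(24 / n)
print(f"skewness = {skew:.2f}, z = {z_skew:.2f}")
print(f"kurtosis = {kurt:.2f}, z = {z_kurt:.2f}")
```

Compare each z against a conservative critical value (e.g., ±2.58 for α = .01) as suggested above.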

Outliers
  • Of more potential impact on our statistics than the shape of our distribution per se is the problem of outliers.
  • An outlier is an extreme score at one or both ends of our distribution.
  • Can inflate our measures of variance.
  • Can also affect our mean.
  • Potentially caused by spurious (incorrect) data – e.g., an eye-blink on the part of the participant, or an error in the procedure.
  • First decide why it is an outlier: it may reflect an out-of-range value or a participant who is not part of the targeted population.
  • Some solutions to problems of outliers:
    • Remove them from the data (common, but potentially problematic)
    • Transform data to remove the influence of outliers
    • Use a non-parametric test (e.g., based on ranked data)
    • Run the analysis with and without outliers and see whether they affect your results. If not, report this and report the ANOVA as usual.
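The last strategy can be sketched as follows. All the numbers are invented for illustration: one group contains a single extreme score, and the ANOVA is run with and without it to see whether the conclusion changes.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups; 55 in g3 is the suspect outlier.
g1 = np.array([12, 14, 13, 15, 14, 13, 12, 14])
g2 = np.array([16, 17, 15, 18, 16, 17, 16, 15])
g3 = np.array([20, 19, 21, 18, 20, 19, 21, 55])

f_with, p_with = stats.f_oneway(g1, g2, g3)
f_wo, p_wo = stats.f_oneway(g1, g2, g3[:-1])   # drop the outlier

print(f"with outlier:    F = {f_with:.2f}, p = {p_with:.4f}")
print(f"without outlier: F = {f_wo:.2f}, p = {p_wo:.4f}")
```

Here the outlier inflates the within-groups variance, so F is much larger once it is removed; because both analyses reach the same conclusion, the original data could be reported as usual.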
Homogeneity of Variance
  • As MS_within is a pooled error term, we need to ensure that the variance within each of the treatment conditions/groups is similar.
  • A rule of thumb is that the largest variance should be no more than 4 times the smallest variance (Howell, 2013).
  • Breaches of homogeneity can affect the Type I error rate.
  • Breaches of the homogeneity assumption are compounded by very unequal group sizes.
  • There are a number of tests for breaches of homogeneity - e.g., Levene’s Test provided in SPSS.
  • To test the homogeneity assumption in SPSS:
    • Run the ANOVA in the normal way except:
    • Click the “Options” button.
    • Select the “Homogeneity of variance test” box.
    • Click “Continue” to exit the options box then click OK to run the analysis.
    • A significant Levene’s test (p < .05) means the variances of the groups differ significantly and the homogeneity assumption is breached.
  • Ways of dealing with breaches of the homogeneity assumption:
    1. If you have equal group sizes and the breach is minor (i.e., largest group variance < 4 × smallest), you can run an ANOVA as it is robust to minor breaches of homogeneity.
    2. Run the ANOVA but use a lower alpha level to control for the possible impact on the Type I error rate.
    3. Use an alternate statistical test which does not have the homogeneity assumption (e.g., non-parametric test).
    4. Transform the data to remove the heterogeneity and run the ANOVA on the transformed data.
    5. Perform a robust test (e.g., Welch test or Brown-Forsythe test).
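Outside SPSS, the homogeneity check itself can be sketched like this. The data are hypothetical; the sketch combines Levene's test with Howell's 4:1 largest-to-smallest variance rule of thumb mentioned above.

```python
import numpy as np
from scipy import stats

# Hypothetical scores: g2 is deliberately more variable than the others.
g1 = np.array([55, 58, 52, 60, 57, 54, 56, 59, 53, 55])
g2 = np.array([62, 70, 58, 75, 64, 69, 61, 73, 66, 68])
g3 = np.array([48, 51, 49, 52, 50, 47, 53, 49, 51, 50])

# scipy's levene defaults to median centring (the Brown-Forsythe variant).
w, p_levene = stats.levene(g1, g2, g3)

variances = [g.var(ddof=1) for g in (g1, g2, g3)]
ratio = max(variances) / min(variances)

print(f"Levene W = {w:.2f}, p = {p_levene:.3f}")
print(f"largest/smallest variance ratio = {ratio:.1f}")
if p_levene < .05 or ratio > 4:
    print("homogeneity questionable: consider Welch, a lower alpha, or a transform")
```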
Lowering the α Level
  • Breaches of homogeneity in ANOVA cause overestimation of the true value of F.
  • This means we make more Type I errors.
  • Lowering the alpha level (e.g., .025 rather than .05) reduces the Type I error rate.
  • Thus, the effect of the breach of homogeneity can be reduced by using a lower alpha level.
Distribution-Free Tests
  • Tests like t and F, which make assumptions about the distribution of scores, are called parametric tests.
  • Some other tests have less restrictive assumptions about the distributions used. These are called non-parametric tests or distribution-free tests.
  • Most of these work by converting each score to a rank.
  • Ranks are spread out evenly, so the shape of the distribution of ranks is always rectangular (hence no normality assumption and no problem with outliers).
  • There are specific rank-order tests for various hypothesis-testing situations.
  • In rank order tests we are comparing ranks rather than scores.
  • Some also compare medians rather than means for describing group differences.
| Parametric test | Non-parametric test |
| --- | --- |
| Independent-samples t | Wilcoxon rank-sum / Mann-Whitney U test |
| Repeated-measures t | Wilcoxon signed-ranks test |
| Independent-samples ANOVA | Kruskal-Wallis H test |
| Repeated-measures ANOVA | Friedman test |
| Pearson correlation | Spearman's rho |

There are non-parametric alternatives to almost all of our statistical tests.
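For readers working outside SPSS, the table has direct counterparts in `scipy.stats`. The data below are tiny invented samples, purely to show which function matches which test.

```python
import numpy as np
from scipy import stats

# Hypothetical scores; a and b are paired for the repeated-measures tests.
a = np.array([3, 5, 4, 6, 7, 5])
b = np.array([8, 9, 7, 10, 9, 8])
c = np.array([5, 6, 6, 7, 5, 6])

u, p_u = stats.mannwhitneyu(a, b)            # independent-samples t  -> Mann-Whitney U
w, p_w = stats.wilcoxon(a, b)                # repeated-measures t    -> Wilcoxon signed-ranks
h, p_h = stats.kruskal(a, b, c)              # one-way ANOVA          -> Kruskal-Wallis H
chi, p_f = stats.friedmanchisquare(a, b, c)  # repeated-measures ANOVA -> Friedman
rho, p_r = stats.spearmanr(a, b)             # Pearson r              -> Spearman's rho

for name, p in [("U", p_u), ("W", p_w), ("H", p_h), ("Friedman", p_f), ("rho", p_r)]:
    print(f"{name}: p = {p:.3f}")
```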

Kruskal-Wallis One-Way ANOVA
  • Performing Kruskal-Wallis test with SPSS
    • Create data file as usual.
    • Select NONPARAMETRIC TESTS from the ANALYZE menu
    • Select Legacy Dialogs and K Independent Samples
    • Select the IV (Grouping Variable) and define groups and DV (test variable) in the same way as you run one-way ANOVA
    • Select “Define Range” to indicate the range of the IV
  • A significant chi-square indicates a significant difference between groups.
  • To report: χ²(3) = 9.37, p = .025
Data Transformations
  • Data transformations involve performing a mathematical operation on all the scores on your DV.
  • These transformations change the shape of the distribution of scores.
  • A suitable transformation can:
    • Reduce heterogeneity of variance.
    • Achieve normality.
  • Some types of common transformations:
    • Square-root (positive skew)
    • Logarithm (strong skew)
    • Reciprocal (extreme skew)
    • Trimmed samples (outliers)
  • Steps in doing a transform:
    1. Identify the problem (heterogeneity or skewness).
    2. Find the transformation which minimizes this problem, then check the assumption again on the transformed data.
    3. Do not look for the transform which maximizes F, but the one that minimizes the breach of the assumption.
    4. Perform the ANOVA using the transformed scores as the DV.
    5. You can run the ANOVA on both transformed and original data; if you get the same result, report the original data as they are easier to interpret.
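The "find the transformation which minimizes the problem" step can be sketched like this, on invented positively skewed data (simulated reaction times), trying each of the common transforms listed above and comparing skewness.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rt = rng.lognormal(mean=6.0, sigma=0.8, size=60)  # hypothetical, strongly positively skewed DV

# The common transforms for positive skew, from mildest to strongest.
candidates = {
    "raw": rt,
    "sqrt": np.sqrt(rt),
    "log": np.log(rt),
    "reciprocal": 1.0 / rt,
}
for name, x in candidates.items():
    print(f"{name:10s} skewness = {stats.skew(x):+.2f}")
```

For lognormal-like data the log transform brings skewness close to zero; the choice is made on the assumption check, not on which transform gives the biggest F.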
Bootstrap Techniques
  • As computers have become powerful enough to generate many estimates almost instantly, they can overcome many of the restrictive assumptions that arise because our statistics are based on mathematical models and hypothetical distributions.
  • The bootstrap creates multiple resamples (with replacement) from a single set of observations and computes the statistic of interest (commonly the mean) on each resample. The bootstrap resamples can then be used to determine a bootstrapped 95% CI.
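The percentile-bootstrap idea can be sketched in a few lines; the sample here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
scores = rng.normal(loc=100, scale=15, size=30)   # hypothetical sample of 30

# Resample with replacement many times and compute the mean of each resample.
n_boot = 10_000
boot_means = np.array([
    rng.choice(scores, size=len(scores), replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the bootstrap distribution is the percentile 95% CI.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {scores.mean():.1f}, bootstrap 95% CI = [{lo:.1f}, {hi:.1f}]")
```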

Summary - What you need to know:

  • ANOVA has four assumptions (Metric scaling, Independence, Normality, and Homogeneity of Variance).
  • How to check assumptions.
  • Impact of the breach of the assumptions on your ANOVA results.
  • Main ways you can deal with breaches.
  • How to report your assumption checks and how you have dealt with any breaches.
  • Independence assumption is a design issue, but normality and homogeneity assumptions cannot be totally controlled by good design.
  • So, when performing an ANOVA, always check normality and homogeneity.
  • If normality breached:
    • Ask whether the breach tells you something important about the data (e.g., bi-modality may mean that the IV has different effects for different people, or outliers may indicate sampling issues).
    • ANOVA is robust against breaches of normality provided group sizes are reasonably equal, n in each group is not too small, and skewness is in a similar direction in each group. Under these conditions, we can run the ANOVA.
    • If we have small or uneven group sizes, we can transform the data and run the ANOVA on the transformed data, or run a non-parametric test.
Reporting (Example)

Prior to performing a one-way Analysis of Variance, the key assumptions were explored. Normality was assessed by inspecting histograms of Typing Speed at each level of Keyboard Type. Levene’s test of homogeneity of variance, however, indicated that variances around group means were significantly different, F(2, 27) = 4.18, p = .026. Due to this breach, the α level for testing the significance of F was reduced from .05 to .025 in order to control the Type I error rate.

Next Week

  • Follow-up test for an omnibus ANOVA
  • Issues of power and effect size.