Transformations and Post Hoc Tests

Handling Violated Assumptions in ANOVA

  • When ANOVA assumptions (normality, equal variance) are violated, transformations can be applied to the raw data.
  • Mathematical transformations preserve the relationships between observations while attempting to meet ANOVA assumptions.
  • If transformations fail, one might proceed with raw data, noting the caveat that assumptions were not met.
  • Non-parametric alternatives to ANOVA exist (e.g., Kruskal-Wallis), but they are limited to simple designs (one-way ANOVAs).

Transformations

  • For skewed data, a log transformation is often the first choice; it is most effective at pulling in a long right (positive) tail.
    • log(x)
    • A constant can be added when the raw data contain zeros, since log(0) is undefined; 1 is a common choice: log(x + 1).
  • The square root transformation is a milder alternative, often used for count data.
    • \sqrt{x}
    • Small constants can be added, but are not typically needed: \sqrt{x + k}
  • For proportion data (percentages), use the arcsine transformation directly on decimal values.
    • arcsin(\sqrt{p}) where p is the proportion (e.g., 0.5 for 50%).
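
These notes are R-oriented, but the three transformations can be sketched in a few lines of Python (the sample values below are invented):

```python
import math

raw_counts = [0, 3, 7, 150]          # skewed data containing a zero (invented)
proportions = [0.10, 0.50, 0.95]     # proportion data on the 0-1 decimal scale

# log(x + 1): the +1 constant keeps the zero observation defined
log_t = [math.log(x + 1) for x in raw_counts]

# sqrt(x): another common option; no constant needed for non-negative data
sqrt_t = [math.sqrt(x) for x in raw_counts]

# arcsin(sqrt(p)): applied directly to decimal proportions
arcsine_t = [math.asin(math.sqrt(p)) for p in proportions]

print(log_t, sqrt_t, arcsine_t)
```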

Transformation Process

  1. Choose a transformation based on data exploration (histograms, etc.).
  2. Re-run the ANOVA on the transformed data.
  3. Test the assumptions again using diagnostics.
  4. If assumptions are still not met, repeat with a different transformation or consider alternative approaches.
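
The process above can be sketched in Python; the groups are invented, and the one-way F statistic is computed from first principles rather than with a statistics library:

```python
import math

def anova_f(groups):
    """One-way ANOVA F statistic: between-group MS over within-group MS."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group (treatment) sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)   # this is the MSE
    return ms_between / ms_within

groups = [[1, 2, 4, 90], [2, 3, 6, 150], [5, 9, 12, 300]]  # right-skewed, invented

# Steps 1-2: try candidate transformations and refit each time
for name, f in [("raw", lambda x: x),
                ("log(x+1)", lambda x: math.log(x + 1)),
                ("sqrt", math.sqrt)]:
    transformed = [[f(x) for x in g] for g in groups]
    print(name, round(anova_f(transformed), 3))
# Steps 3-4 (residual diagnostics, choosing among transformations) would use
# plots and formal tests, as in the R workflow the notes describe.
```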

Robustness of ANOVA

  • ANOVA is reasonably robust to moderate departures from normality; severe departures are more of a concern.
  • Violations of equal variance are more problematic.

Statistical Practice Over Time

  • Before 2000, data transformation was a very common approach.
  • Since the early 2000s, generalized linear models (GLMs) have become more prevalent, allowing for the use of other distributions besides the normal distribution.
  • A combination of both approaches is often seen in current practice.

Non-Parametric ANOVA: Kruskal-Wallis Test

  • A rank-based test that does not assume a specific distribution.
  • Less powerful than parametric tests (lower ability to detect a difference).
  • Limited to simple one-way ANOVA designs.
  • P-values from Kruskal-Wallis tests may differ from parametric ANOVA results due to assumption violations or lower power.
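
As a sketch of how the rank-based statistic works, here is a hand-rolled Kruskal-Wallis H in Python (invented, untied data; real implementations also apply a tie correction):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction; assumes untied data)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N
    n = len(pooled)
    # Sum over groups of (rank sum)^2 / group size
    h = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)

print(kruskal_wallis_h([[1, 5, 8], [2, 6, 9], [3, 7, 10]]))
```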

Post-Hoc Tests

Post-hoc tests are conducted only if the ANOVA model reveals a significant difference to determine what is driving the difference.

Pairwise Comparisons

  • Involve conducting multiple pairwise comparisons (t-tests) between group means.
  • For example, with four treatment levels, the comparisons are 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, and 3 vs 4.
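
The enumeration of pairs is easy to sketch in Python:

```python
from itertools import combinations

levels = ["T1", "T2", "T3", "T4"]        # four treatment levels
pairs = list(combinations(levels, 2))    # all k(k-1)/2 = 6 pairwise comparisons
print(pairs)
```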

T-test approach

  • The t statistic is the difference between two group means divided by a measure of variation that uses the pooled MSE from the ANOVA.
  • t = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
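
A minimal Python sketch of this statistic, using invented summary numbers:

```python
import math

def pairwise_t(mean1, mean2, n1, n2, mse):
    """t statistic for one pairwise comparison, using the ANOVA's pooled MSE."""
    return (mean2 - mean1) / math.sqrt(mse * (1 / n1 + 1 / n2))

# Illustrative numbers (invented): two group means, 10 observations each,
# with MSE taken from the fitted ANOVA table.
print(pairwise_t(5.0, 7.5, 10, 10, 4.0))
```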

Graphical vs. Statistical Approaches

  • Post-hoc testing can be approached statistically or graphically (or a combination).
  • A graphical approach involves examining means and confidence intervals.
  • If 95% confidence intervals do not overlap, there is a significant difference between those groups.
  • If they do overlap, the difference is generally not significant, though this rule of thumb is conservative: intervals can overlap slightly even when a formal test rejects.
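
A Python sketch of the graphical approach, using a normal critical value (about 1.96) in place of the exact t critical value, with invented group summaries:

```python
import math
from statistics import NormalDist

def ci(mean, n, mse, level=0.95):
    """Approximate CI for a group mean using the pooled MSE and a normal
    critical value (a t critical value would be used in practice)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(mse / n)
    return mean - half, mean + half

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

ci_a = ci(5.0, 10, 4.0)   # invented group summaries
ci_b = ci(7.5, 10, 4.0)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```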

emmeans Package

  • An R package useful for conducting post-hoc tests and generating plots of means with confidence intervals.

Potential Discrepancies

  • When p-values are close to 0.05, confidence intervals might overlap, leading to conflicting conclusions between statistical and graphical approaches.
  • Graphical approaches can be more conservative.

Family-Wise Error Rate

  • As the number of pairwise comparisons increases, the chance of committing a Type I error (false positive) also increases.
  • Family-wise error rate (FWER) is the probability of making at least one Type I error among all the comparisons.
  • To control FWER, adjust the significance threshold (or, equivalently, the p-values) using methods like the Bonferroni correction.

Adjustments

  • Bonferroni correction: divide the significance level by the number of comparisons; e.g., with five comparisons, test each at 0.05 / 5 = 0.01 rather than 0.05. Equivalently, multiply each p-value by the number of comparisons before comparing it to 0.05.
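
A quick numeric illustration (the FWER formula below assumes independent tests, which is an approximation):

```python
m = 6                       # number of pairwise comparisons (4 groups)
alpha = 0.05

# Family-wise error rate if each test is run at alpha
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 3))       # roughly 0.265: a ~26% chance of >= 1 false positive

# Bonferroni: test each comparison at alpha / m ...
bonferroni_alpha = alpha / m                   # ~0.0083
# ... or, equivalently, inflate each p-value and compare to alpha
p_values = [0.001, 0.02, 0.30]                 # invented raw p-values
p_adjusted = [min(1.0, p * m) for p in p_values]
print(bonferroni_alpha, p_adjusted)
```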

Tukey's Test

  • Controls for the family-wise error rate.
  • There is a trade-off: the more FWER is controlled, the less power there is to detect a real difference.
  • The emmeans package in R can be used to run Tukey's test and generate plots.
  • Base R also has a TukeyHSD function for performing Tukey's test.
    • The output includes the difference in means, upper and lower confidence intervals, and adjusted p-values.
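
Base R's TukeyHSD does the full computation; the per-pair studentized-range statistic itself is simple to sketch in Python (invented summaries; the critical value would come from a studentized-range table, which Python's standard library does not provide):

```python
import math

def tukey_q(mean_i, mean_j, n, mse):
    """Studentized-range statistic for one pair (equal group sizes n)."""
    return abs(mean_i - mean_j) / math.sqrt(mse / n)

# Invented summaries: three groups of n = 5, MSE from the ANOVA table
means, n, mse = [4.0, 5.5, 9.0], 5, 2.0
q_12 = tukey_q(means[0], means[1], n, mse)
q_13 = tukey_q(means[0], means[2], n, mse)
# Each q is compared against a studentized-range critical value looked up
# by the number of groups k and the error df, which is what TukeyHSD in R
# handles for you (along with the adjusted p-values and CIs).
print(q_12, q_13)
```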

Workflow Example: Chick Weight Gain

  1. Exploration:
    • Create box plots to visualize differences and skewness.
  2. ANOVA Model:
    • Fit an ANOVA model to the data.
  3. Check Assumptions:
    • Assess normality using residual plots and formal tests.
    • Assess equal variance using Bartlett's test or fitted vs. residual plots.
  4. Post-Hoc Tests:
    • If ANOVA is significant, perform post-hoc tests to determine which groups differ.
    • Use graphical approaches (e.g., confidence interval plots) to visualize differences.
    • Use Tukey's test for pairwise comparisons with FWER control.
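
The workflow (minus the plots and formal assumption checks, and with FWER adjustment omitted for brevity) can be sketched end to end in Python with invented weight-gain data:

```python
import math
from itertools import combinations

def fit_anova(groups):
    """Return (F statistic, MSE) for a one-way ANOVA, computed by hand."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = ss_w / (n_total - k)
    return (ss_b / (k - 1)) / mse, mse

# Invented weight-gain data for three diets (step 1 would box-plot these)
diets = {"A": [10, 12, 11, 13], "B": [14, 15, 16, 14], "C": [20, 22, 21, 23]}

# Step 2: fit the ANOVA
f_stat, mse = fit_anova(list(diets.values()))

# Step 4 (after the assumption checks): pairwise t statistics via pooled MSE
for a, b in combinations(diets, 2):
    ga, gb = diets[a], diets[b]
    t = (sum(gb) / len(gb) - sum(ga) / len(ga)) / math.sqrt(
        mse * (1 / len(ga) + 1 / len(gb)))
    print(f"{a} vs {b}: t = {t:.2f}")
print(f"F = {f_stat:.2f}")
```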

Key Takeaways

  • Use residuals for testing assumptions.
  • Understand graphical vs. formal tests.
  • Understand the family-wise error rate and how to adjust for it.