Week 12: Non-Parametric Analyses
Introduction to Non-Parametric Analyses
- All statistical tests have assumptions, especially distributional assumptions about numeric data, such as the shape and variance of the distribution.
- Regression and ANOVA analyses are considered "robust" to mild to moderate violations of distributional assumptions.
- Larger sample sizes and equal group, cell, or condition sizes increase robustness.
- However, robustness shouldn't be relied upon with highly non-normal data, small sample sizes, and very uneven group sizes.
- Traditional parametric analyses are generally preferable due to their sensitivity and statistical power, provided assumptions aren't severely violated.
- Non-parametric analyses are also called distribution-free analyses: they don’t assume (expect) particular kinds of distributions. They apply to both categorical and numeric data.
- Categorical data:
- Chi-square goodness of fit test: a single categorical variable
- Chi-square test of independence: two categorical variables, independent design
- McNemar’s test: two categorical variables, related/paired design
- Non-normal numeric or ordinal data:
- Spearman’s correlation: equivalent of Pearson’s correlation
- Wilcoxon-Mann-Whitney rank sum test: equivalent of the independent samples t-test
- Wilcoxon signed-rank test: equivalent of paired t-test
- Kruskal-Wallis test: equivalent of one-way between-subjects ANOVA
- Friedman Test: equivalent of one-way repeated-measures ANOVA
Numeric and Ordinal Data
- Non-parametric tests rank numerical data and perform analyses on the ranks, addressing distributional problems and outliers.
- Ranking involves ordering data from lowest to highest.
- Downside of ranking:
- Loss of information about the degree of difference: the difference between 18 and 19 (1 point) is treated the same as the difference between 21 and 35 (14 points)
- However, this very insensitivity is what makes ranking beneficial for skewed distributions and those with outliers
- Tied ranks: when the data contain the same value more than once, give the tied values equal ranks by averaging the ranks they would otherwise occupy:
- For example, if the raw data include several people of the same age (19), each receives the mean of the ranks those positions would take, as shown in the sketch below
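- A minimal Stata sketch of ranking with ties (the toy data and the variable name age are assumptions for illustration); by default, egen's rank() function gives tied values the average of the ranks they would otherwise occupy:

```stata
* Toy data: five ages, two of them tied at 19
clear
input age
18
19
19
21
35
end

* egen's rank() averages the ranks of ties by default:
* the two 19s would occupy ranks 2 and 3, so each gets 2.5
egen age_rank = rank(age)
list age age_rank
```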
Spearman's Correlation (Counterpart of Pearson's Correlation)
- Use Spearman’s correlation when:
- One or both variables are ordinal
- Variable(s) are severely skewed
- The relationship is non-linear but monotonic (consistent direction between the variables)
- Spearman’s Correlation in Stata
- Syntax: spearman var1 var2 (a runnable sketch appears at the end of this section)
- Example write-up: there was a very strong, statistically significant correlation, r_s(23) = .83, p < .001. The more someone’s heart flutters with excitement at “Design and Stats”, the more time they spend thinking about stats.
- Very similar in implementation and interpretation to Pearson’s correlation!
- Write up is the same as Pearson’s correlation (except the “s” subscript).
- Like Pearson’s correlation, Spearman’s correlation, denoted r_s or ρ (rho), ranges from -1 to 1, with -1 or 1 indicating a perfect monotonic relationship
- Same rules of thumb as Pearson’s correlation (Cohen, 1988):
- Between 0.1 and 0.3 = small effect
- Between 0.3 and 0.5 = medium effect
- > 0.5 = large effect
- Difference: Spearman’s correlation works by ranking the scores (lowest to highest) for each variable, and computing the correlation on the ranked scores
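- A short Stata sketch of the command in practice (the variable names heartflutter and statsthoughts are hypothetical stand-ins for the lecture example); the stats() option requests rho, the number of observations, and the p-value:

```stata
* Spearman's correlation: ranks both variables, then correlates the ranks
spearman heartflutter statsthoughts, stats(rho obs p)

* For comparison, Pearson's correlation on the raw scores
pwcorr heartflutter statsthoughts, sig obs
```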
Mann-Whitney-Wilcoxon Rank Sum Test (Two Independent Groups; Counterpart of the Independent-Samples T-Test)
- Also known as:
- Mann-Whitney U
- Wilcoxon rank-sum Ws
- Example: Drug Use and Depression
- Research question: Is there a difference in the experiences of depression in people who use different recreational drugs when clubbing?
- Procedure: Drugs were taken on a Saturday night while clubbing. Depression scores were measured twice, on Sunday night and then again on Wednesday.
- DV: depression (BDI) measured 1 day after drug use (Sunday)
- IV: Drug (alcohol vs. ecstasy), between-subjects categorical variable
- Design: Single-IV between-groups design with two levels
- Because of:
- The small sample size
- Outliers
- Non-normal data
- Unequal variances
- ...we use the rank-sum test instead of the independent-samples t-test
- Stata syntax: ranksum DV, by(IV) (a runnable sketch follows this example)
- To see the ranks the test operates on: egen SunBDIRank = rank(SundayBDI)
- H0: the ranks (and hence the medians) of the two groups do not differ
- If p < .05, reject H0, indicating statistical significance. The output includes two p-values; report the exact p.
- Effect size:
- r = \frac{z}{\sqrt{N}}
- Sunday Depression scores: r = \frac{1.105}{\sqrt{20}} = 0.25
- Same rules of thumb as correlation (Cohen, 1988):
- Between 0.1 and 0.3 = small effect
- Between 0.3 and 0.5 = medium effect
- > 0.5 = large effect
- Conclude: the day after a night of clubbing, depression scores did not significantly differ between those who drank alcohol (mean rank = 9.1) and those who took ecstasy (mean rank = 12.0), z = -1.11, p = .288, with a small effect size, r = .25.
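- A sketch of this analysis in Stata (the variable names SundayBDI and drug are assumptions based on the lecture example); ranksum stores its z statistic and the two group sizes, so the effect size r = z/√N can be computed directly from the stored results:

```stata
* Wilcoxon-Mann-Whitney rank-sum test comparing the two drug groups
ranksum SundayBDI, by(drug)

* Effect size r = z / sqrt(N), where N is the total sample size (20 here)
display abs(r(z)) / sqrt(r(N_1) + r(N_2))
```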
Wilcoxon Signed-Rank Test (Two Related Conditions; Counterpart of the Paired-Samples T-Test)
- Not to be confused with the Wilcoxon rank-sum test
- Example: Drug Use and Depression
- Research question: Have people’s experiences with depression changed after one day of using drugs while clubbing, compared to four days later? But here, let’s focus on Alcohol.
- Procedure: Drugs were taken on a Saturday night while clubbing. Depression scores were measured twice, on Sunday night and then again on Wednesday.
- DV: depression (BDI)
- IV: Day (1 day after alcohol use, Sunday vs. 4 days after alcohol use, Wednesday), within-subjects categorical variable
- Design: A within-group design with a single factor comprising two levels
- Key assumption: normality of the differences (between the conditions/levels)
- Here the difference scores are not normal, and N is small (10 pairs of scores)
- Stata syntax: signrank var1 = var2 (a runnable sketch follows this example)
- How it works: compute the difference scores, rank their absolute values (ignoring the signs), then compare the ranks associated with positive vs. negative differences
- H0: the positive and negative ranked differences do not differ (i.e., no change between the conditions)
- Report the exact p.
- Effect size:
- r = \frac{z}{\sqrt{N}}
- N = the number of pairs of observations
- r = \frac{1.99}{\sqrt{10}} = 0.63 (large effect)
- When reporting, you can include the median score for each variable, or the sums of the positive and negative ranks
- Conclude: a signed-rank test demonstrated that, after consuming alcohol and clubbing on Saturday night, depression scores significantly decreased from Sunday (Median = 16) to Wednesday (Median = 7.5), z = -1.99, p = .045, with a large effect size, r = .63
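- A sketch of this paired analysis (the variable names SundayBDI and WednesdayBDI are assumptions for the two measurement occasions); signrank also stores z, so r is computed the same way, here with N = 10 pairs:

```stata
* Wilcoxon signed-rank test on the paired depression scores
signrank SundayBDI = WednesdayBDI

* Effect size r = z / sqrt(N), with N = 10 pairs of observations
display abs(r(z)) / sqrt(10)
```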
Kruskal-Wallis Test (Multiple Independent Groups; Counterpart of the Between-Subjects One-Way ANOVA)
- Used for violations of distributional assumptions, especially if coupled with small sample sizes and/or uneven group sizes
- Extension of Previous Study:
- Suppose someone wanted to extend the previous study: rather than just comparing ecstasy vs. alcohol, they wanted to explore ecstasy, alcohol, and water-only drinkers.
- 100 people were recruited: 10 were ecstasy users, 60 were alcohol drinkers, and 30 were water-only drinkers
- Dataset: “drug3groups.dta”, with a categorical variable “Drug” and a numerical variable “SundayBDI”
- Stata syntax: kwallis DV, by(IV) (a full sketch follows the results below)
- Like the Rank Sum test, it ranks the DV and then totals the ranks for different groups
- Test statistic: χ2
- H0: the ranks (medians) are the same across the groups
- If p < .05, reject H0, indicating some difference in the mean ranks among the groups
- A significant result then needs to be followed up with pairwise group comparisons!
- We use Dunn’s test of multiple comparisons to follow up on a statistically significant Kruskal-Wallis test:
- Syntax: dunntest DV, by(IV) ma(bonferroni) (a user-written command; run findit dunntest to locate and install it)
- It conducts pairwise comparisons, much like running a rank-sum test on each pair of groups
- Need to adjust family-wise error rate using different adjustments to prevent inflation of Type I error (like ANOVA)
- Effect Size
- Calculating the r effect size
- r = \frac{z}{\sqrt{N}}
- N = the combined sample size of the two groups being compared
- For example, between Ecstasy and alcohol: r = \frac{1.09}{\sqrt{70}} = 0.13
- There was no significant difference in depression scores between alcohol and ecstasy, z = 1.09, p = .413, with a small effect size, r = .13.
- There were significant differences between ecstasy and water drinkers, z = 4.70, p < .001, r = .74, and between alcohol and water drinkers, z = 6.00, p < .001, r = .63: the mean ranks of depression were lower in water-only drinkers than in both other groups, both large effects.
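- Putting the workflow together in Stata (the dataset and variable names come from the lecture; the install step is an assumption, since dunntest is user-written):

```stata
* Load the three-group dataset from the lecture
use drug3groups.dta, clear

* Kruskal-Wallis test across the three drug groups
kwallis SundayBDI, by(Drug)

* Bonferroni-adjusted pairwise follow-ups (user-written command;
* if not installed, locate it with: findit dunntest)
dunntest SundayBDI, by(Drug) ma(bonferroni)
```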
- There is also a non-parametric equivalent of the one-way repeated-measures ANOVA: Friedman’s test. However, the test has yet to be elegantly implemented in Stata.
Strengths and Limitations of Non-Parametric Tests
- Ranking loses information about the nuances of the scale and the size of differences
- If the assumptions are met, parametric analyses are more powerful (more likely to detect an effect that exists)
- But if assumptions are particularly badly violated, non-parametric tests are often more powerful and provide more reliable p-values
- Often, equivalent parametric and non-parametric tests give the same results/conclusion (which does not mean you should try both and pick the preferred outcome!)
- These non-parametric analyses are fairly simple, and there are no non-parametric equivalents for more complex designs. Alternatives for such designs include:
- Adjusting estimation within parametric analyses (e.g. bootstrapping; see the sketch after this list)
- Transforming the data
- More complex computational estimation (e.g. simulations)
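- As a hedged illustration of the bootstrapping option above (the model and variable names are hypothetical, not from the lecture), Stata can bootstrap the standard errors of an otherwise parametric model instead of switching to ranks:

```stata
* Parametric model refit with bootstrapped standard errors (2,000 resamples)
regress SundayBDI i.Drug, vce(bootstrap, reps(2000))
```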
- Again, if and when we can (which is the majority of the time), parametric tests (t-tests, regression, ANOVAs) are the better choice for analysing our data; however, it is useful to know when and why non-parametric tests are used.
Conclusions
- Non-parametric analyses don’t have expectations about shapes of distributions, because they:
- Apply to categorical data, and therefore, distributional shapes are irrelevant! or
- Apply to non-normal or ordinal data, and rank the data before analysing
- When it comes to numeric data that doesn’t meet distributional assumptions (or violations to distributional assumptions are coupled with other issues, e.g. small sample sizes or unequal groups), non-parametric analyses are an option
- Although these are a different family of analyses with different methods of analysing, the practice is similar:
- Understand the variables and the research question
- Numerically and graphically describe the data
- Conduct the analysis, and if necessary, make adjustments for multiple comparisons.
- Interpret the results.
- After this week’s lecture, you know:
- What defines non-parametric analyses, and when they are appropriate to use
- Spearman’s correlation, rank sum test, sign rank test, Kruskal-Wallis test
- How to interpret the results of these tests
- The strengths and limitations of non-parametric tests
- In Stata, you should be able to:
- Conduct the tests covered
- Graph data in an appropriate way for the type of data (and its distribution)