Week 12: Non-Parametric Analyses

Introduction to Non-Parametric Analyses

  • All statistical tests have assumptions, especially distributional assumptions about numeric data, such as the shape and variance of the distribution.
  • Regression and ANOVA analyses are considered "robust" to mild to moderate violations of distributional assumptions.
    • Larger sample sizes and equal group, cell, or condition sizes increase robustness.
    • However, robustness shouldn't be relied upon with highly non-normal data, small sample sizes, and very uneven group sizes.
  • Traditional parametric analyses are generally preferable due to their sensitivity and statistical power, provided assumptions aren't severely violated.
  • Non-parametric analyses are also called distribution-free analyses: they don’t assume the data follow a particular distribution. They apply to both categorical and numeric data.
    • Categorical data:
      • Chi-square goodness of fit test: a single categorical variable
      • Chi-square test of independence: two categorical variables, independent design
      • McNemar’s test: two categorical variables, related/paired design
    • Non-normal numeric or ordinal data:
      • Spearman’s correlation: equivalent of Pearson’s correlation
      • Wilcoxon-Mann-Whitney rank sum test: equivalent of the independent samples t-test
      • Wilcoxon signed-rank test: equivalent of paired t-test
      • Kruskal-Wallis test: equivalent of one-way between-subjects ANOVA
      • Friedman Test: equivalent of one-way repeated-measures ANOVA
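The lecture’s examples use Stata, but as an illustrative aside, the first two categorical tests listed above are also available in Python’s scipy. A minimal sketch with made-up counts (not course data):

```python
from scipy.stats import chisquare, chi2_contingency

# Goodness of fit (one categorical variable): do observed counts of 60/40
# depart from an expected 50/50 split?
stat, p = chisquare(f_obs=[60, 40], f_exp=[50, 50])  # stat = 4.0

# Test of independence (two categorical variables, independent design):
# a 2x2 table of hypothetical counts.
table = [[30, 10],
         [20, 40]]
chi2, p_ind, dof, expected = chi2_contingency(table)
# McNemar's test (related/paired design) is available separately,
# e.g. in the statsmodels package.
```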

Numeric and Ordinal Data

  • Non-parametric tests rank numerical data and perform analyses on the ranks, addressing distributional problems and outliers.
  • Ranking involves ordering data from lowest to highest.
  • Downside of ranking:
    • Loss of information about the degree of difference since the difference between 18 and 19 (1 point) is treated the same as the difference between 21 and 35 (14 points)
  • However, this is beneficial for skewed distributions and those with outliers
  • Tied ranks: when the data contain the same value more than once, give the tied values equal ranks by averaging the ranks they would otherwise occupy:
    • For example, if two people in the raw data are the same age (19) and would occupy ranks 2 and 3, each receives the average rank of 2.5.
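The average-rank rule for ties can be demonstrated with scipy’s `rankdata` (an illustrative Python sketch; the ages are made up):

```python
from scipy.stats import rankdata

ages = [21, 18, 19, 19, 35]
# Average-rank method: the two 19s would occupy ranks 2 and 3,
# so each receives the average rank (2 + 3) / 2 = 2.5.
ranks = rankdata(ages, method='average')
print(ranks.tolist())  # [4.0, 1.0, 2.5, 2.5, 5.0]
```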

Spearman's Correlation (Counterpart of Pearson's Correlation)

  • Use Spearman’s correlation when:
    • One or both variables are ordinal
    • Variable(s) are severely skewed
    • The relationship is non-linear but monotonic (consistent direction between the variables)
  • Spearman’s Correlation in Stata
    • Syntax: spearman var1 var2
    • Example write-up: There is a very strong, statistically significant correlation, r_s(23) = .83, p < .001. The more someone’s heart flutters with excitement at “Design and Stats”, the more time they spend thinking about stats.
    • Very similar in implementation and interpretation to Pearson’s correlation!
    • Write up is the same as Pearson’s correlation (except the “s” subscript).
    • Similar to Pearson’s correlation, Spearman’s correlation, denoted r_s or ρ (rho), ranges from -1 to 1, with -1 or 1 indicating a perfect monotonic relationship and 0 indicating no relationship
    • Same rules of thumb as Pearson’s correlation (Cohen, 1988):
      • Between 0.1 and 0.3 = small effect
      • Between 0.3 and 0.5 = medium effect
      • > 0.5 = large effect
    • Difference: Spearman’s correlation works by ranking the scores (lowest to highest) for each variable, and computing the correlation on the ranked scores
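The rank-then-correlate idea explains why Spearman’s correlation captures monotonic but non-linear relationships. A quick Python illustration with made-up data (scipy’s `spearmanr` vs. `pearsonr`):

```python
from scipy.stats import spearmanr, pearsonr

# A monotonic but non-linear relationship: the ranks of x and y agree
# perfectly, so Spearman's rho is exactly 1 while Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

rho, p_s = spearmanr(x, y)   # rho = 1.0
r, p_r = pearsonr(x, y)      # r ≈ 0.98
```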

Mann-Whitney-Wilcoxon Rank Sum Test (Two Independent Groups; Counterpart of the Independent-Samples T-Test)

  • Also known as:
    • Mann-Whitney U
    • Wilcoxon rank-sum Ws
  • Example: Drug Use and Depression
    • Research question: Is there a difference in the experiences of depression in people who use different recreational drugs when clubbing?
    • Procedure: Drugs were taken on a Saturday night while clubbing. Depression scores were measured twice, on Sunday night and then again on Wednesday.
    • DV: depression (BDI) measured 1 day after drug use (Sunday)
    • IV: Drug (alcohol vs. ecstasy), between-subjects categorical variable
    • Design: Single-IV between-groups design with two levels
  • Why a non-parametric test here? The data have:
    • A small sample size
    • Outliers
    • A non-normal distribution
    • Not-so-equal variances
    • So, we use the rank-sum test instead of the independent-samples t-test
    • Stata syntax: ranksum DV, by(IV)
      • To see the ranks themselves: egen SunBDIRank = rank(SundayBDI)
      • H0: the ranks (medians) of the two groups do not differ
      • If p < .05, reject H0, indicating statistical significance; two p-values are reported here, so report the exact p
  • Effect size:
    • r = \frac{z}{\sqrt{N}}
    • Sunday Depression scores: r = \frac{1.105}{\sqrt{20}} = 0.25
    • Same rules of thumb as correlation (Cohen, 1988):
      • Between 0.1 and 0.3 = small effect
      • Between 0.3 and 0.5 = medium effect
      • > 0.5 = large effect
    • Conclude: The day after a night of clubbing, depression scores do not significantly differ between those who drank alcohol (mean rank = 9.1) and those who took ecstasy (mean rank = 12.0), z = -1.11, p = .288, with a small effect size, r = .25.
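As an illustrative aside, the same kind of analysis can be sketched in Python with scipy’s `mannwhitneyu`, including the manual z-to-r effect-size conversion via the normal approximation. The depression scores below are made up, not the lecture’s dataset:

```python
import math
from scipy.stats import mannwhitneyu

# Hypothetical BDI scores for two independent groups of 10 participants each.
alcohol = [15, 16, 16, 19, 20, 21, 23, 25, 26, 28]
ecstasy = [17, 20, 22, 24, 27, 28, 30, 31, 33, 35]

u, p = mannwhitneyu(alcohol, ecstasy, alternative='two-sided')

# Normal approximation: convert U to a z-score, then r = z / sqrt(N),
# where N is the combined sample size (tie correction omitted for brevity).
n1, n2 = len(alcohol), len(ecstasy)
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu) / sigma
r = abs(z) / math.sqrt(n1 + n2)
```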

Wilcoxon Signed-Rank Test (Two Related Groups; Counterpart of the Paired-Samples T-Test)

  • Not to be confused with the Wilcoxon rank-sum test
  • Example: Drug Use and Depression
    • Research question: Have people’s experiences with depression changed after one day of using drugs while clubbing, compared to four days later? But here, let’s focus on Alcohol.
    • Procedure: Drugs were taken on a Saturday night while clubbing. Depression scores were measured twice, on Sunday night and then again on Wednesday.
    • DV: depression (BDI)
    • IV: Day (1 day after alcohol use, Sunday; 4 days after alcohol use, Wednesday), within-subjects categorical variable
    • Design: A within-group design with a single factor comprising two levels
  • Check the normality of the difference scores (between the conditions/levels)
  • Here, the difference scores are not normal, and N is small (10 pairs of scores)
  • Stata syntax: signrank var1 = var2
    • How it works: rank the absolute differences (ignoring the signs), then compare the ranks associated with positive vs. negative differences. H0: no systematic change in the ranks of the differences
    • Report the exact p.
  • Effect size:
    • r = \frac{z}{\sqrt{N}}, where N is the number of pairs of observations
    • r = \frac{1.99}{\sqrt{10}} = 0.63 (large effect)
    • When reporting, can include the median scores from each variable, or the sum of the positive and negative ranks
    • Conclude: A signed-rank test demonstrated that, after consuming alcohol and clubbing on Saturday night, depression scores significantly decreased from Sunday (Median = 16) to Wednesday (Median = 7.5), z = -1.99, p = .045, with a large effect size, r = .63
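For illustration, the paired analysis can be sketched in Python with scipy’s `wilcoxon`. The paired BDI scores below are made up (not the lecture’s data), and the z-score is recovered approximately from the p-value for the r effect size:

```python
import math
from scipy.stats import wilcoxon, norm

# Hypothetical paired BDI scores: 10 people measured on Sunday and again
# on Wednesday (every person's score decreased here).
sunday    = [20, 18, 25, 16, 14, 30, 12, 22, 17, 15]
wednesday = [13,  9, 15, 11, 10, 17, 11, 11,  9,  9]

stat, p = wilcoxon(sunday, wednesday)

# r = z / sqrt(N), where N is the number of pairs. Here z is recovered
# approximately by inverting the two-sided p-value through the normal
# distribution (an approximation, not Stata's reported z).
z = norm.isf(p / 2)
r = z / math.sqrt(len(sunday))
```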

Kruskal-Wallis Test (Multiple Independent Groups; Counterpart of the Between-Subjects One-Way ANOVA)

  • Used for violations of distributional assumptions, especially if coupled with small sample sizes and/or uneven group sizes
  • Extension of Previous Study:
    • Suppose someone wanted to extend the previous study: rather than just comparing ecstasy vs. alcohol, they wanted to explore ecstasy, alcohol, and water-only drinkers.
    • 100 people were recruited: 10 were ecstasy users, 60 were alcohol drinkers, and 30 were water-only drinkers
    • Dataset: “drug3groups.dta”, with a categorical variable “Drug” and a numerical variable “SundayBDI”
  • Stata syntax: kwallis DV, by(IV)
    • Like the rank-sum test, it ranks the DV and then totals the ranks for the different groups
    • H0: the ranks (medians) of the groups are the same
    • Test statistic: χ²
    • If p < .05, reject H0, indicating there is some difference in the mean ranks among the groups
    • Need to follow up with pairwise group comparisons!
  • We use Dunn’s test of multiple comparisons to follow up on a statistically significant Kruskal-Wallis test:
    • Syntax: dunntest DV, by(IV) ma(bonferroni)
    • It conducts pair-wise comparisons, just like running a rank-sum test on each pair of groups
    • Need to adjust the family-wise error rate (e.g. Bonferroni) to prevent inflation of Type I error, as with ANOVA post-hoc comparisons
  • Effect Size
    • Calculating the r effect size
      • r = \frac{z}{\sqrt{N}}, where N is the combined sample size of the two groups being compared
    • For example, between Ecstasy and alcohol: r = \frac{1.09}{\sqrt{70}} = 0.13
    • There’s no significant difference in depression scores between alcohol and ecstasy, z = 1.09, p = .413, with a small effect size, r = .13.
    • There is a significant difference between ecstasy and water drinkers, z = 4.70, p < .001, r = .74, and between alcohol and water drinkers, z = 6.00, p < .001, r = .63: the mean ranks of depression were lower in water-only drinkers than in both other groups, both large effects.
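As a final illustrative sketch, the omnibus test can be run in Python with scipy’s `kruskal` (the three groups of scores below are made up, not the “drug3groups.dta” data):

```python
from scipy.stats import kruskal

# Hypothetical Sunday BDI scores for three independent groups of unequal size.
ecstasy = [28, 30, 25, 33, 27]
alcohol = [20, 22, 26, 24, 19, 23]
water   = [10, 12,  9, 14, 11]

h, p = kruskal(ecstasy, alcohol, water)
# If p < .05, follow up with pairwise comparisons (e.g. Dunn's test with a
# Bonferroni adjustment, as with dunntest in Stata) rather than stopping here.
```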

Final Remarks

  • There is also a non-parametric equivalent to the one-way repeated-measures ANOVA: Friedman’s test. However, it has yet to be elegantly implemented in Stata
  • Ranking loses information on the nuances of the scale and differences
  • If the assumptions are met, yes, parametric analyses are more powerful (more likely to detect an effect)
  • But if assumptions are particularly badly violated, non-parametric tests are often more powerful and provide a more reliable p-value.
  • Often, you’ll get the same results/conclusion from equivalent parametric and non-parametric tests (which doesn’t mean you should try both and pick the preferred outcome!).
  • These non-parametric analyses are pretty simple, with no equivalents for more complex designs. Alternatives when assumptions are violated include:
    • Adjusting estimation within parametric analyses (e.g. bootstrapping)
    • Transforming the data
    • More complex computational estimation (e.g. simulations)
  • Again, if and when we can (which is the majority of the time), parametric tests (t-tests, regression, ANOVAs) are the better choice for analysing our data; however, it is useful to know when and why to use non-parametric tests

Conclusions

  • Non-parametric analyses don’t have expectations about shapes of distributions, because they:
    • Apply to categorical data, and therefore, distributional shapes are irrelevant! or
    • Apply to non-normal or ordinal data, and rank the data before analysing
  • When it comes to numeric data that doesn’t meet distributional assumptions (or violations to distributional assumptions are coupled with other issues, e.g. small sample sizes or unequal groups), non-parametric analyses are an option
  • Despite being a different family of analyses with different methods, the practice is similar:
    • Understand the variables and the research question
    • Numerically and graphically describe the data
    • Conduct the analysis, and if necessary, make adjustments for multiple comparisons.
    • Interpret the results.
  • After this week’s lecture, you know:
    • What defines non-parametric analyses, and when they are appropriate to use
    • Spearman’s correlation, rank sum test, sign rank test, Kruskal-Wallis test
    • How to interpret the results of these tests
    • The strengths and limitations of non-parametric tests
  • In Stata, you should be able to:
    • Conduct the tests covered
    • Graph data in an appropriate way for the type of data (and its distribution)