PSY2041: Quantitative Methods IV (Week 9) - Non-Parametric Statistics

This week's lectures, presented by Daniel Bennett for PSY2041 (Semester 2, 2025), introduce non-parametric statistical tests. The unit assumes a foundational understanding of statistical concepts and aims to prepare students for advanced data analysis.

Lecture Learning Outcomes

Upon completion of this week's material, students should be able to:

  1. Describe the difference between parametric and non-parametric statistical tests and explain when non-parametric tests are suitable.

  2. Describe the Spearman correlation analysis at a conceptual level and identify research questions that require this test.

  3. Describe the Wilcoxon signed-rank test at a conceptual level and identify research questions that require this test.

  4. Describe the Wilcoxon rank-sum test at a conceptual level and identify research questions that require this test.

Overview of Non-Parametric Statistics

Recap: Parametric and Non-Parametric Statistics

In Null Hypothesis Significance Testing (NHST), we initially assume the null hypothesis is true and then assess how surprising our sample data are under this assumption. The validity of the calculated p-value often hinges on certain assumptions about the data. A common assumption for many statistical tests, particularly parametric ones, is that the data are normally distributed.

However, if data are not normally distributed, parametric tests may not be appropriate. This is where non-parametric statistical tests become crucial. Non-parametric tests are designed to be used when data do not meet the strict distributional assumptions (like normality) required by parametric tests.

Examples of Data Distribution
  • Normally Distributed Data: An example is extraversion scores in the general population, which often exhibit a bell-shaped, symmetrical distribution around a central mean.

    • Visual check: A histogram of extraversion scores would show a peak in the middle and tails tapering off symmetrically.

  • Non-Normally Distributed Data:

    • Positively Skewed: Anxiety symptoms in the general population are typically positively skewed, meaning there are more people with lower anxiety levels and fewer with very high levels. The tail of the distribution extends to the right.

    • Uniform Distribution: The outcomes of rolling a six-sided die follow a uniform distribution, where each outcome (1, 2, 3, 4, 5, 6) has an equal probability of occurring. This is distinctly non-normal, as there is no central peak.
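
Each of these shapes is easy to simulate. Below is a minimal sketch in Python (using NumPy; the distribution parameters are illustrative assumptions, not values from the lecture):

    import numpy as np

    rng = np.random.default_rng(seed=1)

    extraversion = rng.normal(loc=50, scale=10, size=1000)  # bell-shaped, symmetric
    anxiety = rng.exponential(scale=5.0, size=1000)         # positively skewed, long right tail
    dice_rolls = rng.integers(low=1, high=7, size=1000)     # uniform over 1-6

    # For symmetric data the mean and median coincide; positive skew
    # pulls the mean above the median.
    print(np.mean(extraversion), np.median(extraversion))  # approximately equal
    print(np.mean(anxiety), np.median(anxiety))            # mean > median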

Testing for Normality

Before deciding between parametric and non-parametric tests, it's essential to assess the normality of the data. Two common methods, illustrated in the code sketch after this list, are:

  1. Quantile-Quantile (Q-Q) Plots:

    • These plots compare the observed data (y-axis) against what would be expected if the data were perfectly normally distributed (x-axis).

    • If data are normally distributed, the points on the Q-Q plot will mostly fall along a diagonal straight line.

    • If data are non-normally distributed, the points will deviate systematically from this diagonal line, indicating skewness or other departures from normality.

  2. Shapiro-Wilk Test:

    • The Shapiro-Wilk test is a formal statistical test for normality.

    • Null Hypothesis (H0): The data are normally distributed.

    • Alternative Hypothesis (H1): The data are not normally distributed.

    • A significant p-value (e.g., p < 0.05) leads to rejection of the null hypothesis: the data differ significantly from a normal distribution, and we conclude they are non-normal. Conversely, a non-significant p-value (e.g., p = 0.99) suggests the data do not significantly deviate from normality.
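
Both checks can be run in a few lines. A minimal sketch in Python using SciPy and Matplotlib (the variable scores is a hypothetical sample; the 0.05 cutoff follows the example above):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(seed=2)
    scores = rng.normal(loc=50, scale=10, size=200)  # stand-in for real data

    # 1. Q-Q plot: points should fall along the diagonal if data are normal.
    stats.probplot(scores, dist="norm", plot=plt)
    plt.show()

    # 2. Shapiro-Wilk test: H0 is that the data are normally distributed.
    statistic, p_value = stats.shapiro(scores)
    if p_value < 0.05:
        print(f"p = {p_value:.3f}: reject H0 -> data appear non-normal")
    else:
        print(f"p = {p_value:.3f}: no significant deviation from normality")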

Advantages and Disadvantages of Non-Parametric Statistics

Many parametric statistical tests have a direct non-parametric counterpart:

Parametric Statistical Test       Non-Parametric Statistical Test
----------------------------------------------------------------
Pearson correlation               Spearman correlation
One-sample t-test                 Wilcoxon signed-rank test
Paired-samples t-test             Wilcoxon signed-rank test
Independent-samples t-test        Wilcoxon rank-sum test
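
In practice, each pairing in this table corresponds to a pair of functions in SciPy. The sketch below is a minimal illustration under assumed data (the arrays x, y, group_a, and group_b are hypothetical, not from the lecture):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=3)
    x, y = rng.normal(size=30), rng.normal(size=30)  # paired measurements
    group_a = rng.normal(loc=0, scale=1, size=30)    # independent group 1
    group_b = rng.normal(loc=1, scale=1, size=30)    # independent group 2

    # Pearson correlation -> Spearman correlation
    print(stats.pearsonr(x, y))
    print(stats.spearmanr(x, y))

    # One-sample t-test (against mu = 0) -> Wilcoxon signed-rank test
    print(stats.ttest_1samp(x, popmean=0))
    print(stats.wilcoxon(x))

    # Paired-samples t-test -> Wilcoxon signed-rank test on the pairs
    print(stats.ttest_rel(x, y))
    print(stats.wilcoxon(x, y))

    # Independent-samples t-test -> Wilcoxon rank-sum test
    print(stats.ttest_ind(group_a, group_b))
    print(stats.ranksums(group_a, group_b))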

  • Advantages: The primary advantage of non-parametric statistics is that they make fewer or no assumptions about the distribution of the data. This allows them to be used in a wider range of situations, especially when data are skewed, have outliers, or are measured on an ordinal scale.

  • Disadvantages: The main disadvantage is that non-parametric tests generally have less statistical power than their parametric counterparts. This means that parametric tests are more likely (when their assumptions are met) to detect a true effect if one exists (i.e., reject a false null hypothesis).

  • Preference: Parametric tests are generally preferred over non-parametric tests as long as their underlying assumptions are met, due to their higher statistical power.

The Spearman Correlation Analysis

Conceptual Description and Suitable Research Questions

The Spearman correlation analysis is the non-parametric equivalent of the Pearson correlation. Unlike the Pearson correlation, it does not assume that both variables are normally distributed. It is suitable for research questions that examine the monotonic relationship between two variables when the data may not meet the normality assumption for Pearson correlation, or when the relationship is not necessarily linear but consistently increasing or decreasing.

Assumptions of Significance Testing for Pearson Correlation

Before computing a p-value for a Pearson correlation, several assumptions are typically made:

  • Both variables (X and Y) are approximately normally distributed.

  • Both variables are measured on an interval or ratio scale.

  • There are no significant outliers that unduly influence the correlation.

  • The association between X and Y is linear.

If these assumptions are violated, a non-parametric correlation like Spearman's ρ (rho) should be considered.

The Spearman Rank Correlation Coefficient (ρ)

To calculate a Spearman correlation, the raw data for both variables are transformed into ranks:

  1. Ranking: For each variable, the lowest value is assigned a rank of 1, the second-lowest rank 2, and so on.

  2. Tied Ranks: If two or more values are identical (tied), they receive the average of the ranks they would have occupied.

  3. Pearson on Ranks: A standard Pearson correlation coefficient is then computed on these rank-transformed data.

The resulting correlation coefficient, often represented by the Greek letter ρ (rho), ranges from -1 to 1 and is interpreted similarly to the Pearson coefficient: a value near 1 indicates a strong positive monotonic relationship, near -1 a strong negative monotonic relationship, and near 0 no monotonic relationship.
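
The three-step recipe can be verified directly in code: rank each variable (averaging tied ranks) and run a Pearson correlation on the ranks. A minimal sketch with made-up data:

    import numpy as np
    from scipy import stats

    x = np.array([3.1, 5.0, 5.0, 8.2, 12.7])  # note the tie at 5.0
    y = np.array([2.0, 4.5, 6.1, 9.0, 30.0])

    # Steps 1 and 2: rankdata's default "average" method assigns tied
    # values the mean of the ranks they would have occupied.
    x_ranks = stats.rankdata(x)  # [1.0, 2.5, 2.5, 4.0, 5.0]
    y_ranks = stats.rankdata(y)  # [1.0, 2.0, 3.0, 4.0, 5.0]

    # Step 3: a Pearson correlation computed on the ranks...
    r_on_ranks, _ = stats.pearsonr(x_ranks, y_ranks)

    # ...matches SciPy's built-in Spearman coefficient exactly.
    rho, _ = stats.spearmanr(x, y)
    print(r_on_ranks, rho)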

Why Rank Transformation Works

Rank-transforming data addresses several issues that can arise when data fail to meet the parametric assumptions for Pearson correlation:

  • Nonlinear Monotonic Relationships: If the relationship between X and Y is consistently increasing or decreasing but not strictly linear (e.g., an exponential curve), ranking can capture this. For example, a perfect monotonic relationship will yield a Spearman ρ = 1, even if it is non-linear.

  • Non-Normally Distributed Data: Rank transformation equalizes the spreads of different distributions and removes the impact of skewness, making the subsequent Pearson calculation on ranks more robust. For instance, even with non-normal data, a strong monotonic relationship might result in a Spearman ρ = 0.72.

  • Outliers: Outliers can heavily influence Pearson correlations, pulling the regression line and inflating or deflating the coefficient. When data are ranked, the extreme values still receive the highest or lowest ranks, but their extremeness is reduced to simply being one rank above or below their nearest neighbour, which limits their influence on the coefficient.
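
Two of these effects (nonlinear monotonic relationships and outlier influence) can be demonstrated in a short sketch; the data are invented for illustration:

    import numpy as np
    from scipy import stats

    # Nonlinear but perfectly monotonic: y = exp(x).
    x = np.linspace(0, 5, 20)
    y = np.exp(x)
    print(stats.pearsonr(x, y)[0])   # less than 1: relationship is not linear
    print(stats.spearmanr(x, y)[0])  # exactly 1.0: the ranks agree perfectly

    # One extreme point appended to otherwise unrelated data.
    rng = np.random.default_rng(seed=4)
    a = np.append(rng.normal(size=20), 100.0)
    b = np.append(rng.normal(size=20), 100.0)
    print(stats.pearsonr(a, b)[0])   # inflated toward 1 by the single outlier
    print(stats.spearmanr(a, b)[0])  # outlier is just the top rank; much closer to 0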