Nonparametric Hypothesis Testing: Wilcoxon Signed Rank and Rank Sum Tests

Wilcoxon Signed Rank Test and Wilcoxon Rank Sum Test: Nonparametric Equivalents of T-tests

This lecture introduces the Wilcoxon Signed Rank Test and the Wilcoxon Rank Sum Test, which serve as the nonparametric equivalents to the various types of t-tests covered previously in quantitative methods. These tests are crucial when the assumption of normally distributed data, required by t-tests, cannot be met. The concepts for both will be discussed separately for clarity.

The Wilcoxon Signed Rank Test

The Wilcoxon Signed Rank Test is the nonparametric equivalent of both a one-sample t-test and a paired-sample t-test.

Comparison to One-Sample T-test

A one-sample t-test is used to compare the mean of sample data to a specific, known reference value (e.g., comparing the average IQ of PSY2041 students to the general population's average IQ of 100). Similarly, the Wilcoxon Signed Rank Test performs this comparison, but without assuming data are normally distributed.

Comparison to Paired-Sample T-test

Recall that a paired-sample t-test essentially involves calculating the differences between two measurements for each subject (e.g., pre-test vs. post-test scores) and then performing a one-sample t-test on these difference scores against a reference value of 0. Because the Wilcoxon Signed Rank Test is the nonparametric equivalent of the one-sample t-test, it logically also serves as the nonparametric equivalent of the paired-sample t-test, applied to the difference scores.

Key Differences from T-tests and Assumptions

Distribution Assumption: Unlike one-sample and paired-sample t-tests, the Wilcoxon Signed Rank Test does not assume that data (or the differences, for paired samples) are normally distributed. The question of whether the population standard deviation is known, which matters for the parametric tests, is also irrelevant here.
Scale of Measurement: Both the Wilcoxon Signed Rank Test and the t-tests discussed assume that data are measured on an interval or a ratio scale. They cannot be used for nominal or ordinal scale data.
Measure of Central Tendency: The Wilcoxon Signed Rank Test compares the median of the data to a reference value, whereas the t-tests compare the mean. While both mean and median are measures of central tendency, this is a technical distinction to be aware of.

Core Idea of the Wilcoxon Signed Rank Test

The fundamental concept is relatively simple:

If a sample's measure of central tendency (median) is different from a reference value, it implies a predictable pattern in the deviations from that reference value.
Higher Median: If the distribution's center is higher than the reference value, we expect to see many large positive differences (observations above the reference) and few large negative differences.
Lower Median: Conversely, if the distribution's center is lower, we expect many large negative differences and few large positive differences.
The test's goal is to determine whether the observed imbalance between positive and negative differences is large enough to reject the null hypothesis (that the population median is equal to the reference value). This approach avoids assumptions about normal distributions.

Procedure: Computing Signed Ranks

Calculate Differences: For each observation, compute the difference between the observed value and the reference value.
Handle Zero Differences: Observations with a difference of 0 are typically excluded from the ranking process.
Rank Absolute Differences: Rank the absolute values of these non-zero differences from smallest (1) to largest. For tied absolute differences, assign the average of the ranks they would have received.
Assign Signed Ranks: Reapply the original sign (positive or negative) to each rank.
Sum Signed Ranks: Separately sum all the positive ranks and all the negative ranks.
Comparison: The test evaluates whether the positive and negative rank sums are imbalanced enough to indicate a significant departure from the null hypothesis.
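
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own implementation: the function name `signed_ranks` and the small sample in the usage lines are made up for the example, and only the standard library is used.

```python
def signed_ranks(values, reference):
    """Return (positive_rank_sum, negative_rank_sum, signed_ranks).

    Hypothetical helper for illustration; zero differences are dropped
    and tied absolute differences share the average of their ranks.
    """
    # Differences from the reference, excluding zeros.
    diffs = [v - reference for v in values if v != reference]
    # Positions of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Ranks i+1 .. j+1 are shared, so each tied value gets their average.
        avg_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Reapply the original sign to each rank.
    signed = [r if d > 0 else -r for r, d in zip(ranks, diffs)]
    w_plus = sum(r for r in signed if r > 0)
    w_minus = sum(r for r in signed if r < 0)
    return w_plus, w_minus, signed


# Made-up sample, compared against a reference value of 5.
w_plus, w_minus, signed = signed_ranks([4, 6, 5, 9, 2], 5)
print(signed)           # [-1.5, 1.5, 4.0, -3.0]
print(w_plus, w_minus)  # 5.5 -4.5
```

In the usage lines, the observation equal to 5 is dropped, and the two differences of size 1 share ranks 1 and 2, so each receives 1.5.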

Example: IQ Data

Consider IQ data from a sample (N=10): 86, 97, 99, 100, 106, 108, 109, 112, 113, 113. We want to know whether the median of the population from which this sample is drawn differs from the general population IQ median of 100.

Differences from 100: -14, -3, -1, 0, 6, 8, 9, 12, 13, 13 (The 0 difference for 100 is excluded from ranking).
Absolute Differences (non-zero): 1, 3, 6, 8, 9, 12, 13, 13, 14
Ranks of Absolute Differences:
- 1 → 1
- 3 → 2
- 6 → 3
- 8 → 4
- 9 → 5
- 12 → 6
- 13, 13 → (7+8)/2 = 7.5 (average of ranks 7 and 8)
- 14 → 9
- (Note: The provided lecture slide numbers might differ slightly in how tied ranks are presented, but the principle of averaging ranks is standard.)
Signed Ranks: Applying signs to the ranks based on original differences:
- -14 → -9
- -3 → -2
- -1 → -1
- 6 → +3
- 8 → +4
- 9 → +5
- 12 → +6
- 13 → +7.5
- 13 → +7.5
Sum of Positive Ranks: 3 + 4 + 5 + 6 + 7.5 + 7.5 = 33. (The lecture example reported a positive rank sum of 37, which suggests its demonstration used slightly different ranks, possibly because of a blanked-out value or a different ranking scheme on the slide.)
Sum of Negative Ranks: (-9) + (-2) + (-1) = -12. (The lecture example reported a negative rank sum of -17.)
Conclusion: In the lecture's example, the calculation yielded a positive rank sum of 37 and a negative rank sum of -17. The associated p-value was 0.34. Since p > 0.05, we cannot reject the null hypothesis. This means there is insufficient evidence to conclude that the population median IQ for PSY2041 students is different from 100. This aligns with observing 6 scores above 100, 3 below, and 1 equal to 100.
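
The hand calculation for the IQ data can be checked with a short script. This is a sketch under stated assumptions: it recomputes the rank sums from the ten scores listed in these notes (so it gives 33 and 12, not the lecture's 37 and 17), and it uses a normal approximation with a tie-corrected variance and no continuity correction to get a rough p-value. That approximation is a choice made for this example, not necessarily what the lecture's software did, so the p-value is not expected to match the lecture's 0.34. Here the negative rank sum is reported as a magnitude.

```python
import math
from collections import defaultdict

iq = [86, 97, 99, 100, 106, 108, 109, 112, 113, 113]
reference = 100

# Differences from the reference; the zero difference (score of 100) is dropped.
diffs = [x - reference for x in iq if x != reference]

# Map each absolute difference to the rank positions it occupies,
# then average those positions so that ties share a rank.
positions = defaultdict(list)
for pos, a in enumerate(sorted(abs(d) for d in diffs), start=1):
    positions[a].append(pos)
avg_rank = {a: sum(p) / len(p) for a, p in positions.items()}

w_plus = sum(avg_rank[abs(d)] for d in diffs if d > 0)   # sum of positive ranks
w_minus = sum(avg_rank[abs(d)] for d in diffs if d < 0)  # magnitude of negative rank sum
n = len(diffs)

# Sanity check: the two sums always add up to n(n+1)/2.
assert w_plus + w_minus == n * (n + 1) / 2

# Normal approximation (assumption: tie-corrected variance, no continuity correction).
mean_w = n * (n + 1) / 4
tie_term = sum(len(p) ** 3 - len(p) for p in positions.values()) / 48
var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term
z = (w_plus - mean_w) / math.sqrt(var_w)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(w_plus, w_minus)  # 33.0 12.0
print(p_two_sided)
```

The identity W+ + W- = n(n+1)/2 is a quick way to catch ranking mistakes: with nine non-zero differences the two sums must total 45. The approximate p-value from this sketch is also above 0.05, so it points to the same conclusion of not rejecting the null hypothesis.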

The Wilcoxon Rank Sum Test (Mann-Whitney U Test)

The Wilcoxon Rank Sum Test is the nonparametric equivalent of an independent samples t-test. It is also often referred to as the Mann-Whitney U Test (e.g., in software like JASP).

Purpose and Comparison to Independent Samples T-test

An independent samples t-test compares the means of two separate (independent) samples. The Wilcoxon Rank Sum Test serves the same purpose but does not assume normal distribution of data within each group.

Key Differences from T-tests and Assumptions

Distribution Assumption: Unlike the independent samples t-test, the Wilcoxon Rank Sum Test does not assume that data are normally distributed within each of the two groups.
Scale of Measurement: A key distinction from the Wilcoxon Signed Rank Test is that the Wilcoxon Rank Sum Test can be used on ordinal data, as well as interval or ratio scale data. It is only unsuitable for nominal data.
Measure of Central Tendency: Like the Signed Rank Test, it compares the medians of the two independent groups, rather than their means.

Core Idea of the Wilcoxon Rank Sum Test

The underlying concept is that if two independent samples come from populations with the same median (i.e., the null hypothesis is true), then combining and ranking all the data from both samples should give each group approximately the same average rank (and, when the groups are the same size, approximately the same sum of ranks).

If one group consistently has higher scores than the other, when all data are combined and ranked, the data from the higher-scoring group will consistently receive higher ranks.
Conversely, the lower-scoring group's data will receive lower ranks.
If the difference between the groups' rank sums is larger than would be expected by chance under the null hypothesis (equal medians), the null hypothesis can be rejected. This method allows comparison without parametric assumptions.

Procedure

Combine Data: Pool all observations from both independent samples into a single dataset.
Rank Combined Data: Rank all the combined observations from the smallest (1) to the largest. For tied values, assign the average of the ranks they would have received.
Sum Ranks by Group: Separate the combined ranks back into their original groups and calculate the sum of ranks for each sample.
Compare Summed Ranks: The test then assesses whether the difference between these two sums of ranks is statistically significant.
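
The pooling and ranking steps above might look like this in Python. It is a minimal sketch using only the standard library; the function name `rank_sums` and the two tiny samples in the usage lines are hypothetical and chosen only to show how ties are handled.

```python
def rank_sums(group_a, group_b):
    """Pool two samples, rank them (average ranks for ties),
    and return the sum of ranks for each group.

    Hypothetical helper for illustration only.
    """
    # Combine the data, remembering which group each value came from.
    pooled = [(v, "a") for v in group_a] + [(v, "b") for v in group_b]
    pooled.sort(key=lambda pair: pair[0])
    sums = {"a": 0.0, "b": 0.0}
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j+1.
        avg_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            sums[pooled[k][1]] += avg_rank
        i = j + 1
    return sums["a"], sums["b"]


# Made-up samples: the value 3 appears in both groups and shares ranks 2 and 3.
sum_a, sum_b = rank_sums([3, 5, 8], [1, 3, 4])
print(sum_a, sum_b)  # 13.5 7.5
```

The two rank sums always total N(N+1)/2 for N pooled observations (here 21 for N = 6), so a high sum in one group necessarily means a low sum in the other.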

Example: Student Satisfaction with Lectures

Consider comparing student satisfaction (a numerical rating, potentially non-normally distributed) with non-statistics lectures versus statistics lectures.

Imagine 12 students rate non-statistics lectures and 12 other students rate statistics lectures.
If students generally rate non-statistics lectures much higher, then when all 24 satisfaction scores are combined and ranked, the non-statistics scores will tend to occupy the higher ranks, and the statistics scores will occupy the lower ranks (except for potential outliers).
Consequently, the sum of ranks for the non-statistics group would be substantially higher than for the statistics group.
Conducting a Wilcoxon Rank Sum Test on these data would likely yield a very low p-value, leading to the rejection of the null hypothesis. This would suggest a significant difference in median satisfaction levels between non-statistics and statistics lectures.
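
To make the scenario concrete, the sketch below runs the calculation on invented ratings. The 24 scores are made up for illustration and are not data from the lecture. The script computes the Mann-Whitney U statistic by counting pairwise wins (ties count one half), recovers the rank sums from U, and uses a normal approximation with a tie-corrected variance and no continuity correction. That approximation is an assumption of this example; software such as JASP may use a different method.

```python
import math
from collections import Counter

# Made-up ratings on a 1-10 scale, for illustration only (not lecture data).
non_stats = [8, 9, 7, 9, 10, 8, 7, 9, 8, 10, 6, 9]
stats = [4, 5, 3, 6, 5, 4, 7, 2, 5, 4, 6, 3]

n1, n2 = len(non_stats), len(stats)
N = n1 + n2

# U for the non-statistics group: number of (non_stats, stats) pairs in which
# the non-statistics rating is higher, with ties counting one half.
u_non_stats = sum(
    1.0 if a > b else 0.5 if a == b else 0.0
    for a in non_stats
    for b in stats
)

# The rank sums follow from U: R1 = U1 + n1(n1+1)/2, and R1 + R2 = N(N+1)/2.
rank_sum_non_stats = u_non_stats + n1 * (n1 + 1) / 2
rank_sum_stats = N * (N + 1) / 2 - rank_sum_non_stats

# Normal approximation (assumption: tie-corrected variance, no continuity correction).
tie_counts = Counter(non_stats + stats).values()
tie_term = sum(t**3 - t for t in tie_counts) / (N * (N - 1))
var_u = n1 * n2 / 12 * ((N + 1) - tie_term)
z = (u_non_stats - n1 * n2 / 2) / math.sqrt(var_u)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(rank_sum_non_stats, rank_sum_stats)  # 219.0 81.0
print(p_two_sided)
```

With these invented ratings the non-statistics group wins almost every pairwise comparison (U = 141 out of a possible 144), so its rank sum is far above that of the statistics group and the approximate p-value is well below 0.05.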

Conclusion

The Wilcoxon Signed Rank Test and the Wilcoxon Rank Sum Test are valuable nonparametric alternatives to t-tests, particularly when the assumption of normally distributed data cannot be met. They allow for hypothesis testing regarding medians, using rank-based methods that are robust to distributional shape. This content concludes the quantitative methods lectures for the semester. Students are encouraged to provide feedback on the unit via SETU surveys or direct email, as PSY2041 is a new unit and feedback is highly valuable for its improvement.