Nonparametric Methods (Comparing Medians)

17.1 Parametric Tests (z, t)

  • Parametric tests should ideally be presented with the caveat: "If our assumptions concerning the shape of the population distributions are valid, we may conclude that ..."

  • Parametric tests are thought to be more systematized and easier to apply but rely on assumptions that are not always met.

  • Common characteristics/assumptions of parametric tests:

    • Independence of observations (except for paired data).

    • Observations are randomly drawn from a Normally distributed population.

    • Minimum sample size of approximately 30 if the population is Non-normal.

    • Data are drawn from populations having equal variances.

    • Other possible requirements: interval- or ratio-level measurements, homoscedasticity, equal sample sizes.

17.1 Nonparametric or Distribution-free Tests

  • Alternative tests of statistical inference that do not make numerous or stringent assumptions about the population.

  • These techniques are called distribution-free or nonparametric tests.

  • The distribution of values for the variable may be very skewed or non-normal.

  • Distributions can take on any shape and are not limited to the bell shape of the normal distribution.

  • Common Characteristics/Assumptions of Non-parametric Tests

    • Independence of randomly selected observations except when paired.

    • Few assumptions concerning the population’s distribution.

    • The scale of measurement may be numeric, ordinal, or categorical.

    • The primary focus is either the rank ordering of scores or the frequencies or classification of data.

    • Hypotheses are most often posed regarding ranks, medians, or frequencies of data.

    • Sample size requirements are less stringent than for parametric tests.

    • Nonparametric tests are less powerful than parametric tests when the parametric assumptions hold, but they have wide applicability.

17.1 Data Types for Nonparametric Tests

  • Numeric Data:

    • One-sample t-test could be used when the sample comes from a normally distributed population.

    • However, when the data are very skewed, t-tests do not provide good results, especially when the sample size is small.

    • When the conditions for a t-test are not satisfied, nonparametric tests are used.

  • Ordinal Data:

    • Many measurements are ordered information with no quantitative values or units.

    • Opinion surveys using categories such as “very satisfied,” “satisfied,” “neither satisfied nor dissatisfied,” “dissatisfied,” or “very dissatisfied” are a prime example of measurements without quantitative units.

    • These scales were first developed by Rensis Likert and are known as Likert scales.

    • For ordinal-level measurements, nonparametric tests are used to compare distributions.

Level of Measurement (Revisited)

  • Levels of Measurement: A system of classification with four types of measurement rules that affect the kind of statistical analysis that is appropriate:

    • Nominal

    • Ordinal

    • Interval

    • Ratio

Nominal Measurement:

  • Have Categories only without any inherent ordering.

  • Lowest level of measurement.

  • The term “Nominal” is derived from the Latin nomen, meaning “name”.

  • Example 1: Sex coded with 2 arbitrary numbers. It does not matter what the codes are, the numbers have no quantitative meaning (although codes like 0 and 1 are more sensible).

  • Mathematical operations (addition, subtraction, multiplication, division) are not meaningful with these numerals. For instance, suppose a class has 30 men (all coded “0”) and 70 women (all coded “1”). The average code is 0.7, which means nothing as a value of the variable.

  • Example 2: Marital Status coded as: Single = 1, Married = 2, Divorced = 3, and Widowed = 4. These data are categorical in nature, so arithmetic operations do not make any sense (e.g., Does Widowed ÷2 = Married?)
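The coding example above can be checked with a few lines of Python. This is a minimal sketch: the 0/1 codes come from the class example, while the alternative 1/2 coding is an assumption introduced only to show that the "average" depends on the arbitrary labels and the category counts do not.

```python
from collections import Counter
from statistics import mean

# 30 men coded 0 and 70 women coded 1, as in the class example.
codes_01 = [0] * 30 + [1] * 70

# Hypothetical alternative coding of the same people: 1 = man, 2 = woman.
codes_12 = [c + 1 for c in codes_01]

mean_01 = mean(codes_01)
mean_12 = mean(codes_12)

# Frequencies per category do not depend on which numerals are used as labels.
counts_01 = sorted(Counter(codes_01).values())
counts_12 = sorted(Counter(codes_12).values())

print(mean_01, mean_12)      # the "average" changes with the coding
print(counts_01, counts_12)  # the counts stay the same
```

Frequencies (and the mode) are the summaries that survive a relabeling, which is why they are the natural statistics for nominal data.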

Ordinal Measurement (Likert Scale):

  • Have Categories with inherent ordering.

  • Distances between categories are not equal and not known.

  • Example 1: Degree of Pain: 1 = None, 2 = Some, 3 = A lot (Pain of 1.7 means nothing).

  • Example 2: Satisfaction, social class, depression.

Interval Measurement:

  • Have Categories with inherent ordering.

  • Distances between categories are assumed equal.

  • There is NO meaningful ZERO point i.e., ZERO value in this scale does not mean null.

  • Example: Temperature (Celsius): The difference between 20 and 25 degrees is the same as the difference between 25 and 30 degrees. Does 0 degrees Celsius mean no temperature? No: 0°C = 32°F.
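A short sketch of why the missing true zero matters. The helper name `c_to_f` and the temperatures chosen are illustrative assumptions; the conversion formula itself is the standard one. Equal intervals stay equal after a change of units, but ratios do not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0

# Equal Celsius intervals remain equal intervals in Fahrenheit.
diff_low = c_to_f(25) - c_to_f(20)
diff_high = c_to_f(30) - c_to_f(25)

# Ratios are not preserved: 20 is "twice" 10 in Celsius only.
ratio_c = 20.0 / 10.0
ratio_f = c_to_f(20.0) / c_to_f(10.0)

print(diff_low, diff_high)  # both 9.0
print(ratio_c, ratio_f)     # 2.0 versus 1.36
```

Because the ratio depends on the unit, statements such as "twice as hot" are not meaningful on an interval scale.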

Ratio Measurement:

  • Have Categories with inherent ordering.

  • Distances between categories are assumed equal.

  • ZERO point is meaningful i.e., ZERO value in this scale means null.

  • Example 1: Medication Dose: Number of milligrams, number of pills. 0mg means no medication.

  • Example 2: Money ($0 means no money i.e., ZERO means null).

Learning Check

  • What is the level of measurement of a variable that records each of the following?

    • (A) Religion (Nominal)

    • (B) Exam Scores (Ratio)

    • (C) Years of Schooling (Ratio)

    • (D) Hair Color (Nominal)

    • (E) Level of Political Conservatism (Ordinal)

    • (F) Days of School Attendance (Ratio)

    • (G) Number of Children (Ratio)

    • (H) Attitude Towards Current Government (Support, Indifferent, Do Not Support) (Ordinal)

17.2 Wilcoxon Signed-Rank Test

  • The Wilcoxon Signed-Rank Test is designed for comparing paired data and it is based on ranks.

  • This test is a non-parametric equivalent of the paired-samples t-test.

Step 1: Hypotheses:

  • When the assumptions are satisfied, we can test whether the median Md of the paired differences between the two populations, with distributions D1 and D2, is zero.

    • (i) Two-tailed Test

      • H0 : Md = 0

      • HA : Md ≠ 0

    • (ii) Right-tailed Test

      • H0 : Md = 0

      • HA : Md > 0

    • (iii) Left-tailed Test

      • H0 : Md = 0

      • HA : Md < 0

Step 2: Calculation of Ranks (EXCEL or MINITAB):

  • Given n matched pairs of observations, selected at random from populations 1 and 2 with distributions D1 and D2, compute the n paired differences (value from population 1 − value from population 2).

  • Drop zero differences from the sample and reduce n accordingly.

  • Rank the absolute values of the remaining differences from smallest to largest.

  • Assign average ranks for ties.

  • Find the sum of the ranks of the negative differences, T−.

  • Find the sum of the ranks of the positive differences, T+.

Step 3: Test Statistic:

  • T = smaller of T+ and T−.
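The ranking procedure and the test statistic can be sketched in plain Python. The helper names (`average_ranks`, `signed_rank_sums`) and the data are assumptions made for illustration; the numbers are hypothetical and are not taken from the text.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def signed_rank_sums(sample1, sample2):
    """Return (T_plus, T_minus) for two paired samples of equal length."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    diffs = [d for d in diffs if d != 0]            # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])  # rank the absolute differences
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus


# Hypothetical paired measurements (NOT data from the text).
first = [12, 15, 9, 20, 14, 11, 18]
second = [10, 15, 12, 14, 13, 13, 13]

t_plus, t_minus = signed_rank_sums(first, second)
T = min(t_plus, t_minus)
print(t_plus, t_minus, T)  # 14.5 6.5 6.5
```

Here one pair has a zero difference and is dropped, so six differences are ranked, and the two rank sums add up to 6(6 + 1)/2 = 21, which is a quick arithmetic check on the ranking.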

Step 4: Critical Value:

  • Critical values T0 for a one or two-tailed rejection region can be found using Table W2 in Appendix B-15.
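Critical values for this test rest on the null distribution of the rank sums: if H0 is true, each rank is equally likely to belong to a positive or a negative difference. The sketch below enumerates all sign patterns to get an exact lower-tail probability. It assumes no ties and no zero differences, and the function name is hypothetical; it is not a reconstruction of the table in the appendix.

```python
from itertools import product


def signed_rank_null_cdf(n, t):
    """P(T+ <= t) under H0 for n untied, nonzero differences.

    Enumerates all 2**n equally likely assignments of signs to the ranks 1..n.
    """
    count = 0
    for signs in product((0, 1), repeat=n):
        t_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if t_plus <= t:
            count += 1
    return count / 2 ** n


p_lower = signed_rank_null_cdf(6, 2)
print(p_lower)  # 0.046875, i.e. 3 of the 64 sign patterns
```

By symmetry the same probability applies to T−, so doubling a one-tail probability gives the two-tailed probability for T = smaller of T+ and T−, as long as the two tails do not overlap.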

Step 5: Decision:

  • If the two populations are the same, T+ and T− should be nearly equal.

  • If either T+ or T− is unusually large (so that the other, and hence T, is unusually small), this provides evidence against the null hypothesis.

  • Assumptions and Conditions:

    • Independence Assumption: The data are appropriately paired and the subjects are randomly selected.

    • Symmetry Assumption: The distribution of the paired differences is symmetric.

    • When to Use? The test is appropriate when the Normality assumption of parametric statistics does not apply, particularly when the sample size is small.

    • The Wilcoxon Signed-Rank Test is used for paired data. However, the test can also be used when we have a single sample of data.

  • Example:

    • To compare the densities of cakes using mixes A and B, six pairs of pans (A and B) were baked side-by-side in six different oven locations. Is there evidence of a difference in density for the two cake mixes? Use α = .05.

    • Hypotheses:

      • H0 : The density distributions are the same, i.e., Md = 0

      • HA : The density distributions are not the same, i.e., Md ≠ 0

17.4 Wilcoxon Rank-Sum Test (or Mann-Whitney U Test)

Objective:

  • This test is used to test the hypothesis that two independent distributions have the same center.

Non-parametric Test:

  • A non-parametric equivalent of the independent two-sample t-test.

Logic:

  • We select two independent random samples, one from each population. Designate each of the observations from population 1 as an “A” and each of the observations from population 2 as a “B”.

  • What happens if H0 is true? Suppose we had 5 measurements from population 1 and 6 measurements from population 2. If they were drawn from the same population, the rankings might be like this (i.e., no specific pattern). In this case if we summed the ranks of the A measurements and the ranks of the B measurements, the sums would be similar.

  • ABABBABABBA

  • What happens if H0 is NOT true? If the observations come from two different populations, perhaps with population 1 lying to the left of population 2, the ranking of the observations might take the following ordering.

  • AAABABABBBB

  • In this case the sum of the ranks of the B observations would be larger than that for the A observations.
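The logic above can be made concrete by summing the ranks in an ordering directly. In this sketch the first ordering is the mixed one from the text; the second, fully separated ordering is a hypothetical extreme added for contrast, and the function name is an assumption.

```python
def rank_sums(ordering):
    """Sum the ranks (1 = smallest) held by the A and B observations."""
    sums = {"A": 0, "B": 0}
    for rank, label in enumerate(ordering, start=1):
        sums[label] += rank
    return sums


mixed = rank_sums("ABABBABABBA")      # ordering from the text: 5 A's, 6 B's
separated = rank_sums("AAAAABBBBBB")  # hypothetical: every A below every B

print(mixed)      # {'A': 29, 'B': 37}
print(separated)  # {'A': 15, 'B': 51}
```

With unequal sample sizes it helps to compare average ranks: 29/5 = 5.8 against 37/6 ≈ 6.2 in the mixed ordering, but 15/5 = 3 against 51/6 = 8.5 when the populations are clearly separated.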

Step 1: Hypotheses:

  • Two-tailed Test:

    • H0 : The two samples are from the same distribution

    • HA : The distributions for population 1 and 2 are different.

  • Right-tailed Test:

    • H0 : The two samples are from the same distribution

    • HA : The distribution for population 1 lies to the right of that for population 2.

  • Left-tailed Test:

    • H0 : The two samples are from the same distribution

    • HA : The distribution for population 1 lies to the left of that for population 2.

Steps of Hypothesis Testing (continued):

  • Step 2: Calculation of Ranks:

    • Rank the combined sample from smallest to largest. Tied observations are given the same rank: the average of the ranks they would otherwise occupy.

    • Let T1 represent the sum of the ranks of the first sample, whose sample size is n1. Then T2 is the sum of the ranks of the other sample, whose sample size is n2. It can also be obtained from the formula T2 = \frac{n(n + 1)}{2} − T1, where n = n1 + n2 is the combined sample size.

  • Step 3: Test Statistic:

    • For Unequal Sample Size:

      • W = sum of the ranks in the group with the smaller sample size.

    • For Equal Sample Size:

      • W = the smaller rank sum (left-tailed test), the larger rank sum (right-tailed test), or either rank sum (two-tailed test).
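Steps 2 and 3 can be sketched in plain Python as follows. The helper names (`average_ranks`, `choose_w`) and the measurements are assumptions for illustration; the data are hypothetical and are not taken from the text.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def choose_w(t1, t2, n1, n2, tail="two"):
    """Pick the test statistic W following the rule in Step 3."""
    if n1 != n2:
        return t1 if n1 < n2 else t2  # rank sum of the smaller sample
    if tail == "left":
        return min(t1, t2)
    if tail == "right":
        return max(t1, t2)
    return t1                         # two-tailed, equal sizes: either sum


# Hypothetical measurements (NOT data from the text); 27 occurs three times.
sample1 = [23, 27, 31, 27]
sample2 = [30, 35, 27, 40, 33, 38]

n1, n2 = len(sample1), len(sample2)
n = n1 + n2
ranks = average_ranks(sample1 + sample2)  # rank the combined sample
T1 = sum(ranks[:n1])
T2 = sum(ranks[n1:])
W = choose_w(T1, T2, n1, n2)

print(T1, T2, W)                 # 13.0 42.0 13.0
print(T2 == n * (n + 1) / 2 - T1)  # True: the shortcut formula for T2
```

The three tied values of 27 would occupy ranks 2, 3 and 4, so each receives the average rank 3.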

  • Step 4: Critical Value:

    • Critical values TL and TU for a one- or two-tailed rejection region can be found using Table W1 in Appendix B-14.
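Tabled critical values of this kind rest on the exact null distribution of the rank sum: if H0 is true, every choice of n1 ranks out of n1 + n2 is equally likely for sample 1. The sketch below computes a lower-tail probability by enumeration. It assumes no ties, the function name is hypothetical, and it is not a reconstruction of Table W1.

```python
from itertools import combinations


def rank_sum_null_cdf(n1, n2, w):
    """P(T1 <= w) under H0, enumerating every set of n1 ranks from 1..n1+n2."""
    total = 0
    hits = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if sum(ranks) <= w:
            hits += 1
    return hits / total


p_lower = rank_sum_null_cdf(4, 6, 12)
print(p_lower)  # 4/210, about 0.019
```

For n1 = 4 and n2 = 6 there are 210 equally likely rank sets, and only four of them give a sum of 12 or less, which is why very small (or very large) rank sums count as evidence against H0.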

  • Step 5: Decision (Critical Value Approach)

    • Reject H0 if

      • (i) W < TL or W > TU (Two-tailed test)

      • (ii) Left-tailed test (D1 is shifted to the left of D2)

        • (a) W < TL if n1 ≤ n2

        • (b) W > TU if n1 > n2

      • (iii) Right-tailed test (D1 is shifted to the right of D2)

        • (a) W > TU if n1 ≤ n2

        • (b) W < TL if n1 > n2

    • (Here, n1 and n2 represent the sample sizes of distributions D1 and D2 respectively)
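The critical-value decision rule above translates into a small function. This is a sketch: the function name and argument names are assumptions, and the critical values in the usage lines are placeholders chosen for illustration, to be replaced by the values looked up in the table.

```python
def reject_h0(W, TL, TU, n1, n2, tail):
    """Apply the critical-value rule; W is the rank sum chosen in Step 3.

    tail is "two", "left" (D1 shifted left of D2) or "right" (D1 shifted right).
    """
    if tail == "two":
        return W < TL or W > TU
    if tail == "left":
        return W < TL if n1 <= n2 else W > TU
    if tail == "right":
        return W > TU if n1 <= n2 else W < TL
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Placeholder critical values (look up the real ones in the table).
TL, TU = 12, 32

print(reject_h0(11, TL, TU, n1=4, n2=6, tail="two"))   # True: W < TL
print(reject_h0(20, TL, TU, n1=4, n2=6, tail="two"))   # False
print(reject_h0(35, TL, TU, n1=6, n2=4, tail="left"))  # True: W > TU with n1 > n2
```

The direction flips when n1 > n2 because W is then the rank sum of sample 2, which becomes large when distribution 1 lies to the left.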

  • Step 5: Decision (p-value Approach)

    • (i) If p-value < α, reject H0.

    • (ii) If p-value ≥ α, do not reject H0.

  • Assumptions and Conditions:

    • Independence Assumption: The data values are mutually independent. Appropriate randomization in data collection is one way to accomplish this.

    • Independent Groups Assumption: The two groups must be independent of each other.

    • When to Use?

      • Use when the data consist of grades or ordered categories.

      • These tests can also be used with numeric data, replacing the values with their ranks, when there is a possibility of extreme outliers that would affect other tests.

      • With quantitative data that satisfy the t-test's assumptions, the Wilcoxon/Mann-Whitney test has only about 95% of the power of the corresponding two-sample t-test.

      • Outliers are interesting in their own right and ignoring them may result in missing an important fact or instance.

  • Example:

    • The wing stroke frequencies of two species of bees were recorded for a sample of n1 = 4 from species 1 and n2 = 6 from species 2. Can you conclude that the distributions of wing strokes differ for these two species? Use α = .05.

    • Hypotheses:

      • H0 : The distributions of wing stroke frequencies for the two species are the same.

      • HA : The distributions of wing stroke frequencies for the two species are in some way different.

  • Large Sample Approximation (Sample Sizes Larger than 10)

    • Calculate the mean and variance of Ti (i = 1, 2), where W = Ti is the rank sum used as the test statistic:

      • Mean, E(Ti) = \frac{ni (n1 + n2 + 1)}{2}

      • Variance, V(Ti) = \frac{n1 n2 (n1 + n2 + 1)}{12}

    • Test statistic z = \frac{W − E(Ti)}{\sqrt{V(Ti)}}

    • Decision Rule (Critical Value Approach):

      • For a two-tailed test, reject H0 if z < −z_{α/2} or z > z_{α/2}.

      • For a left-tailed test, reject H0 if z < −z_α.

      • For a right-tailed test, reject H0 if z > z_α.

    • Decision Rule (p-value Approach):

      • If p-value < α, reject H0.

      • If p-value ≥ α, do not reject H0.
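The large-sample approximation can be sketched with the standard library alone. The function names, the sample sizes and the rank sum below are hypothetical, and the sketch applies the formulas exactly as given above, with no adjustment for ties.

```python
from math import erfc, sqrt


def rank_sum_z(W, n_i, n1, n2):
    """z statistic for W, the rank sum of the sample whose size is n_i."""
    mean = n_i * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (W - mean) / sqrt(variance)


def normal_p_value(z, tail):
    """p-value from the standard normal distribution for the given tail."""
    upper = 0.5 * erfc(z / sqrt(2))   # P(Z > z)
    lower = 0.5 * erfc(-z / sqrt(2))  # P(Z < z)
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    return 2 * min(lower, upper)      # two-tailed


# Hypothetical values: n1 = 12, n2 = 15, and rank sum T1 = 130 for sample 1.
n1, n2, W = 12, 15, 130
z = rank_sum_z(W, n1, n1, n2)
p_two = normal_p_value(z, "two")
alpha = 0.05

print(round(z, 3))  # -1.854
print(p_two, "reject H0" if p_two < alpha else "do not reject H0")
```

With these numbers E(T1) = 168 and V(T1) = 420, so z is about −1.85; this is inside ±1.96, and the p-value approach gives the same answer of not rejecting H0 at α = .05.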

17.9 When Should You Use Nonparametric Methods?

  • Nonparametric methods are particularly valuable when your data contain only information about order.

  • They are useful with quantitative data when the variables violate one or more of our assumptions and conditions.

  • Transforming quantitative values to ranks protects us from the influence of outliers, multi-modal distributions, skewed distributions, and nonlinear relationships.

  • Be careful about using nonparametric methods when you don’t have to, since they are less powerful than the corresponding parametric methods.

  • Also, features such as outliers, multiple modes, skewness, and nonlinear relationships are important features of the data and should not be ignored.
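The point about ranks and outliers can be illustrated with a small sketch. Both data sets are hypothetical, as is the helper name; the second set differs from the first only in its largest value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


# Hypothetical data: the last value is replaced by an extreme outlier.
clean = [4.1, 4.8, 5.0, 5.3, 6.2]
with_outlier = [4.1, 4.8, 5.0, 5.3, 62.0]

print(mean(clean), mean(with_outlier))  # about 5.08 versus about 16.24
print(average_ranks(clean))             # [1.0, 2.0, 3.0, 4.0, 5.0]
print(average_ranks(with_outlier))      # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The mean more than triples while the ranks are unchanged, so a rank-based test gives the same result for both data sets. That is the protection, and also the cost: the ranks no longer show that an unusual value is present.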