W.5 Non-Parametric Tests

Overview of Nonparametric Tests

  • Definition: Nonparametric tests make fewer assumptions about the data compared to parametric tests. They can be used for ordinal data and do not rely on the assumption of normal distribution.

  • Why Use Nonparametric Tests:

    • When data is not normally distributed

    • When sample sizes are small

    • When the measurement scale is not interval (e.g., ordinal or nominal).

Key Concepts

  • Parametric vs. Nonparametric:

    • Parametric tests typically assume normally distributed data measured on at least an interval scale.

    • Nonparametric tests do not require normality, making them applicable in more situations, especially with ordinal data.

  • Normal Distribution:

    • Essential for parametric tests. With enough data points, the sampling distribution of the mean approaches normality even when the raw data are not normal.

  • Ranking:

    • Ranks: The basic operation in nonparametric tests. Data is ordered from smallest to largest:

      1. Smallest value receives rank 1, the next smallest gets rank 2, and so on.

      2. Tied values each receive the average of the ranks they would otherwise occupy.
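
The ranking rule above, including the averaging of tied ranks, can be sketched in a few lines of Python. The function name `average_ranks` and the sample values are illustrative assumptions, not part of the original notes.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    # Indices of `values` sorted from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of values tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end occupy ranks pos+1..end+1; average them.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# The two 7s would occupy ranks 1 and 2, so each receives 1.5.
print(average_ranks([12, 7, 7, 15, 9]))  # [4.0, 1.5, 1.5, 5.0, 3.0]
```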

Nonparametric Tests Introduced

1. Wilcoxon Rank Sum Test (Mann-Whitney U Test)
  • Purpose: To compare differences between two independent groups on an ordinal or continuous dependent variable.

  • Null Hypothesis (H0): Assumes there is no difference in the distributions of the two groups.

  • Process:

    • Combine the two groups and rank all observations.

    • Calculate the sum of ranks for each group.

    • Compute a z-score based on the ranks and expected values.

  • Formula:
    $Z = \frac{R_j - E(R_j)}{\sigma(R_j)}$

    • $R_j$: sum of ranks for group j

    • $E(R_j)$: expected rank sum

    • $\sigma(R_j)$: standard deviation of rank sums
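
As a rough illustration of the process and formula above, the sketch below computes the rank sum and z-score for one group. The notes do not spell out $E(R_j)$ or $\sigma(R_j)$; the code assumes the usual large-sample expressions under H0, $E(R_j) = n_j(N+1)/2$ and $\sigma(R_j) = \sqrt{n_1 n_2 (N+1)/12}$, with no tie or continuity correction. The function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Return (rank sum of group_a, expected rank sum, sigma, z)."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    # Step 1: combine both groups and rank all observations together.
    ranks = average_ranks(list(group_a) + list(group_b))
    # Step 2: sum the ranks that belong to group_a.
    r_a = sum(ranks[:n_a])
    # Step 3: compare with the expectation under H0 (assumed formulas,
    # no tie correction).
    expected = n_a * (n + 1) / 2
    sigma = math.sqrt(n_a * n_b * (n + 1) / 12)
    return r_a, expected, sigma, (r_a - expected) / sigma


r_a, expected, sigma, z = rank_sum_z([1, 2, 3, 4], [5, 6, 7, 8, 9])
print(r_a, expected, round(z, 3))  # 10.0 20.0 -2.449
```

A strongly negative z here reflects that every value in the first group ranks below every value in the second.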

2. Wilcoxon Signed Rank Test
  • Purpose: For comparing two related samples, matched samples, or repeated measurements.

  • Process:

    • Calculate differences between paired observations.

    • Rank these differences, ignoring signs.

    • Calculate the test statistic based on the ranks for positive and negative changes.

  • Null Hypothesis (H0): Assumes no median difference between pairs.

  • Test statistic: The smaller of the positive and negative rank sums, which is compared against a critical value.
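
The signed rank procedure can be sketched as follows. This is a minimal illustration, assuming the common convention of dropping pairs whose difference is zero before ranking; the function names and the before/after values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def signed_rank_sums(before, after):
    """Return (positive rank sum, negative rank sum, smaller of the two)."""
    # Paired differences; zero differences are dropped (assumed convention).
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Rank the absolute differences, ignoring signs.
    ranks = average_ranks([abs(d) for d in diffs])
    # Re-attach the signs and sum the ranks separately.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


before = [10, 12, 9, 14, 11]
after = [13, 11, 14, 14, 15]
# Differences are 3, -1, 5, 0, 4; the zero is dropped, leaving ranks 2, 1, 4, 3.
print(signed_rank_sums(before, after))  # (9.0, 1.0, 1.0)
```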

3. Chi-Square Test
  • Purpose: To test the association between two categorical variables (nominal data).

  • Observed vs Expected Frequencies: Determines how far the observed data diverges from expected data based on the null hypothesis.

  • Process:

    • Count occurrences in a contingency table.

    • Calculate expected frequencies using the formula:
      $E_{ij} = \frac{(\text{row total}) \times (\text{column total})}{\text{overall total}}$

    • Compute the Chi-square statistic:
      $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

    • Where $O_{ij}$ = observed frequency, $E_{ij}$ = expected frequency
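
The two formulas above translate directly into code. The sketch below is illustrative only: the function name `chi_square` and the 2×2 table are assumptions, and the degrees of freedom, (rows − 1) × (columns − 1), are included as the standard companion to the statistic even though the notes do not mention them.

```python
def chi_square(observed):
    """Return (expected table, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # E_ij = (row total * column total) / overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Sum (O_ij - E_ij)^2 / E_ij over every cell of the table.
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, dof


# Hypothetical contingency table: rows are groups, columns are outcomes.
table = [[20, 30],
         [30, 20]]
expected, stat, dof = chi_square(table)
print(expected)   # [[25.0, 25.0], [25.0, 25.0]]
print(stat, dof)  # 4.0 1
```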

Application of Nonparametric Tests

  • Use nonparametric tests when dealing with ordinal level data or when the assumptions of parametric tests cannot be met. Examples include analyzing survey results or evaluating non-numeric ranks.

  • Nonparametric tests can be less powerful than parametric tests if the assumptions of parametric tests hold true.

Important Aspects to Remember

  • Choosing the right test: Consider the type of data and sample size when selecting between parametric and nonparametric tests.

  • Z-scores and critical values: Understanding the concept of z-scores is crucial for nonparametric test interpretation.

  • Rank-based methods: The ranking system is key to nonparametric tests, enabling their flexibility across various data types.

Definition: Nonparametric tests are statistical methods that make fewer assumptions about the underlying data compared to parametric tests. They are particularly useful for analyzing ordinal data and do not rely on the assumption of a normal distribution. This flexibility allows nonparametric tests to be applied to a wider range of data types and scenarios.

Why Use Nonparametric Tests:

  • When data is not normally distributed, nonparametric tests can provide reliable results since they do not assume any specific probability distribution.

  • They are especially effective when sample sizes are small, as small samples can lead to inaccurate estimates of parameters needed for parametric tests.

  • When the measurement scale is not interval (e.g., ordinal or nominal), nonparametric tests can still be applied: rank-based tests handle ordinal data, and the Chi-Square test handles nominal data.

Key Concepts:

  • Parametric vs. Nonparametric:
    Parametric tests assume that the data follow a normal distribution and require data to be measured on an interval scale. This assumption can limit their application, especially in real-world datasets that may not conform to these criteria. In contrast, nonparametric tests do not require normality and can function effectively with ordinal data, thus expanding their utility across various fields such as psychology, social sciences, and medical research.

  • Normal Distribution:
    Understanding the normal distribution is essential for using parametric tests because many statistical methods require this condition to be met for valid results. Nonparametric tests, by virtue of their design, can still generate useful insights even when datasets are skewed or do not fit the normal curve. Notably, with sufficiently large samples the sampling distribution of the mean approaches normality even when the raw data do not, which complicates the decision regarding which tests to apply.

  • Ranking:
    Ranks serve as the fundamental operation in nonparametric tests. Data are ordered from smallest to largest, with the smallest value receiving rank 1, the next smallest rank 2, and continuing sequentially. For ties (two or more identical values), the ranks those values would otherwise occupy are averaged and the average is assigned to each tied observation. Because ranks keep only the order of the observations and not the size of the gaps between them, extreme values cannot distort the result.

Nonparametric Tests Introduced:

  1. Wilcoxon Rank Sum Test (Mann-Whitney U Test):

  • Purpose: This test is designed to compare differences between two independent groups with respect to an ordinal or continuous dependent variable.

  • Null Hypothesis (H0): Assumes that there is no difference in the distributions of the two groups.

  • Process:

    1. Combine the two groups and rank all observations collectively.

    2. Calculate the sum of ranks for each group and determine the test statistic.

    3. Compute a z-score based on the ranks and expected values, which provides insight into the significance of the differences observed.

  • Formula:
    $Z = \frac{R_j - E(R_j)}{\sigma(R_j)}$, where $R_j$ is the sum of ranks for group j, $E(R_j)$ is the expected rank sum based on the sample sizes, and $\sigma(R_j)$ is the standard deviation of the rank sum.
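
For reference, the expected rank sum and its standard deviation are usually obtained as follows under H0. This is a sketch assuming no ties; the symbols $n_1$, $n_2$ (group sizes) and $N = n_1 + n_2$ are notation introduced here, not taken from the notes.

$$E(R_j) = \frac{n_j (N + 1)}{2}, \qquad \sigma(R_j) = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}$$

As a small worked case, suppose $n_1 = n_2 = 3$ and group 1 holds the three lowest ranks, so $R_1 = 1 + 2 + 3 = 6$:

$$E(R_1) = \frac{3 \times 7}{2} = 10.5, \qquad \sigma(R_1) = \sqrt{\frac{3 \times 3 \times 7}{12}} \approx 2.29, \qquad Z = \frac{6 - 10.5}{2.29} \approx -1.96$$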

  2. Wilcoxon Signed Rank Test:

  • Purpose: This test is applied for comparing two related samples, matched samples, or repeated measurements on the same subjects, to assess whether their population mean ranks differ.

  • Process:

    1. Calculate differences between paired observations, taking care to record the signs of these differences.

    2. Rank the absolute values of these differences, ignoring their signs.

    3. Calculate the test statistic based on the ranks for both positive and negative changes to analyze the overall trend in paired data.

  • Null Hypothesis (H0): Assumes that there is no median difference between the pairs being compared.

  • The test statistic is usually the smaller of the positive and negative rank sums; which of the two sums is larger indicates the direction of the change.
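
A short worked example may make the steps concrete. The five paired differences below are hypothetical, and $W^+$, $W^-$ are notation introduced here for the positive and negative rank sums.

$$d = (2,\ -3,\ 5,\ -1,\ 4) \quad\Rightarrow\quad \text{ranks of } |d| = (2,\ 3,\ 5,\ 1,\ 4)$$

$$W^+ = 2 + 5 + 4 = 11, \qquad W^- = 3 + 1 = 4, \qquad W = \min(W^+, W^-) = 4$$

As a check, the two sums always add up to $n(n+1)/2$, here $5 \times 6 / 2 = 15$.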

  3. Chi-Square Test:

  • Purpose: The Chi-Square test is used to examine the association between two or more categorical variables (nominal data). It is widely applicable in fields like social sciences, medicine, and market research.

  • Observed vs Expected Frequencies: This method assesses how much the observed data diverges from the expected data based on the null hypothesis.

  • Process:

    1. Count occurrences of variables in a contingency table to summarize the data.

    2. Calculate expected frequencies using the formula:
      $E_{ij} = \frac{(\text{row total}) \times (\text{column total})}{\text{overall total}}$

    3. Compute the Chi-square statistic to evaluate the differences between observed and expected frequencies:
      $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ refers to observed frequency and $E_{ij}$ refers to expected frequency.
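
To illustrate the arithmetic, consider a hypothetical 2×2 table with observed counts $O_{11} = 10$, $O_{12} = 20$, $O_{21} = 30$, $O_{22} = 40$. The row totals are 30 and 70, the column totals are 40 and 60, and the overall total is 100, so:

$$E_{11} = \frac{30 \times 40}{100} = 12, \quad E_{12} = \frac{30 \times 60}{100} = 18, \quad E_{21} = \frac{70 \times 40}{100} = 28, \quad E_{22} = \frac{70 \times 60}{100} = 42$$

$$\chi^2 = \frac{(10 - 12)^2}{12} + \frac{(20 - 18)^2}{18} + \frac{(30 - 28)^2}{28} + \frac{(40 - 42)^2}{42} \approx 0.79$$

The statistic is then compared against a Chi-square critical value with $(2 - 1)(2 - 1) = 1$ degree of freedom; the degrees-of-freedom rule is standard but is not stated in the notes.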

Application of Nonparametric Tests:

  • Nonparametric tests are particularly suited for analyzing ordinal level data or when the assumptions of parametric tests cannot be met, such as with small sample sizes or when data is skewed. Examples include analyzing survey results