
Week 5: Nonparametric Tests

Introduction

  • Assignment Feedback: Tutors may take around three weeks to grade assignments due to heavy workloads.

  • Instructor Introduction: Mark Shearer is lecturing on nonparametric tests, a topic he designed.

  • Previous Lectures: Weeks 1-4 covered introductory material, picking up from Psyc 01/2023, and primarily used parametric tests.

  • This Week's Focus: Introduction to nonparametric tests.

Dates and Quiz Information

  • Important Note: Dates mentioned in the lecture slides are incorrect. Week 6 is correlation and regression. Week 7 is research week. Pay attention to announcements.

  • Week 6: Correlation and Regression (contrary to slide information).

  • Week 7: Research Week

    • No lectures or tutorials.

    • Midterm quiz opens during the regular lecture hours and is open for two hours (16:30 - 18:30).

    • The quiz is designed to be completed in one hour; students can start as late as 17:30.

  • Week 8: Midterm Recess (no lecture on April 23 as shown in slides).

  • Weeks 9-13: Professor Andrew Zemit Mignon (Math Dept.) will cover ANOVA and related topics.

Why Nonparametric Tests?

  • Nonparametric tests are used when parametric tests are unsuitable.

  • Fundamental Requirement for Parametric Tests: Data must be on an interval scale (metric scale).

    • Equal distance between points (e.g., 1 to 2 is the same distance as 2 to 3).

  • Non-metric Data: Situations such as questionnaire scores or IQ tests.

    • The difference between scores may not be consistent across the scale.

    • Example: The difference between test scores of 98 and 99 may not be the same as the difference between 89 and 90.

  • Ordinal Data: Data can be sorted, but intervals are not necessarily equal.

  • Using nonparametric tests can provide a cleaner approach when the interval-scale assumption is not met.

Normal Distribution

  • The assumption of normal distribution is fundamental for parametric tests to work.

  • Although it is a big assumption in theory, many real-world phenomena do follow a normal distribution.

  • Histogram: A good way to visualize the distribution of data.

Example: Reading Level Usage
  • A study tracked how long primary school children spent on reading level one books (in minutes) before progressing.

  • The histogram showed a roughly normal distribution.

  • n = 761 students.

  • Mean = 1768.21 minutes (29.5 hours).

  • Standard deviation = 753.55 minutes.

  • Reading time in minutes can be considered an interval scale.

Limitations of Normal Distribution
  • Questionnaire Scores: Scores ranging from 10 to 20. Is the difference between 15 and 14 the same as 14 and 13? Not necessarily. May not be nicely interval scaled.

  • Non-normal Distribution: If the histogram of data doesn't look normally distributed, a t-test may not be appropriate.

  • Normal distribution can be described by mean and standard deviation.

  • Distributions that deviate significantly from normal are not well characterized by just mean and standard deviation.

    • Bimodal

    • Skewed

    • Ceiling effect

Problems with Violating Normality Assumption?

  • Z-scores and t-tests rely on accurate approximations of probabilities at the tails of the distribution. If the data are not normal, these tail probabilities, and therefore the p-values, can be badly off.
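The reliance on tail probabilities can be made concrete with Python's standard library (a small illustration, not part of the lecture): under a normal distribution, the two-tailed probability beyond z = 1.96 is about 0.05, which is exactly the area parametric tests count on being accurate.

```python
from statistics import NormalDist

# Two-tailed probability of |Z| >= 1.96 under a standard normal distribution.
# Parametric tests lean on this tail area being accurate; if the data are not
# normal, the true tail probability can differ from this nominal value.
z = 1.96
p_two_tailed = 2 * (1 - NormalDist().cdf(z))
print(round(p_two_tailed, 3))  # close to 0.05
```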

Welcome to Nonparametric Tests

  • Do not require data to be normally distributed.

  • Work well with small sample sizes (e.g., 10 or 20).

  • Can provide p-values.

  • Potentially slightly less sensitive than parametric tests, particularly with small sample sizes; however, this difference in sensitivity is not large.

  • Using ranks is the fundamental trick.

Ranks

  • Ranking Procedure:

    • Sort the values.

    • The smallest value gets the lowest rank (rank = 1).

    • If there are tied scores, you average the ranks.

Example
  • Scores: 100, 103, 109, 91, 100

  • Ranked: 2.5, 4, 5, 1, 2.5

  • 91 is the smallest, so it gets rank 1.

  • Two scores of 100 are the 2nd and 3rd smallest; (2+3)/2 = 2.5.
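The ranking procedure above can be sketched in a few lines of Python (a minimal illustration of the sort-and-average-ties rule; the function name is my own):

```python
def rank_with_ties(scores):
    """Rank scores from smallest (rank 1) to largest, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with scores[order[i]].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + 1 + j + 1) / 2  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

print(rank_with_ties([100, 103, 109, 91, 100]))  # [2.5, 4, 5, 1, 2.5]
```

Running it on the lecture's example reproduces the ranks given above: 91 gets rank 1, and the two tied 100s share (2 + 3) / 2 = 2.5.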

Why Rank?

  • Evens out the distances between values when the interval-scale assumption is not met.

  • An interval scale assumes equal distances between values, which may not be correct for questionnaire data.

  • A data set can be uneven, with the distance between 90 and 91 not the same as the distance between 100 and 101.

  • Ranks make the distances equal, providing an even distribution.

  • Adjusts for the effects of outliers and for cases where the interval properties are unclear.

  • Data must be at least ordinal.

  • Population distribution: Often, sample data is the only means of estimating the population distribution.

Quick Repeat of Technicalities

  • Average tied ranks.

  • Rank from lowest to highest.

  • Reduces the effect of outliers.
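The outlier point can be shown directly (a toy example of my own, with distinct scores so no tie handling is needed): one wild score shifts the mean dramatically but leaves the ranks untouched.

```python
def simple_ranks(scores):
    # Ranks without ties: smallest value gets rank 1.
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

normal_group = [90, 91, 92, 93]
with_outlier = [90, 91, 92, 930]  # one extreme outlier

# The raw means differ hugely, but the ranks are identical.
print(sum(normal_group) / 4, sum(with_outlier) / 4)  # 91.5 vs 300.75
print(simple_ranks(normal_group))   # [1, 2, 3, 4]
print(simple_ranks(with_outlier))   # [1, 2, 3, 4]
```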

Wilcoxon Rank Sum Test

  • Frank Wilcoxon: Developed ideas around nonparametric tests.

  • Dependent variable: ordinal but not necessarily interval.

  • Independent variable: between subjects.

  • Nonparametric version of the t-test for two independent groups.

How it works
  • Combine the two samples.

  • Rank all data points from smallest to largest.

  • Obtain the total sum of the ranks for each group.

  • Use a table or calculate z-scores based on the rank sums.

  • The slides and the relevant Jeckard and Bell textbook chapters are available on Moodle.
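The combine, rank, and sum steps above can be sketched as a single helper (an illustrative reconstruction, not code from the lecture):

```python
def rank_sums(group1, group2):
    """Combine two samples, rank everything together, return each group's rank sum."""
    combined = group1 + group2
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        j = i
        # Extend j over tied values so their ranks can be averaged.
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of the 1-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r1 = sum(ranks[:len(group1)])
    r2 = sum(ranks[len(group1):])
    return r1, r2

print(rank_sums([5, 7], [6, 8]))  # (4.0, 6.0): ranks 1+3 vs 2+4
```

The two rank sums always add up to the total of all ranks, which is used as a sanity check later in the lecture.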

Example: Sexism Related to Gender.
  • Gender prejudice questionnaire scores from 0-50 (higher scores indicate more prejudice).

  • Measured in males and females.

  • Do we have between or within subjects? Between.

  • Is the data nominal, ordinal, or interval? Ordinal.

Hypotheses

  • Null hypothesis: There is no relation between gender and sexism scores.

  • Alternative hypothesis: There is a relation between gender and sexism.

Underlying Equation
  • z = (R_j - E_j) / \sigma_{rank sums}

  • R_j = rank sum of group j.

  • E_j = expected rank sum of group j.

  • \sigma_{rank sums} = standard deviation of rank sums.

Rank Sum
  • Sum the ranks for each member within the group.

Expected Rank Sum Equation
  • E_j = n_j * (n_j + n_k + 1) / 2

  • n_j = number of participants in group j.

  • n_k = number of participants in group k.

Sigma of Rank Sums Equation
  • \sigma_{rank sums} = \sqrt{ (n_1 * n_2 * (n_1 + n_2 + 1)) / 12 }

  • n_1 = number of elements in the first group.

  • n_2 = number of elements in the second group.

Example Application
  • 20 participants (9 male, 11 female).

  • Prejudice scores are provided for each participant.

Ranking
  • Rank across both groups.

  • If the male group has mostly higher ranks and the female group mostly lower ranks, it indicates a possible relation between gender and prejudice scores.

  • Add them up to discover the rank sum.

Calculations
  • Observed rank sum minus expected rank sum for each group (differ from what is expected).

  • Divide the result by standard deviation.

Observed Rank Sum
  • Calculated as 116 for the male group.

  • Calculated as 94 for the female group.

Observations Before Math
  • Under the null hypothesis, a group with 11 members would expect a bigger rank sum than a group with 9 members.

  • Rank sums must consider how many ranks and members are in each group.

Expected Rank Calculation
  • Expected rank sum: the group size multiplied by the average rank, (n_j + n_k + 1) / 2.

  • Calculated an expected rank sum of 94.5 for males and 115.5 for females.

Z Score Calculation:
  • z = (116 - 94.5) / 13.16 ≈ 1.63

  • The z-score of 1.63 is smaller than the critical value of 1.96, so the result is probably not significant.
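Putting the numbers from this example through the formulas above (a quick Python check of the arithmetic):

```python
import math

n_m, n_f = 9, 11        # male and female group sizes
rank_sum_male = 116     # observed rank sum for the male group

expected_male = n_m * (n_m + n_f + 1) / 2            # 9 * 21 / 2 = 94.5
sigma = math.sqrt(n_m * n_f * (n_m + n_f + 1) / 12)  # about 13.2
z = (rank_sum_male - expected_male) / sigma

print(round(z, 2))  # 1.63, below the 1.96 cutoff
```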

Calculating for the other groups
  • Only needs to be calculated for one group, because the other group yields the same z-score with the opposite sign.

Conclusion With this Information.
  • The z-score of 1.63 is below the critical value of 1.96, so the result is probably not significant.

  • The example here is changed from the textbook: the book uses groups of 10 and 10, while this example uses 9 and 11.

Understanding Z score
  • A more extreme ranking pattern (one group holding most of the highest or lowest ranks) increases the magnitude of the z-score.
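To see how an extreme ranking inflates the z-score, consider the hypothetical worst case for this example (my own illustration): the 9 males hold the 9 highest ranks, 12 through 20.

```python
import math

n_m, n_f = 9, 11
extreme_rank_sum = sum(range(12, 21))  # males take ranks 12..20 -> 144

expected = n_m * (n_m + n_f + 1) / 2             # 94.5, same as before
sigma = math.sqrt(n_m * n_f * (n_m + n_f + 1) / 12)
z = (extreme_rank_sum - expected) / sigma

print(round(z, 2))  # well above the 1.96 cutoff
```

Compared with the observed z of about 1.63, this most extreme ranking roughly doubles the z-score.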

Why Do We Rank & Carl Friedrich Gauss

  • Carl Friedrich Gauss's formula gives the sum of the first n integers: n(n + 1) / 2.

  • n = number of elements.

  • The expression \sum_{k=1}^{n} k is essentially 1 + 2 + 3 + ... + n.
    The sum of all ranks equals the sum of all ranks from the first group plus the sum of all
    ranks from the second group.

  • The average rank is (n + 1) / 2.
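A quick check of Gauss's formula against this week's example: the 20 participants' ranks 1 through 20 must total the two observed rank sums combined.

```python
n = 20  # total participants in the sexism example
total_ranks = n * (n + 1) // 2  # Gauss's formula: 210

print(total_ranks)                          # 210
print(total_ranks == sum(range(1, n + 1)))  # True: formula matches brute force
print(116 + 94)                             # observed rank sums also total 210
```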

Rank Sums

  • The smallest rank sum you can have with four elements out of eight is 1 + 2 + 3 + 4 = 10.
    Running the random code repeatedly builds a histogram of rank sums; most of the rank sums
    fall in the middle. The same distribution can be worked out analytically by listing every
    possible combination and adding them up, which is conceptually simpler but tedious.

  • Another way is using code to simulate the distributions.

  • The code consists of:

    • About 5 lines.

    • How many times to run.

    • Creating a number line in memory.

    • Setting up a small loop of commands.

    • Outputting a long list of random orders.

Code in depth.

  • Define how many runs (200,000) and the number of elements (number = 5).

  • Build a chain of 200,000 runs, one set after the other: on each run, generate a random
    ordering of the ranks 1 to n and append it to one long chain.

  • Plot the histogram.
    It takes about 2 seconds to run.

  • Looking at the histogram, the distribution of rank sums is centered on the expected rank
    sum and looks approximately normal.
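The simulation described above can be sketched in Python (my reconstruction of the idea, with fewer runs than the lecture's 200,000 so it finishes quickly, and using the four-out-of-eight setup from the rank-sum example):

```python
import random
from collections import Counter

runs = 20_000
n1, n2 = 4, 4  # four elements out of eight, as in the rank-sum example

rank_sum_counts = Counter()
for _ in range(runs):
    ranks = list(range(1, n1 + n2 + 1))
    random.shuffle(ranks)                  # one random ordering per run
    rank_sum_counts[sum(ranks[:n1])] += 1  # rank sum of the first group

# The histogram bulges in the middle: rank sums near the expected value
# n1 * (n1 + n2 + 1) / 2 = 18 occur far more often than the extremes 10 and 26.
for rank_sum in sorted(rank_sum_counts):
    print(rank_sum, rank_sum_counts[rank_sum])
```

Printed as a table (or plotted as a histogram), the counts peak around 18, which is exactly the expected rank sum from the formula used in the Wilcoxon test.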
