Module 12 – 1 Tailed + 2 Tailed Tests + Rejecting The Null
One-Tailed vs. Two-Tailed Tests
Determining Cutoff Sample Score
Step 3 of null hypothesis testing involves determining the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
Typically, alpha is set at 0.05. For a one-tailed test, this gives a cutoff of +1.64 or -1.64, depending on the predicted direction.
One-Tailed Tests
One-tailed tests are directional, with the 5% significance level placed on either the left or right tail of the distribution.
If the sample mean is expected to be higher:
The 5% is placed on the right tail.
Example:
Research hypothesis: People who hear about positive traits will give a higher mean attractiveness rating than people who don’t.
Null hypothesis: People who hear about positive traits will give the same mean attractiveness rating as people who don’t.
If the sample’s Z score is higher than 1.64, the null hypothesis is rejected. If the null hypothesis is actually true, there is a 5% chance of rejecting it in error.
If the sample mean is expected to be lower:
The 5% is placed on the left tail.
Example:
Research hypothesis: People who take a painkiller will give a lower mean pain rating than people who don’t.
Null hypothesis: People who take a painkiller will give the same mean pain rating as people who don’t.
If the sample’s Z score is lower than -1.64, the null hypothesis is rejected. If the null hypothesis is actually true, there is a 5% chance of rejecting it in error.
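The one-tailed cutoffs above can be recovered from alpha rather than looked up in a table. The following is a minimal sketch using Python’s standard library; the function name `one_tailed_cutoff` and its `direction` argument are made up for illustration. Note that the exact value is about 1.645, which Z tables commonly list as 1.64.

```python
from statistics import NormalDist


def one_tailed_cutoff(alpha: float, direction: str) -> float:
    """Return the Z cutoff for a one-tailed test.

    direction is "higher" (alpha goes in the right tail)
    or "lower" (alpha goes in the left tail).
    """
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    if direction == "higher":
        return std_normal.inv_cdf(1 - alpha)
    if direction == "lower":
        return std_normal.inv_cdf(alpha)
    raise ValueError("direction must be 'higher' or 'lower'")


right_cutoff = one_tailed_cutoff(0.05, "higher")
left_cutoff = one_tailed_cutoff(0.05, "lower")
print(round(right_cutoff, 3), round(left_cutoff, 3))  # 1.645 -1.645
```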
Two-Tailed Tests
Two-tailed tests are used to determine if there is any effect, regardless of direction.
Example: How will anti-anxiety medication affect the grades of students with really bad test anxiety?
Grades might improve or get worse.
In this case, the 5% significance level is split, with 2.5% on the right and 2.5% on the left, using cutoff values of ±1.96.
Example:
Research hypothesis: The GPA of students who take anti-anxiety medication will be different than the GPA of students who don’t ($\mu_1 \neq \mu_2$).
Null hypothesis: Students who take anti-anxiety medication will have the same mean GPA as students who don’t ($\mu_1 = \mu_2$).
Z-Scores and Decision Making
If the sample’s Z score is higher than +1.96 or lower than -1.96, the null hypothesis is rejected. If the null hypothesis is actually true, there is a 5% chance of rejecting it in error.
Step 4 involves finding the sample’s score on the comparison distribution.
Step 5 involves checking if the score is more extreme than -1.96 or +1.96.
If the score is between these points, the null hypothesis is not rejected.
If the score is more extreme, the null hypothesis is rejected.
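The two-tailed decision rule can be sketched in a few lines. This is an illustration only; the helper names `two_tailed_cutoff` and `reject_null_two_tailed` are hypothetical and the Z scores passed in are arbitrary examples, not data from a study.

```python
from statistics import NormalDist


def two_tailed_cutoff(alpha: float) -> float:
    """Positive Z cutoff when alpha is split evenly across both tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_null_two_tailed(z: float, alpha: float = 0.05) -> bool:
    """True when z is more extreme than the cutoff in either direction."""
    return abs(z) > two_tailed_cutoff(alpha)


cutoff = two_tailed_cutoff(0.05)
print(round(cutoff, 2))              # 1.96
print(reject_null_two_tailed(1.5))   # False: between -1.96 and +1.96
print(reject_null_two_tailed(-2.3))  # True: more extreme than -1.96
```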
Two-Tailed Example: Cat Ownership and Neuroticism
Research question: Does owning a cat affect neuroticism in undergraduates?
The average score for non-cat students on the Louisville Undergraduate Neuroticism Index (LUNI) is 5 with a standard deviation of 4.
A sample of 36 students is given a cat, and after 30 days, their LUNI score is measured.
The mean LUNI score for the group is 7.
Step 1: Hypotheses
Research hypothesis: The LUNI scores of students who get a cat will be different than the LUNI scores of students who don’t own a cat ($\mu_1 \neq \mu_2$).
Null hypothesis: Students who get a cat will have the same mean LUNI score as students who don’t ($\mu_1 = \mu_2$).
Step 2: Comparison Distribution
Given population characteristics: $\mu = 5$ and $\sigma = 4$.
Mean of the distribution of means: $\mu_M = \mu = 5$.
Standard deviation of the distribution of means: $\sigma_M = \sigma / \sqrt{N} = 4 / \sqrt{36} \approx 0.67$.
Step 3: Cutoff Score
Alpha is set at 0.05, using a non-directional test.
Cutoff score: $Z = \pm 1.96$.
Any score higher than +1.96 or lower than -1.96 will result in rejecting the null hypothesis.
Step 4: Sample's Score
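Formula: $Z = (M - \mu_M) / \sigma_M$, where $M$ is the sample mean and $\mu_M$ and $\sigma_M$ are the mean and standard deviation of the distribution of means.
If the sample’s mean on the LUNI is 7: $Z = (7 - 5) / 0.67 \approx 2.99$ (using the standard deviation of the distribution of means rounded to 0.67; without rounding, $Z = 3.00$).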
Step 5: Decision
Cutoff: $\pm 1.96$.
Sample’s Z score: 2.99.
2.99 is more extreme than +1.96, so reject the null hypothesis.
Conclusion:
The sample likely did not come from the comparison distribution.
Cats appear to have an effect on neuroticism in college students.
Students receiving cats scored significantly higher on the LUNI than non-cat owners.
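The arithmetic in this example can be checked with a short script. This is a sketch using the numbers given above (population mean 5, standard deviation 4, 36 students, sample mean 7); the variable names are my own. Without intermediate rounding the Z score comes out to 3.00; the 2.99 in the notes results from rounding the standard deviation of the distribution of means to 0.67 first. The decision is the same either way.

```python
import math

population_mean = 5.0  # LUNI mean for non-cat students
population_sd = 4.0    # LUNI standard deviation for non-cat students
n = 36                 # number of students given a cat
sample_mean = 7.0      # mean LUNI score of the cat group

# Step 2: comparison distribution (distribution of means)
mean_of_means = population_mean
sd_of_means = population_sd / math.sqrt(n)

# Step 4: the sample's Z score on the comparison distribution
z = (sample_mean - mean_of_means) / sd_of_means

# Step 5: two-tailed decision at alpha = .05
reject_null = abs(z) > 1.96

print(f"SD of means = {sd_of_means:.2f}")  # SD of means = 0.67
print(f"Z = {z:.2f}")                      # Z = 3.00
print(reject_null)                         # True
```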
Similarities and Differences
Two-tailed tests are very similar to one-tailed tests.
Instead of one cutoff (+1.64 or -1.64), there are two cutoffs (+1.96 and -1.96).
All other hypothesis testing principles still apply.
Practice vs Reality
It’s rare to design a study that makes no directional predictions.
Psychologists usually believe something will increase or decrease performance.
However, psychologists almost always do two-tailed tests.
So basically,
The rule is:
If research hyp. is nondirectional, then do two-tailed test.
If research hyp. is directional, then do one-tailed test.
The reality is:
If research hyp. is nondirectional, then do two-tailed test.
If research hyp. is directional, then do two-tailed test.
Stringency of Two-Tailed Tests
Two-tailed tests are more stringent.
The sample score has to be more extreme/unusual in order to reject the null hypothesis.
If you do a one-tailed test your mean only needs to be 1.64 SDs away from the mean for you to reject the null.
If you do a two-tailed test then your mean needs to be at least 1.96 SDs away from the mean.
One- vs Two-Tailed Tests
One-tailed test: “My sample score has to be at least this far from the mean before I’ll decide that my sample probably didn’t come from a population like this one.”
Two-tailed test: “My sample score has to be at least this much farther from the mean before I’ll decide that my sample probably didn’t come from a population like this one.”
Psychologists' Beliefs on Decision Making
Psychologists believe the decision to reject the null hypothesis is more likely to be correct if they do a two- versus a one-tailed test.
1.96 is the point where only 2.5% of scores are higher so it’s almost like setting your alpha to .025.
In fact, if I asked you to do a one-tailed test and set your alpha to .025 you would look up the corresponding Z score on the table, which is 1.96, and use that for your cutoff.
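The link between 1.96 and an alpha of .025 can be verified directly instead of using the table. A minimal sketch with the standard library (variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Proportion of scores higher than Z = 1.96
area_above_196 = 1 - std_normal.cdf(1.96)

# Cutoff for a one-tailed test with alpha = .025 in the right tail
cutoff_one_tailed_025 = std_normal.inv_cdf(1 - 0.025)

print(round(area_above_196, 3))         # 0.025
print(round(cutoff_one_tailed_025, 2))  # 1.96
```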
FAQ
So why not just set your alpha to .025 and follow the rules?
If you set your alpha to .025 and do a one-tailed test, then you’re fine if everything works out as planned, but what if you get an extreme score on the other side?
For example, you could be testing the efficacy of an anti-smoking campaign for teens and set your cutoff at -1.96, expecting the campaign to lower the mean number of teens who start smoking. If your sample had a Z score of 4.2, that would be very interesting! Your campaign actually increased the number of teens who started smoking.
Unfortunately, if you were doing a one-tailed test, you would fail to reject the null because your sample’s Z score was not more extreme than -1.96 in the predicted direction.
A two-tailed test using -1.96 and +1.96 would catch this extreme score.
You might be surprised at how often this occurs.
An extreme result in either direction is usually interesting so it’s better to just use a two-tailed test so you can detect any extreme score.
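The anti-smoking scenario can be made concrete with a short comparison. This is only an illustration: the two helper functions are hypothetical, and the Z score of 4.2 is the made-up value from the example, not real data.

```python
def reject_one_tailed_lower(z: float, cutoff: float = -1.96) -> bool:
    """One-tailed test with the whole rejection region in the left tail."""
    return z < cutoff


def reject_two_tailed(z: float, cutoff: float = 1.96) -> bool:
    """Two-tailed test with rejection regions in both tails."""
    return abs(z) > cutoff


z = 4.2  # campaign sample landed far in the unexpected direction

one_tailed_result = reject_one_tailed_lower(z)
two_tailed_result = reject_two_tailed(z)

print(one_tailed_result)  # False: 4.2 is not below -1.96
print(two_tailed_result)  # True: 4.2 is more extreme than +1.96
```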
Why not just do a one-tailed test and change the direction later if you need to?
For one, that’s cheating. It’s like calling heads before flipping a coin, but reserving the right to change to tails after the flip.
If you put your 5% on one tail but reserve the right to move the 5% to the other tail, then you effectively have a 10% chance of rejecting the null hypothesis when the null hypothesis is true. It’s the same as setting your alpha at .10, which is way too high.
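The 10% figure follows from adding the two tail areas that a "switchable" one-tailed test would accept as significant. A small sketch of that calculation, assuming the null hypothesis is true so that Z scores follow the standard normal distribution:

```python
from statistics import NormalDist

std_normal = NormalDist()  # distribution of Z when the null is true

# One-tailed cutoff for alpha = .05 (about 1.645)
cutoff = std_normal.inv_cdf(0.95)

# If the direction can be switched after seeing the data,
# a result in either tail ends up counted as significant.
p_right_tail = 1 - std_normal.cdf(cutoff)
p_left_tail = std_normal.cdf(-cutoff)
effective_alpha = p_right_tail + p_left_tail

print(round(effective_alpha, 2))  # 0.1
```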
Tails?
There is no right answer for this.
Two-tailed tests with an alpha of .05 are the standard so we’re going to mostly be working with two-tailed tests from this point forward.
If I ever expect a one-tailed test in this course then I’ll specifically ask you to use one.
If you’re using inferential tests outside of this course then you will need to decide your alpha level and if you want to allocate it to one tail or two.
We’ll talk more about alpha and errors in the next module. Hopefully, you’ll get a better idea of when you might want to use a different alpha level or why someone might want to only use a one-tailed test.
Before we end the module I want to take a moment to talk about rejecting the null hypothesis.
Rejecting the Null Hypothesis
When we reject the null hypothesis we are making a decision based on evidence from a sample.
Like any decision, it may be correct or incorrect.
The fact that a sample score is unusual when compared to a known distribution of scores is not proof that the sample came from a different distribution.
It is just evidence that supports the idea that it might be from a different distribution.
If the evidence leads us astray, then we make a decision error.
Using one- or two-tailed tests, or changing your alpha level, will affect your chances of making different types of decision errors.
It might seem strange to you that psychologists can pick any cutoff they want.
Keep in mind that changing your cutoff to get a significant difference is not the same as detecting a difference between the sample and the comparison distribution.
It’s never good to publish an incorrect decision.
Other scientists won’t be able to replicate your finding.
Studies based on your study will be flawed.
Some research has the potential to cause harm if incorrect.
So just remember that rejecting the null hypothesis doesn’t prove anything; it just provides support for your research hypothesis.
Always be careful with the language you use when describing the results of an inferential test.
Replication
Because there is always a chance you can make an incorrect decision, many scientists like to replicate their studies at least once before they publish them.
If you repeat the study and get a significant result again, it is still possible you made another error, but if the null hypothesis is true and the two studies are independent, the probability of getting a false positive both times drops from .05 to .05 × .05 = .0025.
This is why you will often see research articles with 2 or 3 studies. The researcher is changing the study slightly each time to learn more about what they are studying, but also replicating their finding to reduce the chance of a decision error.
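The replication arithmetic can be sketched as follows. This assumes the null hypothesis is true in every study and that the studies are independent, so the chance of a false positive in all of them is alpha multiplied by itself once per study. The function name is hypothetical.

```python
def prob_all_false_positives(alpha: float, n_studies: int) -> float:
    """Chance that every one of n independent studies rejects a true null."""
    return alpha ** n_studies


p_one = prob_all_false_positives(0.05, 1)
p_two = prob_all_false_positives(0.05, 2)
p_three = prob_all_false_positives(0.05, 3)

print(round(p_one, 6))    # 0.05
print(round(p_two, 6))    # 0.0025
print(round(p_three, 6))  # 0.000125
```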