DataC8 - 11.4. Error Probabilities
11.4.1. Wrong Conclusions
In hypothesis testing, we compare two hypotheses: the null hypothesis ($H_0$) and the alternative hypothesis ($H_a$).
There are four possible outcomes when testing these hypotheses:
Test Favors the Null Hypothesis:
Null is True: Correct result (not rejecting $H_0$)
Alternative is True: Error (Type II Error, also known as a False Negative: failing to reject $H_0$ when it is false)
Test Favors the Alternative Hypothesis:
Null is True: Error (Type I Error, also known as a False Positive: rejecting $H_0$ when it is true)
Alternative is True: Correct result (rejecting $H_0$)
11.4.2. The Chance of an Error
When testing whether a coin is fair, the hypotheses are defined as follows:
Null Hypothesis ($H_0$): The coin is fair (i.e., the outcomes resemble random draws from Heads and Tails).
Alternative Hypothesis ($H_a$): The coin is not fair.
Testing is based on 2000 coin tosses:
Expected heads if fair: $1000$ (i.e., $2000 / 2$)
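The test statistic is the distance between the observed number of heads and the expected number: $|\text{number of heads} - 1000|$. Large values favor the alternative.
Under the null hypothesis, the empirical distribution of the test statistic places just under 5% of its area on values above 45, the values that favor the alternative hypothesis.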
Therefore, with a 5% cutoff for the p-value:
Conclusion: If the coin is fair, there is about a 5% chance that the test will incorrectly conclude the coin is unfair (a Type I Error).
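The 5% figure can be checked by simulation. The following is a minimal sketch, assuming the test statistic is $|\text{number of heads} - 1000|$; the function names, the number of repetitions, and the seed are illustrative choices, not part of the original analysis.

```python
import random

def simulate_statistic(rng, num_tosses=2000):
    """One simulated value of |number of heads - num_tosses/2| for a fair coin."""
    # Each of the num_tosses random bits is an independent fair toss,
    # so counting the 1 bits gives the number of heads.
    heads = bin(rng.getrandbits(num_tosses)).count("1")
    return abs(heads - num_tosses // 2)

def proportion_above(threshold=45, repetitions=10_000, seed=0):
    """Estimate the chance, under the null, that the statistic exceeds threshold."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    statistics = [simulate_statistic(rng) for _ in range(repetitions)]
    return sum(s > threshold for s in statistics) / repetitions

# Estimated chance that a fair coin lands in the region favoring the alternative
print(proportion_above())
```

The printed proportion estimates the Type I Error probability of a test that rejects when the statistic is over 45. It varies slightly with the seed and the number of repetitions.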
11.4.3. The Cutoff for the p-value is an Error Probability
The general principle states:
If you use a p-value cutoff of $\alpha$%, and $H_0$ is in fact true, there is about an $\alpha$% chance that the test will incorrectly reject $H_0$.
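For the coin example, this principle can be checked exactly rather than by simulation. The sketch below assumes 2000 tosses and the statistic $|\text{number of heads} - 1000|$; the helper names `null_pmf`, `p_values`, and `type_one_error_rate` are hypothetical.

```python
from math import comb

def null_pmf(n):
    """Exact probabilities of 0..n heads in n tosses of a fair coin."""
    total = 2 ** n
    return [comb(n, k) / total for k in range(n + 1)]

def p_values(n):
    """Null probabilities and the p-value for every possible number of heads."""
    pmf = null_pmf(n)
    half = n / 2
    # Total null probability at each distance from n/2
    by_distance = {}
    for k in range(n + 1):
        d = abs(k - half)
        by_distance[d] = by_distance.get(d, 0.0) + pmf[k]
    # p-value of a distance: chance of a distance at least that large
    tail, running = {}, 0.0
    for d in sorted(by_distance, reverse=True):
        running += by_distance[d]
        tail[d] = running
    return pmf, [tail[abs(k - half)] for k in range(n + 1)]

def type_one_error_rate(n, cutoff):
    """Chance, when the coin is fair, that the p-value is at or below cutoff."""
    pmf, pvals = p_values(n)
    return sum(prob for prob, pv in zip(pmf, pvals) if pv <= cutoff)

for cutoff in (0.05, 0.01):
    print(cutoff, type_one_error_rate(2000, cutoff))
```

Each printed rate is at most the cutoff. It falls slightly below the cutoff because the number of heads is discrete, which is why the principle says "about".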
Error Probability Table:
This table outlines the four possible outcomes in hypothesis testing. In each row, the first entry applies when $H_0$ is true and the second when $H_a$ is true. The cutoff for the p-value is the probability of making a Type I Error, calculated under the condition that $H_0$ is true:
Test Favors the Null: Correct result (fail to reject $H_0$) / Type II Error (fail to reject $H_0$)
Test Favors the Alternative: Type I Error (reject $H_0$) / Correct result (reject $H_0$)
The table is a fundamental representation of conclusions based on statistical tests.
11.4.3.1. Controlling for the Error
Implementing a 1% cutoff is more stringent than a 5% cutoff, reducing the likelihood of rejecting $H_0$ if it is true.
Context in medical trials:
Null Hypothesis ($H_0$): The treatment has no effect; differences in outcomes are due to random variation.
Alternative Hypothesis ($H_a$): The treatment has an effect.
While a 1% cutoff reduces Type I Error probability, it does not entirely eliminate it:
Even at a 1% cutoff, if the treatment has no effect there remains about a 1% chance of falsely concluding that it does, purely because of chance variation.
Randomization removes systematic bias between the groups but not chance variation, so this error probability can be made small but never zero.
11.4.4. Data Snooping and p-Hacking
Scenario involving multiple research groups:
If 100 different groups run randomized controlled trials (RCTs) on a treatment that has no actual effect, each using a 1% cutoff, then on average about one group will incorrectly find a significant effect due to chance variation alone. If the trials are independent, the chance that at least one group does so is $1 - 0.99^{100} \approx 63\%$.
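The arithmetic behind this scenario is short enough to spell out. This sketch assumes the 100 trials are independent of one another, which the scenario does not state explicitly.

```python
num_groups = 100
cutoff = 0.01  # each group rejects the null when its p-value is at most 1%

# Average number of groups that wrongly find an effect
expected_false_positives = num_groups * cutoff

# Chance that at least one group wrongly finds an effect,
# assuming the groups' trials are independent
chance_at_least_one = 1 - (1 - cutoff) ** num_groups

print(expected_false_positives)       # 1.0
print(round(chance_at_least_one, 3))  # 0.634
```

So a false finding somewhere among the 100 groups is more likely than not, even though each individual group has only a 1% chance of making the error.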
Importance of replication:
Other researchers should validate findings by replicating experiments to confirm or refute initial conclusions regarding the treatment's effects.
Issues with testing multiple hypotheses:
In trials assessing various effects of a drug, it is possible that some tests may show a treatment effect by randomness alone, even if the treatment is ineffective.
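A small simulation illustrates how quickly this risk grows with the number of hypotheses tested. It is a sketch under stated assumptions: the tests are independent, the drug has no effect on any outcome, and each p-value is therefore treated as uniformly distributed between 0 and 1. The function name, the 5% cutoff, and the test counts are illustrative.

```python
import random

def chance_of_spurious_finding(num_tests, cutoff=0.05, num_studies=20_000, seed=0):
    """Estimate the chance that an ineffective drug shows at least one
    'significant' result when num_tests independent outcomes are tested."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    studies_with_a_hit = 0
    for _ in range(num_studies):
        # Under the null, each p-value is modeled as uniform on [0, 1)
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) <= cutoff:
            studies_with_a_hit += 1
    return studies_with_a_hit / num_studies

for num_tests in (1, 5, 20):
    print(num_tests, chance_of_spurious_finding(num_tests))
```

With one test the estimate stays near the cutoff. With 20 tests it should be close to $1 - 0.95^{20}$, which is about 64%.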
Recommendations when studying research:
Consider how many different hypotheses were tested before the one that was published was reported as statistically significant.
Caution is advised if multiple tests were conducted before arriving at a significant result, indicating possible data snooping or p-hacking, where data is manipulated or misused to produce significant results.
Validating reported results through replication is essential to confirm that the treatment effect exists.
11.4.5. Technical Note: The Other Kind of Error
Be aware that there is another error type:
Type II Error: Concluding the treatment has no effect when it truly does have an effect.
Acknowledgment of the dilemma in hypothesis testing:
Efforts to minimize Type I errors tend to increase Type II errors, and vice versa, when the sample size is held fixed. This trade-off is critical in the design and interpretation of statistical tests.
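The trade-off can be made concrete with the coin example. The sketch below assumes 2000 tosses, the statistic $|\text{number of heads} - 1000|$, and a hypothetical unfair coin whose chance of heads is 0.52. The Type II Error probability depends on that assumed value, and the function names are illustrative.

```python
from math import exp, lgamma, log

def binomial_pmf(n, p):
    """Probabilities of 0..n heads in n tosses with chance p of heads (0 < p < 1)."""
    # Work on the log scale so the large binomial coefficients do not overflow.
    return [
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))
        for k in range(n + 1)
    ]

def rejection_distance(n, cutoff):
    """Smallest d with P(|heads - n/2| >= d) <= cutoff for a fair coin (n even)."""
    null = binomial_pmf(n, 0.5)
    half = n // 2
    tail = 0.0
    # Add probability from the most extreme outcomes inward
    for d in range(half, 0, -1):
        tail += null[half - d] + null[half + d]
        if tail > cutoff:
            return d + 1
    return 1

def error_rates(n, cutoff, true_p):
    """(Type I rate for a fair coin, Type II rate when P(heads) = true_p)."""
    d = rejection_distance(n, cutoff)
    half = n // 2
    null = binomial_pmf(n, 0.5)
    alt = binomial_pmf(n, true_p)
    type_one = sum(null[k] for k in range(n + 1) if abs(k - half) >= d)
    type_two = sum(alt[k] for k in range(n + 1) if abs(k - half) < d)
    return type_one, type_two

for cutoff in (0.05, 0.01):
    print(cutoff, error_rates(2000, cutoff, true_p=0.52))
```

Moving from the 5% cutoff to the 1% cutoff lowers the first number in each pair and raises the second, which is the trade-off described above.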