Study Notes on Statistical Significance and P-Values
Overview of Statistical Significance and P-Values
Last episode: Discussed p-values as a determinant of statistical significance.
Focus of this module: Exploring the concept of significance beyond just p-values.
Importance of Significance
Central question: "What’s the probability that a significant p-value indicates a true effect?"
This is the Positive Predictive Value (PPV) of a significant p-value.
Put another way: given a significant p-value, what is the probability that it stems from a real effect?
Statistical Study Example
Hypothetical situation: 1000 studies are conducted, each testing a different hypothesis.
200 studies had a real effect.
800 studies had no effect.
Type I Error (α) Consideration
Type I error (α) set to 0.05:
Of the 800 tests with no effect:
95% (760 studies) = true negatives (correctly non-significant, no effect).
5% (40 studies) = false positives (showing significant results although there is no effect).
Power of the Study
Assuming a power of 80% (common in health studies):
Out of the 200 real effects:
160 studies = true positives (significant and show an effect).
40 studies = false negatives (real effects that are not significant).
Analysis of Significant Results
Calculation of significant results:
Total significant results = 200 (160 true positives + 40 false positives).
20% of the significant results (40/200) were false positives (significant although there is no effect).
The false discovery rate was 20%, not the 5% that α = 0.05 might lead one to expect.
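The arithmetic of this example can be checked with a short Python sketch. The inputs (200 real effects, 800 null effects, α = 0.05, power 80%) are the ones stated in the notes; the variable names are illustrative only.

```python
# Worked example: 1000 studies, 200 with a real effect, 800 without.
n_real = 200   # studies where an effect truly exists
n_null = 800   # studies where no effect exists
alpha = 0.05   # Type I error rate
power = 0.80   # probability of detecting a real effect

true_positives = power * n_real            # real effects that reach significance
false_negatives = n_real - true_positives  # real effects that are missed
false_positives = alpha * n_null           # null effects that reach significance
true_negatives = n_null - false_positives  # null effects correctly non-significant

# Only true positives and false positives are significant.
significant = true_positives + false_positives
false_discovery_rate = false_positives / significant
ppv = true_positives / significant

print(f"significant results: {significant:.0f}")
print(f"false discovery rate: {false_discovery_rate:.1%}")
print(f"positive predictive value: {ppv:.1%}")
```

The significant results are made up of true positives and false positives; false negatives are, by definition, not significant and do not enter the total.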
Positive Predictive Value
Positive Predictive Value (PPV): In this example (20% of tested effects are real, α = 0.05, power 80%), the chance that a significant result (p < 0.05) indicates a true effect is 160/200 = 80%.
Notably, average power in psychology studies is around 35%.
In neuroscience, it is estimated to be only about 21%.
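The 80% figure can also be written as a general formula. As a sketch, let \pi be the proportion of tested hypotheses with a real effect (a symbol introduced here for illustration, not used in the notes), \alpha the Type I error rate, and 1 - \beta the power:

```latex
\mathrm{PPV} = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)},
\qquad
\mathrm{FDR} = 1 - \mathrm{PPV}

% Worked example from the notes: \pi = 0.2, \alpha = 0.05, 1-\beta = 0.8
\mathrm{PPV} = \frac{0.8 \times 0.2}{0.8 \times 0.2 + 0.05 \times 0.8}
             = \frac{0.16}{0.20} = 0.80
```

The formula makes the dependence on power explicit: lowering 1 - \beta shrinks the numerator while the false-positive term \alpha(1-\pi) stays the same.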
Impact of Lower Power Assumptions
With 25% power and α = 0.05 (same 1000 studies):
Outcome results:
50 true positives (25% of the 200 real effects)
40 false positives (5% of the 800 studies with no effect)
This would yield a false discovery rate of 44.4% (40/90), highlighting the inadequacy of simply relying on p < 0.05.
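To see how the false discovery rate moves with power, the calculation can be wrapped in a small function. This is a sketch under the assumptions of the example (20% of tested hypotheses real, α = 0.05); the function name is hypothetical, and applying the 35% and 21% average-power figures to the same 20% prior is an assumption made here purely for illustration.

```python
def false_discovery_rate(prior: float, alpha: float, power: float) -> float:
    """Share of significant results that are false positives.

    prior: assumed fraction of tested hypotheses with a real effect.
    """
    true_pos = power * prior            # significant and real
    false_pos = alpha * (1.0 - prior)   # significant but no effect
    return false_pos / (true_pos + false_pos)


PRIOR = 200 / 1000  # 200 real effects among 1000 studies, as in the example
ALPHA = 0.05

for power in (0.80, 0.35, 0.25, 0.21):
    fdr = false_discovery_rate(PRIOR, ALPHA, power)
    print(f"power {power:.0%}: FDR {fdr:.1%}, PPV {1 - fdr:.1%}")
```

Because the function works with proportions, the result does not depend on the number of studies: only the prior, α, and power matter.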
Issues of P-Hacking
P-Hacking: Questionable research practices that influence findings and push them under the p < 0.05 threshold.
Can occur either intentionally or unintentionally.
If 15% of results that would otherwise be non-significant are p-hacked into significance:
Even with 80% power, the false discovery rate could rise to 48.1%.
Nearly half of studies reporting p < 0.05 could be false positives.
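The 48.1% figure can be reproduced from the 1000-study example. The sketch below assumes that "15% p-hacked" means 15% of the results that would otherwise be non-significant (both missed real effects and true negatives) end up reported as significant. That reading is an assumption, though it does recover the quoted number. The function and parameter names are hypothetical.

```python
def fdr_with_p_hacking(n_real, n_null, alpha, power, hacked_fraction):
    """False discovery rate when a fraction of non-significant results
    is pushed over the significance threshold (assumed model)."""
    # Results that are significant without any p-hacking.
    true_pos = power * n_real
    false_pos = alpha * n_null
    # Otherwise non-significant results that get p-hacked into significance.
    hacked_real = hacked_fraction * (n_real - true_pos)
    hacked_null = hacked_fraction * (n_null - false_pos)
    significant_real = true_pos + hacked_real
    significant_null = false_pos + hacked_null
    return significant_null / (significant_real + significant_null)


fdr = fdr_with_p_hacking(
    n_real=200, n_null=800, alpha=0.05, power=0.80, hacked_fraction=0.15
)
print(f"false discovery rate with p-hacking: {fdr:.1%}")
```

In counts, this is 154 significant results with no real effect (40 + 114) against 166 with a real effect (160 + 6), so 154/320 of significant findings are false.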
Validity of Research Findings
John Ioannidis’ seminal 2005 paper, "Why Most Published Research Findings Are False":
Reviews the dynamics that lead to flawed research interpretations.
Statistical Findings on Trials
Considering well-performed, adequately powered Randomized Controlled Trials (RCTs) with pre-study odds of 1:1 (one true relationship for every null one among those tested):
False discovery rate: 15% with p=0.05.
Confirmatory meta-analyses of good quality RCTs: 14.6% false discovery rate.
Meta-analyses of small inconclusive studies: 59.4% false discovery rate, suggesting significant results are often invalid.
Findings from Underpowered Studies
Underpowered RCTs, even when properly executed, may still have a false discovery rate as high as 76.5%.
This means a significant p < 0.05 result is more than three times as likely to be wrong as right (76.5% vs. 23.5%).
Poorly performed underpowered studies: 82.5% false discovery rate.
More than four times as likely to be incorrect as correct (82.5% vs. 17.5%).
Exploratory Research and High Dimensional Data
Ioannidis criticizes exploratory research using massive databases as akin to a "fishing expedition".
Example: Testing 30,000 genes when only about 30 are expected to be truly associated can lead to a 99.9% false discovery rate, despite using a significance threshold of p < 0.05.
Implications for Reproducibility
Results that meet the p < 0.05 criterion need to be retested (replicated).
A common misunderstanding is that p < 0.05 indicates an unequivocal true effect. This is false.
The p-value represents the probability of obtaining results at least as extreme as observed under the assumption that the null hypothesis is true.
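This definition can be illustrated with a small simulation. The sketch below assumes a two-sided z-test with known standard deviation (chosen here for simplicity; the notes do not name a test) and simulates studies in which the null hypothesis is true. The sample sizes, seed, and function name are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided z-test p-value, assuming a known standard deviation."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    # Probability, under the null, of a |z| at least this large.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


rng = random.Random(42)  # fixed seed so the run is repeatable
n_studies, n_per_study = 2000, 30

# Every simulated study samples from the null distribution (no real effect).
p_values = [
    two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n_per_study)])
    for _ in range(n_studies)
]
share_significant = sum(p < 0.05 for p in p_values) / n_studies
print(f"share of null studies with p < 0.05: {share_significant:.3f}")
```

Roughly 5% of these no-effect studies come out significant, which is what α = 0.05 controls. The p-value says how surprising the data are if the null is true; it does not by itself say how likely a significant result is to reflect a real effect.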
Conclusion on Research Findings
To improve confidence in results, it is vital to:
Recognize the potential flaws in perceived significant findings.
Understand that repeated testing and validation are essential.
Simply accepting a p=0.05 result without further verification is a clear path to error.