Study Notes on Statistical Significance and P-Values

Overview of Statistical Significance and P-Values

Previous episode: discussed p-values as the usual criterion for statistical significance.
Focus of this module: what a significant result actually tells us, beyond the p-value itself.

Importance of Significance

Central question: "What is the probability that a significant p-value indicates a true effect?"
Key concept: the Positive Predictive Value (PPV) of a significant p-value.
- Rephrased: given a significant result, what is the probability that it stems from a real effect?

Statistical Study Example

Hypothetical situation: 1,000 studies are conducted, each testing a different hypothesis.
- 200 studies had a real effect.
- 800 studies had no effect.

Type I Error (α) Consideration

Type I error rate (α) set to 0.05:
- Of the 800 studies with no effect:
  - 95% (760 studies) = true negatives (correctly non-significant).
  - 5% (40 studies) = false positives (significant results although there is no effect).

Power of the Study

Assuming a power of 80% (a common target in health research):
- Out of the 200 real effects:
  - 160 studies (80%) = true positives (significant, correctly detecting the effect).
  - 40 studies (20%) = false negatives (real effects that do not reach significance).
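The expected counts above follow from multiplying each group by α or by the power. A minimal Python sketch, assuming the notes' hypothetical 1,000 studies (200 real effects, 800 nulls), α = 0.05 and 80% power; the function name `expected_counts` is purely illustrative:

```python
def expected_counts(n_real, n_null, alpha, power):
    """Return expected outcome counts for a batch of studies.

    Hypothetical helper: assumes every study is tested at the same
    significance level alpha and has the same power.
    """
    return {
        "true_positives": n_real * power,         # real effects detected
        "false_negatives": n_real * (1 - power),  # real effects missed
        "false_positives": n_null * alpha,        # nulls wrongly flagged as significant
        "true_negatives": n_null * (1 - alpha),   # nulls correctly left non-significant
    }

counts = expected_counts(n_real=200, n_null=800, alpha=0.05, power=0.80)
for outcome, n in counts.items():
    print(f"{outcome}: {n:.0f}")
```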

Analysis of Significant Results

Calculation of significant results:
- Total significant results = 200 (160 true positives + 40 false positives).
- 20% of the significant results (40/200) were false positives.
The false discovery rate was therefore 20%, not the 5% one might naively expect from α = 0.05.
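The false discovery rate is the share of significant results that are false positives, FP / (TP + FP), and the PPV is its complement, TP / (TP + FP). A minimal sketch using the counts from the example (160 true positives, 40 false positives); `discovery_rates` is a hypothetical helper name:

```python
def discovery_rates(true_positives, false_positives):
    """Return (false discovery rate, positive predictive value)."""
    significant = true_positives + false_positives  # all significant results
    fdr = false_positives / significant             # share that are false
    ppv = true_positives / significant              # share that are real
    return fdr, ppv

fdr, ppv = discovery_rates(true_positives=160, false_positives=40)
print(f"FDR = {fdr:.0%}, PPV = {ppv:.0%}")  # FDR = 20%, PPV = 80%
```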

Positive Predictive Value

Positive Predictive Value (PPV): in this example (80% power, 20% of tested hypotheses real), the chance that a significant result (p < 0.05) reflects a true effect is 160/200 = 80%.
- Notably, average power in psychology studies is estimated at around 35%.
- In neuroscience, it is estimated to be only about 21%.
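In general the PPV depends on the share of tested hypotheses that are real, not only on power. Writing π for that share (0.20 in the example), 1 − β for power and α for the significance level, the expected counts above give the following; that the PPV equals the power here is a coincidence of these particular numbers:

```latex
\mathrm{PPV} = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)}
             = \frac{0.80 \times 0.20}{0.80 \times 0.20 + 0.05 \times 0.80}
             = \frac{0.16}{0.16 + 0.04} = 0.80
```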

Impact of Lower Power Assumptions

With 25% power and α = 0.05 (same 1,000 studies):
- Outcome results:
  - 50 true positives (25% of 200)
  - 40 false positives (5% of 800)
- This yields a false discovery rate of 40/90 ≈ 44.4%, highlighting the inadequacy of relying on p < 0.05 alone.
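The same calculation can be repeated for other power levels. A minimal sketch, assuming the notes' 200 real effects and 800 nulls at α = 0.05; the power values are the ones mentioned in these notes, and the function name is hypothetical:

```python
def false_discovery_rate(power, alpha=0.05, n_real=200, n_null=800):
    """Expected share of significant results that are false positives."""
    true_positives = n_real * power   # real effects that reach significance
    false_positives = n_null * alpha  # nulls that reach significance by chance
    return false_positives / (true_positives + false_positives)

for power in (0.80, 0.35, 0.25, 0.21):
    print(f"power {power:.0%}: FDR = {false_discovery_rate(power):.1%}")
```

At 25% power this gives 50 true positives against 40 false positives, i.e. 40/90 ≈ 44.4%, and the rate keeps climbing as power falls.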

Issues of P-Hacking

P-hacking: questionable research practices that push findings past the p = 0.05 threshold.
- Can occur intentionally or unintentionally.
If 15% of studies are p-hacked:
- Even with 80% power, the false discovery rate could rise to 48.1%.
- Nearly half of significant results could then be false positives.
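Ioannidis (2005) folds bias into the PPV with a parameter u, the proportion of analyses that would not have produced a significant finding but are reported as one because of bias. A minimal sketch of his formula, assuming R is the pre-study odds of a real effect (200:800 = 0.25 in the running example), α = 0.05, and u = 0.15 standing in for the 15% of p-hacked studies; under those assumptions it reproduces the roughly 48.1% figure. The function name is illustrative:

```python
def ppv_with_bias(power, R, u, alpha=0.05):
    """PPV with bias, following the formula in Ioannidis (2005).

    power: probability of detecting a real effect (1 - beta)
    R:     pre-study odds that a tested relationship is real
    u:     proportion of would-be non-significant analyses reported
           as significant because of bias
    """
    beta = 1 - power
    # Significant results among real relationships (weighted by R):
    # detected outright, or missed but "rescued" by bias.
    true_hits = power * R + u * beta * R
    # Significant results among null relationships: chance hits plus biased ones.
    false_hits = alpha + u * (1 - alpha)
    return true_hits / (true_hits + false_hits)

ppv = ppv_with_bias(power=0.80, R=0.25, u=0.15)
print(f"PPV = {ppv:.3%}, FDR = {1 - ppv:.3%}")
```

Setting u = 0 recovers the unbiased 80% PPV of the earlier example, so the jump in false discoveries comes entirely from the bias term.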

Validity of Research Findings

John Ioannidis’ seminal paper (2005), "Why Most Published Research Findings Are False" (PLoS Medicine):
- Examines the factors that make many published research findings more likely to be false than true.

Statistical Findings on Trials

False discovery rates for different study types (α = 0.05):
- Well-performed, adequately powered randomized controlled trials (RCTs) with pre-study odds of 1:1: 15%.
- Confirmatory meta-analyses of good-quality RCTs: 14.6%.
- Meta-analyses of small, inconclusive studies: 59.4%, meaning most significant results in this setting would be false.

Findings from Underpowered Studies

Underpowered but well-performed RCTs may still have a false discovery rate as high as 76.5%.
- This means a significant result is more than three times as likely to be wrong as right.
Poorly performed underpowered studies: 82.5% false discovery rate.
- A significant result is more than four times as likely to be incorrect as correct.
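Statements like "three times as likely to be wrong" convert the false discovery rate into odds: wrong : right = FDR / (1 − FDR). A quick check of the two figures above (the helper name is hypothetical):

```python
def odds_wrong(fdr):
    """Odds that a significant finding is false rather than true."""
    return fdr / (1 - fdr)

for label, fdr in [("well-performed, underpowered", 0.765),
                   ("poorly performed, underpowered", 0.825)]:
    print(f"{label}: {odds_wrong(fdr):.2f} to 1")
```

The results, about 3.3 to 1 and 4.7 to 1, are where "more than three times" and "more than four times" come from.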

Exploratory Research and High Dimensional Data

Ioannidis criticizes exploratory research that mines massive databases as a "fishing expedition".
- Example: testing 30,000 genes when only about 30 are expected to be truly associated can produce a false discovery rate of about 99.9%, even among results significant at p < 0.05.
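A rough count shows why massive testing swamps the few real signals. A minimal sketch, assuming 30,000 independent tests of which 30 are real effects, α = 0.05, no bias, and a power of 20% (a value assumed for illustration, not taken from the notes):

```python
n_tests = 30_000
n_real = 30       # assumed number of genuinely associated genes
alpha = 0.05
power = 0.20      # assumed for illustration

false_positives = (n_tests - n_real) * alpha  # nulls significant by chance
true_positives = n_real * power               # real effects detected
fdr = false_positives / (false_positives + true_positives)

print(f"expected false positives: {false_positives:.1f}")
print(f"expected true positives: {true_positives:.1f}")
print(f"false discovery rate: {fdr:.1%}")
```

Even this simple no-bias count puts roughly 1,500 chance hits against about 6 real ones, a false discovery rate above 99%.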

Implications for Reproducibility

Results that meet the p = 0.05 criterion need to be retested.
There is a common misunderstanding that p < 0.05 indicates an unequivocally true effect. This is false.
The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
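A simulation can make this definition concrete: when the null hypothesis is true, p < 0.05 still occurs in about 5% of studies. A minimal sketch using a two-sided z-test on normally distributed data with a known standard deviation of 1; the sample size, number of simulated studies and seed are arbitrary choices:

```python
import math
import random

def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_null(n_studies=10_000, n=20, seed=1):
    """Fraction of simulated null studies (true mean 0) reaching p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # z-statistic with known sigma = 1
        if two_sided_p(z) < 0.05:
            hits += 1
    return hits / n_studies

print(f"p-value for z = 1.96: {two_sided_p(1.96):.3f}")
print(f"share of null studies with p < 0.05: {simulate_null():.3f}")
```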

Conclusion on Research Findings

To improve confidence in results, it is vital to:
- Recognize the potential flaws in perceived significant findings.
- Understand that repeated testing and validation are essential.
Simply accepting a p < 0.05 result without further verification is a clear path to error.
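Why repeated testing helps can be sketched with the running example. Assuming independent replications with the same 80% power, α = 0.05 and no bias, each significant result multiplies the odds of a real effect by power / α = 16. Starting from the example's prior odds of 200:800, a minimal sketch (the function name is hypothetical):

```python
def ppv_after_replications(k, prior_odds=200 / 800, power=0.80, alpha=0.05):
    """PPV after k independent significant results (no bias assumed)."""
    likelihood_ratio = power / alpha  # how strongly one significant result favours a real effect
    odds = prior_odds * likelihood_ratio ** k
    return odds / (1 + odds)

for k in (1, 2, 3):
    print(f"{k} significant result(s): PPV = {ppv_after_replications(k):.1%}")
```

One significant result gives the 80% PPV from earlier; a successful independent replication raises it to 64/65, roughly 98.5%.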