False-Positive Psychology: Undisclosed Flexibility

Purpose of Scientific Inquiry:
- Discover truths about the world through hypothesis generation, data collection, and analysis.
- Errors, particularly false positives, are inevitable in research.

Definition:
- A false positive is the incorrect rejection of a null hypothesis.
Consequences of False Positives:
1. Persistence in Literature:
  - Once published, false positives remain in the literature, affecting subsequent research.
  - Because null results can have many causes, a failed replication is rarely conclusive enough to dislodge the original finding.
2. Resource Wastage:
  - False positives waste resources by prompting ineffective research programs and policy changes that are not grounded in valid findings.
3. Credibility Risk:
  - Fields known for publishing false positives may lose credibility among scholars and the public.

The paper's authors endorse the conventional maximum false-positive rate of 5% (i.e., p ≤ .05).
They argue that current standards and practices push the actual false-positive rate considerably above that level.
Researcher Degrees of Freedom:
- Researchers make several critical decisions during data collection and analysis:
  - Should more data be collected?
  - Should some observations be excluded?
  - Which conditions to combine or compare?
  - Which control variables to include?
  - Should measures be combined or transformed?
- These decisions are often not made in advance. Researchers instead explore several analytic alternatives and may report only the one that produces a favorable outcome.
Resulting Likelihood:
- When several analyses are tried and only the significant one is reported, the chance of a false positive at the nominal 5% level rises above 5%.
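The arithmetic behind this inflation can be sketched under a simplifying assumption that is not in the notes above: that each alternative analysis is an independent test at the same level. Real alternative analyses of one data set are usually correlated, so actual rates will differ; the function name is illustrative only.

```python
def familywise_false_positive_rate(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests tests is significant at level
    alpha when every null hypothesis is true.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# One analysis keeps the nominal rate; each extra attempt raises it.
for k in (1, 2, 4, 8):
    rate = familywise_false_positive_rate(0.05, k)
    print(f"{k} independent analyses: {rate:.3f}")
```

With a single analysis the function returns the nominal 5%. The rate grows with every additional attempt, which is the mechanism the bullet above describes.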

Ambiguity and Self-Interest:
- Researchers tend to conclude that the decisions yielding statistically significant results are the justified ones, for two reasons:
1. Ambiguity about how best to make these decisions.
2. The researcher's desire to find a statistically significant result.
Literature Support:
- Numerous studies illustrate that individuals often interpret ambiguous information in a self-serving manner (e.g., Dawson et al., 2002).

An analysis of 30 Psychological Science articles revealed large inconsistencies in how outliers were treated.
Definitions of outliers varied widely:
- Fast responses: excluded by rules such as the fastest 2.5% of responses, or a cutoff some number of standard deviations from the mean, with the number varying across articles.
- Slow responses: likewise defined by different cutoffs across articles.
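A minimal sketch of why this flexibility matters: two defensible exclusion rules applied to the same data keep different observations and yield different means. The reaction times below are made up for illustration, and both helper functions are hypothetical rather than taken from any of the surveyed articles.

```python
import statistics

# Hypothetical reaction times in milliseconds (invented for illustration).
reaction_times = [212, 340, 355, 361, 372, 380, 391, 402, 410, 415,
                  423, 431, 440, 452, 467, 480, 495, 510, 640, 1850]


def drop_fastest_fraction(times, fraction=0.025):
    """Rule 1: exclude the fastest `fraction` of responses (at least one)."""
    n_drop = max(1, round(len(times) * fraction))
    return sorted(times)[n_drop:]


def drop_beyond_sd(times, n_sd=2.0):
    """Rule 2: exclude responses more than n_sd standard deviations from the mean."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    return [t for t in times if abs(t - mean) <= n_sd * sd]


kept_rule_1 = drop_fastest_fraction(reaction_times)
kept_rule_2 = drop_beyond_sd(reaction_times)
print("Rule 1 mean:", round(statistics.mean(kept_rule_1), 1))
print("Rule 2 mean:", round(statistics.mean(kept_rule_2), 1))
```

On this made-up sample, the first rule removes the fastest response and keeps the very slow one, while the second rule does the opposite. A researcher free to choose between them after seeing the data has one more degree of freedom.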

Objective: Investigate if children's songs can induce age contrast.
Methodology:
- 30 undergraduates listened to a control song (“Kalimba”) or a children's song (“Hot Potato”).
- Participants then completed a survey asking how old they felt and reported their father's age, which was used as a covariate to control for baseline age.
Results:
- ANCOVA indicated participants felt older after “Hot Potato” (adjusted M = 2.54 years) compared to control (adjusted M = 2.06 years), F(1, 27) = 5.06, p = .033.

Objective: Determine whether listening to a song about older age could make participants actually younger (a deliberately impossible effect).
Methodology:
- 20 undergraduates listened to either “When I’m Sixty-Four” or “Kalimba.” Participants then indicated their birth dates and their father’s age; father’s age was used as a covariate.
Results:
- ANCOVA indicated participants reported being younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) compared to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040.

Computer simulations were employed to study the influence of researcher degrees of freedom on false-positive rates.
Four common researcher degrees of freedom identified:
1. Flexibility in the choice of dependent variables.
2. Flexibility in choosing sample size.
3. Use of covariates.
4. Reporting subsets of experimental conditions.
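Across 15,000 simulated samples drawn with no true effect, each degree of freedom raised the false-positive rate above its nominal level, and combining them raised it further. The table shows the percentage of samples in which at least one analysis reached each significance level.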

Situation	p < .1	p < .05	p < .01
A: Two dependent variables (r = .50)	17.8%	9.5%	2.2%
B: Addition of 10 more observations per cell	14.5%	7.7%	1.6%
C: Controlling for gender or interaction of gender with treatment	21.6%	11.7%	2.7%
D: Dropping (or not dropping) one of three conditions	23.2%	12.6%	2.8%
Combine A and B	26.0%	14.4%	3.3%
Combine A, B, and C	50.9%	30.9%	8.4%
Combine A, B, C, and D	81.5%	60.7%	21.5%
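Situation A can be approximated with a short simulation. This is a sketch under stated assumptions, not the authors' code: two conditions with 20 observations per cell, two standard-normal dependent variables correlated at r = .50, and a "hit" whenever a t test on either variable or on their average is significant. The critical t value is a standard table value hardcoded to keep the example dependency-free, and all function names are illustrative.

```python
import math
import random

# Two-tailed critical t for p < .05 with df = 38 (two cells of 20).
T_CRIT_DF38 = 2.024


def t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def draw_cell(rng, n, r):
    """Draw n participants with two dependent variables correlated at r."""
    dv1, dv2 = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        dv1.append(x)
        dv2.append(y)
    return dv1, dv2


def simulate_two_dvs(n_sims=4000, n_per_cell=20, r=0.5, seed=1):
    """Share of null experiments where DV1, DV2, or their average is significant.

    n_per_cell must stay at 20 for the hardcoded critical value to apply.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a1, a2 = draw_cell(rng, n_per_cell, r)
        b1, b2 = draw_cell(rng, n_per_cell, r)
        a_avg = [(x + y) / 2 for x, y in zip(a1, a2)]
        b_avg = [(x + y) / 2 for x, y in zip(b1, b2)]
        tests = [(a1, b1), (a2, b2), (a_avg, b_avg)]
        if any(abs(t_statistic(a, b)) > T_CRIT_DF38 for a, b in tests):
            hits += 1
    return hits / n_sims


print(f"Estimated false-positive rate: {simulate_two_dvs():.3f}")
```

Both conditions are drawn from the same distribution, so every significant result is a false positive. The estimate should land near the p < .05 entry in row A, well above the nominal 5%, though the exact figure depends on the seed and the number of simulated samples.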

Conclusions from Simulations:
- High false-positive rates associated with common degrees of freedom emphasize the need for stricter reporting practices.

Predecide Data Collection Rules:
- Authors must decide the rule for terminating data collection before data collection begins and report that rule in the article.
Minimum Observations:
- Authors must collect at least 20 observations per condition or provide a compelling justification for collecting fewer.
Listing Variables:
- Authors must list all variables collected in a study.
Report Experimental Conditions:
- Authors must report all experimental conditions, including failed manipulations.
Disclose Data Exclusions:
- If observations are eliminated, authors must also report the statistical results obtained when those observations are included.
Covariate Reporting:
- If an analysis includes a covariate, authors must also report the statistical results of the analysis without the covariate.
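To see why the first requirement (a stopping rule fixed in advance) matters, consider a minimal simulation of optional stopping. It assumes a setup modeled on Situation B in the table: test after 20 observations per cell and, if the result is not significant, add 10 more per cell and test again. The critical values are standard table values hardcoded to avoid dependencies; the function names are illustrative, not the authors' code.

```python
import math
import random

N_FIRST, N_EXTRA = 20, 10
# Two-tailed critical t values for p < .05.
T_CRIT_DF38 = 2.024  # 20 observations per cell
T_CRIT_DF58 = 2.002  # 30 observations per cell


def t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def simulate_optional_stopping(n_sims=4000, seed=1):
    """Return (fixed_rate, flexible_rate) when there is no true effect.

    fixed:    test once, after the first 20 observations per cell.
    flexible: test at 20 per cell; if not significant, add 10 and retest.
    """
    rng = random.Random(seed)
    fixed_hits = flexible_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(N_FIRST + N_EXTRA)]
        b = [rng.gauss(0, 1) for _ in range(N_FIRST + N_EXTRA)]
        sig_first = abs(t_statistic(a[:N_FIRST], b[:N_FIRST])) > T_CRIT_DF38
        sig_full = abs(t_statistic(a, b)) > T_CRIT_DF58
        fixed_hits += sig_first
        flexible_hits += sig_first or sig_full
    return fixed_hits / n_sims, flexible_hits / n_sims


fixed_rate, flexible_rate = simulate_optional_stopping()
print(f"Fixed stopping rule:    {fixed_rate:.3f}")
print(f"Optional stopping rule: {flexible_rate:.3f}")
```

Both rules are evaluated on the same simulated samples, so the flexible rule can never score lower than the fixed one. The fixed rule should stay close to the nominal 5%, while the flexible rule drifts upward because it gets a second chance whenever the first test fails.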

Ensure Compliance with Authors’ Requirements:
- Reviewers, as gatekeepers of scientific integrity, should check that authors follow the reporting requirements.
Tolerance for Imperfections:
- Reviewers should be more tolerant of imperfect results, since minor imperfections are common in real data.
Demonstrate Analytic Robustness:
- Reviewers should require authors to demonstrate that their results do not hinge on arbitrary analytic decisions.
Require Exact Replication When Justification is Lacking:
- If the justifications for data collection or analysis decisions are not compelling, reviewers should require the authors to conduct an exact replication.

The authors revisited their own reported findings and re-reported them transparently, following the proposed guidelines.
Under full disclosure, the findings initially presented as significant turned out to depend heavily on researcher degrees of freedom and selective reporting.

The paper emphasizes transparency in research as a way to reduce false positives and bolster scientific integrity.
The suggested solutions aim to impose minimal burdens while promoting honest reporting and helping maintain the credibility of the field.
Academic pressures to publish are likely to persist, but adherence to these standards is vital for publishing truthful and meaningful research outcomes.
The goal is not merely to publish but to improve the accuracy of psychological science.

Babcock, L. & Loewenstein, G. (1997); Dawson, E. et al. (2002); Gilovich, T. (1983); Hastorf, A.H. & Cantril, H. (1954); Ioannidis, J.P.A. (2005); John, L. et al. (2011); Kunda, Z. (1990); Pocock, S.J. (1977); Schooler, J. (2011); Wagenmakers, E.J. et al. (2011); Zuckerman, M. (1979).