Lecture 16 - Replication Studies
Replication:
whether a published study's findings can be repeated in another lab, using the same research methods
you are starting from scratch, collecting new data - only using the methods from the original researcher
NEW researcher & NEW data
Reproduction:
ability to take someone's published analyses, data, and computer code and reproduce those findings
you inherit the data and the computer code - not starting from scratch
NEW researcher & SAME data
Repeatability:
SAME researcher & SAME data
The Replication Crisis
Replication / Replicability:
a published study's findings can be repeated in a different lab, using the same research methods
a different lab is running your experiment from scratch = no inheritance of any data
The Crisis: the failure of published research findings to be repeated in other labs, even when they follow the same methods
Gold Standard: replicability is often considered the best possible evidence for the accuracy of a finding and that the results weren't just a fluke
more time and effort than reproducibility
Replication Failures
Baker (2016): survey of 1,500 scientists
claims that there is a replication crisis
70% reported a failure to replicate the results of others' studies
50% had failed to replicate their own experiment
Stanford Prison Experiment: Questions about ethics and generalizability.
external validity issues
Bystander Effect: Mixed results reveal complexities in situational variables.
individuals are less likely to help in emergency situations when others are present
an internal validity issue, because the results depend on how the situational variables are manipulated in the study
Methods to enhance replicability:
better documentation of methods used
run the study again
ask a lab member to replicate the study
Why the replication crisis continues:
non-replicated findings have more citations (than replicated ones)
if non-replicated findings have more citations than replicated findings, then we have a problem
Causes of Replication Crisis
Ignoring or misunderstanding statistics
misunderstand:
null hypotheses
meaning of p-values
Publication Bias
the way we conduct, publish, distribute, and fund our science
Falsifying Data
Quality of Replication
Poor Hypothesis Practices (1A)
HARKing:
Hypothesizing After the Results are Known
Formulating or changing hypotheses after analyzing the data
generate a hypothesis based on what they find
post-hoc = after the collection & analysis of the data
SHARKing:
Secretly harking
in the Introduction section: presenting hypotheses that emerged from post-hoc analyses and treating them as if they were a priori
i.e., in the Introduction of the published paper, the authors present a hypothesis that emerged after the analyses and treat it as if they had it from the very beginning (before the experiment)
THARKing:
Hollenbeck proposes we should THARK
Transparently (openly) HARKing in the discussion section
promote effectiveness and efficiency of science
ethically required in some cases
in the discussion section, NOT the introduction
leave your original hypothesis the way it is
run your statistics
in the discussion section, acknowledge clearly and transparently that there may be a new hypothesis derived from these results
Two case studies from Hollenbeck:
A researcher, desperate to get a job, takes 30 of the shortest and most easily obtained survey measures and creates a pair of long questionnaires. They run a new survey, find some significant correlations, and publish them as a priori hypotheses. No one can replicate the findings
this is an example of HARKing
Epidemiologists test 100 patients on a new drug to protect against a virus. The correlation between treatment (drug) and survival rate is r = .1 (small). Some researchers notice that females react differently to the drug than males. The researchers re-evaluate the findings by peak estrogen age of participants and publish a short report as a post-hoc analysis. Others replicate the findings
this is an example of THARKing.
because there's no attempt to change the research hypothesis until the discussion section
after you acknowledge that the first hypothesis failed, you create a new one and an additional analysis
Meaning of p-values (1B)
null hypothesis significance testing:
the null hypothesis assumes there is no true difference (any observed difference is due to chance)
with a large enough N, virtually every study would yield a significant result
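The large-N claim can be illustrated with a quick back-of-the-envelope calculation: for a one-sample z-test, the test statistic grows with the square root of N, so even a trivially small true effect eventually crosses the significance threshold. (The function name and the numbers below are illustrative, not from the lecture.)

```python
import math

# One-sample z-test statistic for a true standardized effect d
# with n observations and a known SD of 1: z = d * sqrt(n)
def z_for_effect(d, n):
    return d * math.sqrt(n)

critical = 1.96  # two-tailed cutoff at alpha = .05

# A trivially small true effect, d = 0.02:
print(z_for_effect(0.02, 100) > critical)        # small sample: not significant
print(z_for_effect(0.02, 1_000_000) > critical)  # huge sample: "significant"
```

The effect did not change between the two calls; only the sample size did. This is why "p < .05" alone says little about whether an effect is large enough to matter.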
P-Hacking:
the unethical practice of manipulating or "hacking" statistical analyses until a statistically significant result is obtained
leads to false-positive findings and misrepresents the true state of the evidence
ex.
1. Stop collecting data when p < .05
2. Analyze many measures, but report only those with p < .05
3. Collect and analyze many conditions, but only report those with p < .05
4. Add covariates to reach p < .05
5. Exclude participants to reach p < .05
6. Transform the data to reach p < .05
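The first tactic in the list above (optional stopping: peeking at the p-value and stopping as soon as it dips below .05) can be simulated to show why it inflates false positives. A minimal sketch, assuming normally distributed data with a known SD of 1 and a true null effect; the function name, batch sizes, and trial counts are illustrative:

```python
import random
from statistics import NormalDist

def optional_stopping_trial(rng, max_n=100, batch=10, alpha=0.05):
    """Peek at the p-value after every batch of participants and stop
    as soon as p < alpha. Data are drawn under a TRUE null (mean 0, SD 1),
    so any 'significant' result is a false positive."""
    data = []
    while len(data) < max_n:
        data.extend(rng.gauss(0, 1) for _ in range(batch))
        n = len(data)
        z = (sum(data) / n) * n ** 0.5          # one-sample z-test, known SD = 1
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
        if p < alpha:
            return True                          # stopped early: false positive
    return False

rng = random.Random(0)
trials = 2000
false_positives = sum(optional_stopping_trial(rng) for _ in range(trials))
print(f"False-positive rate with peeking: {false_positives / trials:.2%}")
# Well above the nominal 5%, even though no real effect exists
```

A researcher who honestly ran one test at the final N would keep the error rate near 5%; ten peeks at the same alpha multiply the chances of a lucky crossing.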
Cherry-Picking Data:
Researchers may selectively report only the data or results that support their hypotheses
Data Fabrication & Falsification:
Intentionally creating or altering research data to support desired outcomes
this is the worst case
Publication Biases (2)
The File Drawer Problem:
studies with non-significant or null results are less likely to be published
Selective Reporting:
Researchers and academic institutions tend to favour novel discoveries over replications
Incomplete Knowledge:
non-significant results are not published, scientific community may have an incomplete or biased view of a particular research question
Replication Challenges:
Solutions for Replication Crisis
Pre-registration - PRIOR
detailed plan for research methods that are filed online (open) ahead of data collection
these are set in stone and unchangeable
no review prior to data collection
NO stage 1 review
does not improve replication rates
Registered report
a detailed plan for research methods filed online that undergoes peer review prior to data collection.
Stage 1 and Stage 2 peer review
IMPROVE REPLICATION RATES
Reproducibility
Reproduction / Reproducibility:
you are inheriting the data set and simply re-computing the analyses
ability of a different researcher to reproduce another researcher's published analyses, given the original data and computer code for the statistics used
Why can reproducibility fail?
process reproducibility failure:
original analysis cannot be repeated, unavailability of data
outcome reproducibility failure:
reanalysis obtains a different result than the one reported originally
Artner et al.'s (2021) study:
they were given the data and the code and tried to regenerate the same statistics
only 70% of findings could be reproduced
18 of those were reproduced only after deviating from the analysis reported in the original papers -> process reproducibility failure
conclusion: authors are not providing enough information