Lecture 16 - Replication Studies

Replication:

  • whether you can repeat an entire study in another lab, using the same research methods

    • you are starting from scratch, collecting new data - only using the methods from the other researcher

    • NEW researcher & NEW data

Reproduction:

  • the ability to take someone's published analyses, data, and computer code and reproduce those findings

    • you inherit the data and the computer code - not starting from scratch

    • NEW researcher & SAME data

Repeatability:

  • SAME researcher & SAME data

The Replication Crisis

  • Replication / Replicability:

    • a published study's findings can be repeated in a different lab, using the same research methods

      • a different lab is running your experiment from scratch = no inheritance of any data

    • The Crisis: a failure for published research findings to be repeated in other labs, when they follow the same methods

    • Gold Standard: replicability is often considered the best possible evidence for the accuracy of a finding and that the results weren’t just a fluke

    • requires more time and effort than reproducibility

Replication Failures

Baker (2016): survey of 1,500 scientists

  • claims that there is a replication crisis

  • 70% reported having failed to replicate the results of others’ studies

  • 50% had failed to replicate their own experiment

Stanford Prison Experiment: Questions about ethics and generalizability.

  • external validity issues

Bystander Effect: Mixed results reveal complexities in situational variables.

  • individuals are less likely to help in emergency situations when others are present

      • an internal validity issue, because the effect depends on the relationships between the manipulations in the study

Methods to enhance replicability:

  • better documentation of methods used

  • run the study again

  • ask a lab member to replicate the study

Why the replication crisis continues:

  • non-replicated findings have more citations (than replicated ones)

    • if non-replicated findings have more citations than replicated findings, then we have a problem

Causes of Replication Crisis

  1. Ignoring or misunderstanding statistics

    • misunderstanding of:

      • null hypotheses

      • meaning of p-values

  2. Publication Bias

    • the way we conduct, publish, distribute, and fund our science

  3. Falsifying Data

  4. Quality of Replication

Poor Hypothesis Practices (1A)

  • HARKing:

    • Hypothesizing After the Results are Known

    • Formulating or changing hypotheses after analyzing the data

      • generate a hypothesis based on what they find

      • post-hoc = after the collection & analysis of the data

  • SHARKing:

    • Secretly HARKing

    • in the introduction section

      • presenting hypotheses that emerged from post-hoc analyses and treating them as if they were a priori

    • in the Introduction section of their published paper, they present hypotheses that emerged after the analyses and treat them as if they had been formed at the very beginning (before the experiment was run)

  • THARKing:

    • Hollenbeck proposes that we should THARK

    • Transparently (openly) HARKing in the discussion section

      • promotes the effectiveness and efficiency of science

      • ethically required in some cases

    • in the discussion section, NOT the introduction

      • leave your original hypothesis the way it is

      • run your statistics

      • in the discussion section, acknowledge clearly and transparently that there may be a new hypothesis derived from these results

2 case studies from Hollenbeck:

  1. A researcher desperate to get a job takes 30 of the shortest and most easily obtained survey measures and creates a pair of long questionnaires. They run a new survey, find some significant correlations, and publish them as a priori hypotheses. No one can replicate the findings.

    • HARKing

  2. Epidemiologists test 100 patients on a new drug to protect against a virus. The correlation between treatment (drug) and survival rate is r = .1 (small). Some researchers notice that females react differently to the drug than males. The researchers re-evaluate the findings by peak estrogen age of participants and publish a short report as a post-hoc analysis. Others replicate the findings.

    • this is an example of THARKing.

      • because there's no attempt to change the research hypothesis until the discussion section

        • after you acknowledge that the first hypothesis failed, you create a new one and run an additional analysis

Meaning of p-values (1B)

  • null hypothesis significance testing:

    • the null hypothesis states that there is no true difference (no effect)

    • with a large enough N, virtually every study would yield a significant result (see the simulation sketch below)
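
To make the large-N point concrete, here is a minimal simulation sketch (my own illustration, not from the lecture; the effect size, sample sizes, and normal distributions are assumptions): both groups differ by a trivially small true effect, yet only the very large sample reliably reaches p < .05.

    # Minimal sketch (illustration, not from the lecture): with a large enough N,
    # even a trivially small true effect becomes "statistically significant".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_effect = 0.02  # tiny standardized mean difference between groups

    for n in (50, 100_000):  # small vs. very large sample per group
        group_a = rng.normal(loc=0.0, scale=1.0, size=n)
        group_b = rng.normal(loc=true_effect, scale=1.0, size=n)
        t_stat, p_value = stats.ttest_ind(group_a, group_b)
        print(f"n per group = {n:>7,}: p = {p_value:.4f}")

    # Typical output: the n = 50 test is non-significant, while the n = 100,000
    # test is significant even though the effect is practically negligible.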

  • P-Hacking:

    • the unethical practice of manipulating or "hacking" statistical analyses until a statistically significant result is obtained (optional stopping, the first example below, is simulated in the sketch after this list)

      • leads to false-positive findings and misrepresentation of the true state of the evidence

        • ex.

          1. Stop collecting data when p < .05

          2. Analyze many measures, but report only those with p < .05

          3. Collect and analyze many conditions, but only report those with p < .05

          4. Add covariates to reach p < .05

          5. Exclude participants to reach p < .05

          6. Transform the data to reach p < .05
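
Below is a minimal sketch of optional stopping (my own illustration, not from the lecture; the batch size, number of peeks, and normal data with a true effect of exactly zero are assumptions): repeatedly testing and stopping as soon as p < .05 inflates the false-positive rate well above the nominal 5%.

    # Minimal sketch (illustration): optional stopping under a true null effect.
    # Peeking after every batch and stopping as soon as p < .05 inflates the
    # false-positive rate far above the nominal 5%.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_simulations = 2_000
    false_positives = 0

    for _ in range(n_simulations):
        group_a, group_b = [], []
        for _ in range(10):                      # up to 10 "peeks" at the data
            group_a.extend(rng.normal(size=10))  # add 10 participants per group
            group_b.extend(rng.normal(size=10))  # true effect is exactly zero
            _, p_value = stats.ttest_ind(group_a, group_b)
            if p_value < .05:                    # stop as soon as it looks significant
                false_positives += 1
                break

    print(f"false-positive rate with optional stopping: {false_positives / n_simulations:.1%}")
    # Typically around 15-20%, i.e., three to four times the nominal 5% rate.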

  • Cherry-Picking Data:

    • Researchers may selectively report only the data or results that support their hypotheses

  • Data Fabrication & Falsification:

    • Intentionally creating or altering research data to support desired outcomes

    • this is the worst case

Publication Biases (2)

  • The File Drawer Problem:

    • studies with non-significant or null results are less likely to be published

  • Selective Reporting:

    • Researchers and academic institutions tend to favour novel discoveries

  • Incomplete Knowledge:

    • when non-significant results are not published, the scientific community may have an incomplete or biased view of a particular research question

  • Replication Challenges:

    • because failed replications are themselves hard to publish, researchers have little incentive to attempt them

Solutions for Replication Crisis

  • Pre-registration - PRIOR

    • detailed plan for research methods that are filed online (open) ahead of data collection

    • these are set in stone and unchangeable

    • no review prior to data collection

    • NO stage 1 review

    • does not improve replication rates

  • Registered report

    • a detailed plan for research methods filed online that undergoes peer review prior to data collection.

    • Stage 1 and Stage 2 peer review

    • IMPROVE REPLICATION RATES

Reproducibility

  • Reproduction / Reproducibility:

    • you are inheriting the data set and simply re-computing the analyses (a minimal check is sketched below)

    • ability of a different researcher to reproduce another researcher's published analyses, given the original data and computer code for the statistics used
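
A reproducibility check can be written as a short script. The sketch below is my own illustration with hypothetical file name, column names, and reported value (shared_dataset.csv, treatment, survival, r = .10): it inherits the data, re-computes the published statistic, and compares it with what the paper reports.

    # Minimal sketch (hypothetical file/column names and reported value): a
    # reproducibility check inherits the original data, re-computes the
    # published analysis, and compares the result to the value in the paper.
    import numpy as np
    import pandas as pd
    from scipy import stats

    REPORTED_R = 0.10  # correlation reported in the (hypothetical) original paper

    data = pd.read_csv("shared_dataset.csv")  # data set inherited from the authors
    r, p_value = stats.pearsonr(data["treatment"], data["survival"])

    if np.isclose(r, REPORTED_R, atol=0.005):
        print(f"outcome reproduced: r = {r:.3f} matches the reported {REPORTED_R}")
    else:
        print(f"outcome reproducibility failure: r = {r:.3f} vs. reported {REPORTED_R}")

If such a script cannot even be run (e.g., the data or code is unavailable), that is a process reproducibility failure rather than an outcome one, as defined in the next section.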

Why can reproducibility fail?

  • process reproducibility failure:

    • the original analysis cannot be repeated, e.g., because the data are unavailable

  • outcome reproducibility failure:

    • reanalysis obtains a different result than the one reported originally

Artner et al. (2021):

  • given the original data and code, they tried to regenerate the reported statistics

  • only 70% of the findings could be reproduced

  • 18 of those were reproduced only after deviating from the analysis reported in the original papers → process reproducibility failure

  • conclusion: authors are not providing enough information