Lecture 16 - Replication Studies
Replication:
whether a published study's findings can be repeated in another lab, using the same research methods
you are starting from scratch, collecting new data - only using the methods from the original researcher
NEW researcher & NEW data
Reproduction:
ability to take someone's published analyses, data, and computer code and reproduce those findings
you inherit the data and the computer code - not starting from scratch
NEW researcher & SAME data
Repeatability:
SAME researcher & SAME data
The Replication Crisis
Replication / Replicability:
a published study's findings can be repeated in a different lab, using the same research methods
a different lab is running your experiment from scratch = no inheritance of any data
The Crisis: the failure of published research findings to be repeated in other labs, even when they follow the same methods
Gold Standard: replicability is often considered the best possible evidence for the accuracy of a finding and that the results weren't just a fluke
more time and effort than reproducibility
Replication Failures
Baker (2016): survey of 1,500 scientists
claims that there is a replication crisis
70% reported a failure to replicate the results of others' studies
50% had failed to replicate their own experiment
Stanford Prison Experiment: Questions about ethics and generalizability.
external validity issues
Bystander Effect: Mixed results reveal complexities in situational variables.
individuals are less likely to help in emergency situations when others are present
an internal validity issue, because the results depend on how the situational variables are manipulated in the study
Methods to enhance replicability:
better documentation of methods used
run the study again
ask a lab member to replicate the study
Why the replication crisis continues:
non-replicated findings have more citations (than replicated ones)
if non-replicated findings have more citations than replicated findings, then we have a problem
Causes of Replication Crisis
Ignoring or misunderstanding statistics
misunderstand:
null hypotheses
meaning of p-values
Publication Bias
the way we conduct, publish, distribute, and fund our science
Falsifying Data
Quality of Replication
Poor Hypothesis Practices (1A)
HARKing:
Hypothesizing After the Results are Known
Formulating or changing hypotheses after analyzing the data
generate a hypothesis based on what they find
post-hoc = after the collection & analysis of the data
SHARKing:
Secretly harking
in the Introduction section: presenting hypotheses that emerged from post-hoc analyses and treating them as if they were a priori
i.e., in the Introduction of the published paper, the authors present a hypothesis that emerged after the analyses and treat it as if they had it from the very beginning (before the experiment)
THARKing:
Hollenbeck proposes we should THARK
Transparently (openly) HARKing in the discussion section
promote effectiveness and efficiency of science
ethically required in some cases
in the discussion section, NOT the introduction
leave your original hypothesis the way it is
run your statistics
in the discussion section, acknowledge clearly and transparently that there may be a new hypothesis derived from these results
Two case studies from Hollenbeck:
A researcher, desperate to get a job, takes 30 of the shortest and most easily obtained survey measures and creates a pair of long questionnaires. They run a new survey, find some significant correlations, and publish them as a priori hypotheses. No one can replicate the findings
this is an example of HARKing
Epidemiologists test 100 patients on a new drug to protect against a virus. The correlation between treatment (drug) and survival rate is r = .1 (small). Some researchers notice that females react differently to the drug than males. The researchers re-evaluate the findings by peak estrogen age of participants and publish a short report as a post-hoc analysis. Others replicate the findings
this is an example of THARKing.
because there's no attempt to change the research hypothesis until the discussion section
after you acknowledge that the first hypothesis failed, you create a new one and an additional analysis
Meaning of p-values (1B)
null hypothesis significance testing:
the null hypothesis assumes there is no true difference (any observed difference is due to chance)
with a large enough N, virtually every study would yield a significant result
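The large-N claim can be illustrated with a quick back-of-the-envelope calculation: for a one-sample z-test, the test statistic grows with the square root of N, so even a trivially small true effect eventually crosses the significance threshold. (The function name and the numbers below are illustrative, not from the lecture.)

```python
import math

# One-sample z-test statistic for a true standardized effect d
# with n observations and a known SD of 1: z = d * sqrt(n)
def z_for_effect(d, n):
    return d * math.sqrt(n)

critical = 1.96  # two-tailed cutoff at alpha = .05

# A trivially small true effect, d = 0.02:
print(z_for_effect(0.02, 100) > critical)        # small sample: not significant
print(z_for_effect(0.02, 1_000_000) > critical)  # huge sample: "significant"
```

The effect did not change between the two calls; only the sample size did. This is why "p < .05" alone says little about whether an effect is large enough to matter.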
P-Hacking:
the unethical practice of manipulating or "hacking" statistical analyses until a statistically significant result is obtained
leads to false-positive findings and misrepresents the true state of the evidence
ex.
1. Stop collecting data when p < .05
2. Analyze many measures, but report only those with p < .05
3. Collect and analyze many conditions, but only report those with p < .05
4. Add covariates to reach p < .05
5. Exclude participants to reach p < .05
6. Transform the data to reach p < .05
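The first tactic in the list above (optional stopping: peeking at the p-value and stopping as soon as it dips below .05) can be simulated to show why it inflates false positives. A minimal sketch, assuming normally distributed data with a known SD of 1 and a true null effect; the function name, batch sizes, and trial counts are illustrative:

```python
import random
from statistics import NormalDist

def optional_stopping_trial(rng, max_n=100, batch=10, alpha=0.05):
    """Peek at the p-value after every batch of participants and stop
    as soon as p < alpha. Data are drawn under a TRUE null (mean 0, SD 1),
    so any 'significant' result is a false positive."""
    data = []
    while len(data) < max_n:
        data.extend(rng.gauss(0, 1) for _ in range(batch))
        n = len(data)
        z = (sum(data) / n) * n ** 0.5          # one-sample z-test, known SD = 1
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
        if p < alpha:
            return True                          # stopped early: false positive
    return False

rng = random.Random(0)
trials = 2000
false_positives = sum(optional_stopping_trial(rng) for _ in range(trials))
print(f"False-positive rate with peeking: {false_positives / trials:.2%}")
# Well above the nominal 5%, even though no real effect exists
```

A researcher who honestly ran one test at the final N would keep the error rate near 5%; ten peeks at the same alpha multiply the chances of a lucky crossing.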
Cherry-Picking Data:
Researchers may selectively report only the data or results that support their hypotheses
Data Fabrication & Falsification:
Intentionally creating or altering research data to support desired outcomes
this is the worst case
Publication Biases (2)
The File Drawer Problem:
studies with non-significant or null results are less likely to be published
Selective Reporting:
Researchers and academic institutions tend to favour novel discoveries over replications
Incomplete Knowledge:
non-significant results are not published, scientific community may have an incomplete or biased view of a particular research question
Replication Challenges:
Solutions for Replication Crisis
Pre-registration - PRIOR
detailed plan for research methods that are filed online (open) ahead of data collection
these are set in stone and unchangeable
no review prior to data collection
NO stage 1 review
does not improve replication rates
Registered report
a detailed plan for research methods filed online that undergoes peer review prior to data collection.
Stage 1 and Stage 2 peer review
IMPROVE REPLICATION RATES
Reproducibility
Reproduction / Reproducibility:
you are inheriting the data set and simply re-computing the analyses
ability of a different researcher to reproduce another researcher's published analyses, given the original data and computer code for the statistics used
Why can reproducibility fail?
process reproducibility failure:
original analysis cannot be repeated, unavailability of data
outcome reproducibility failure:
reanalysis obtains a different result than the one reported originally
Artner et al.'s (2021) study:
they were given the data and the code and tried to regenerate the same statistics
only 70% of findings could be reproduced
18 of those were reproduced only after deviating from the analysis reported in the original papers -> process reproducibility failure
conclusion: authors are not providing enough information