We investigated the reproducibility of the major statistical conclusions drawn in 46 articles published in 2012 in three APA journals. After having identified 232 key statistical claims, we tried to reproduce, for each claim, the test statistic, its degrees of freedom, and the corresponding p value, starting from the raw data that were provided by the authors and closely following the Method section in the article. Out of the 232 claims, we were able to successfully reproduce 163 (70%), 18 of which only by deviating from the article's analytical description. Thirteen (7%) of the 185 claims deemed significant by the authors are no longer so. The reproduction successes were often the result of cumbersome and time-consuming trial-and-error work, suggesting that APA style reporting in conjunction with raw data makes numerical verification at least hard, if not impossible. This article discusses the types of mistakes we could identify and the tediousness of our reproduction efforts in the light of a newly developed taxonomy for reproducibility. We then link our findings with other findings of empirical research on this topic, give practical recommendations on how to achieve reproducibility, and discuss the challenges of large-scale reproducibility checks as well as promising ideas that could considerably increase the reproducibility of psychological research.
Sharing research data allows the scientific community to verify and build upon published work. However, data sharing is not common practice yet. The reasons for not sharing data are myriad: Some are practical, others are more fear-related. One particular fear is that a reanalysis may expose errors. For this explanation, it would be interesting to know whether authors that do not share data genuinely made more errors than authors who do share data. (Wicherts, Bakker and Molenaar 2011) examined errors that can be discovered based on the published manuscript only, because it is impossible to reanalyze unavailable data. They found a higher prevalence of such errors in papers for which the data were not shared. However, (Nuijten et al. 2017) did not find support for this finding in three large studies. To shed more light on this relation, we conducted a replication of the study by (Wicherts et al. 2011). Our study consisted of two parts. In the first part, we reproduced the analyses from (Wicherts et al. 2011) to verify the results, and we carried out several alternative analytical approaches to evaluate the robustness of the results against other analytical decisions. In the second part, we used a unique and larger data set that originated from (Vanpaemel et al. 2015) on data sharing upon request for reanalysis, to replicate the findings in (Wicherts et al. 2011). We applied statcheck for the detection of consistency errors in all included papers and manually corrected false positives. Finally, we again assessed the robustness of the replication results against other analytical decisions. Everything taken together, we found no robust empirical evidence for the claim that not sharing research data for reanalysis is associated with consistency errors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.