2020
DOI: 10.5334/dsj-2020-042

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Abstract: Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion. We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-lan…
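As a rough sketch of the keyword-based detection idea summarized in the abstract (illustrative patterns only; the actual ODDPub keyword lists and combination rules are defined in the published tool, which is distributed as an R package), a sentence can be flagged when an availability phrase co-occurs with a repository name:

```python
import re

# Illustrative keyword patterns only; not the keyword lists used by ODDPub itself.
REPOSITORY_PATTERN = re.compile(
    r"\b(zenodo|figshare|dryad|osf\.io|gene expression omnibus|geo)\b", re.IGNORECASE
)
AVAILABILITY_PATTERN = re.compile(
    r"\b(data (are|is) available|deposited|can be accessed|uploaded to)\b", re.IGNORECASE
)

def detect_open_data(sentences):
    """Flag sentences that combine an availability statement with a repository name."""
    hits = []
    for sentence in sentences:
        if AVAILABILITY_PATTERN.search(sentence) and REPOSITORY_PATTERN.search(sentence):
            hits.append(sentence)
    return hits

if __name__ == "__main__":
    example = [
        "The raw sequencing data are available in the Gene Expression Omnibus.",
        "Further details can be requested from the corresponding author.",
    ]
    print(detect_open_data(example))  # only the first sentence is flagged
```

Requiring both pattern families to match in the same sentence is one simple way to reduce false positives from generic mentions of data or of repositories alone.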

Cited by 25 publications (46 citation statements)
References 10 publications
“…We proceeded to test the performance of algorithms automating the process of identifying indicators of transparency. Three were developed from scratch (COI disclosure, Funding disclosure and Protocol registration) and two were adopted from an already existing library [28] (Data and Code sharing) and customised to enhance efficiency (see Materials and Methods). Note that, even though in the manual assessment, Data sharing and Code sharing indicators capture any statement about data or code availability, in the automated assessment we only capture newly generated open raw data or code.…”
Section: Results
confidence: 99%
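The cited statement describes adapting keyword-based detection to additional transparency indicators (COI disclosure, funding disclosure, protocol registration). A minimal sketch of that general approach, with hypothetical patterns that are not taken from the cited study:

```python
import re

# Hypothetical example patterns; the regular expressions used in the cited work differ.
INDICATOR_PATTERNS = {
    "coi_disclosure": re.compile(r"\b(conflict of interest|competing interests?)\b", re.I),
    "funding_disclosure": re.compile(r"\b(funded by|supported by a? ?grant)\b", re.I),
    "protocol_registration": re.compile(r"\b(clinicaltrials\.gov|prospero|preregistered)\b", re.I),
}

def detect_indicators(full_text):
    """Return which transparency indicators have at least one matching statement."""
    return {name: bool(pattern.search(full_text)) for name, pattern in INDICATOR_PATTERNS.items()}
```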
“…However, with the vast majority of articles not sharing data or code (5197/6017), this 1 article was dramatically overweighted. This is reflected by the large confidence interval (34-94%), which includes the estimated sensitivity of 73% from a random sample of 800 PMC research articles of 2018 calculated by the original authors of this algorithm [29].…”
Section: Results
confidence: 99%
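The wide interval reflects how few data-sharing articles contribute to the sensitivity estimate. A small sketch of a binomial (Wilson) confidence interval for a sensitivity computed from a handful of positives, with illustrative counts only (the exact method behind the cited 34-94% interval is not reproduced here):

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% level by default)."""
    if n == 0:
        return (0.0, 0.0)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (centre - half_width, centre + half_width)

# With only a handful of true positives the interval is very wide,
# e.g. 5 correctly detected out of 7 data-sharing articles:
print(wilson_interval(5, 7))  # roughly (0.36, 0.92)
```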