2020
DOI: 10.5334/dsj-2020-042

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Abstract: Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion. We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-lan…
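As a rough sketch of the keyword-based detection idea summarized in the abstract (illustrative patterns only; the actual ODDPub keyword lists and combination rules are defined in the published tool, which is distributed as an R package), a sentence can be flagged when an availability phrase co-occurs with a repository name:

```python
import re

# Illustrative keyword patterns only; not the keyword lists used by ODDPub itself.
REPOSITORY_PATTERN = re.compile(
    r"\b(zenodo|figshare|dryad|osf\.io|gene expression omnibus|geo)\b", re.IGNORECASE
)
AVAILABILITY_PATTERN = re.compile(
    r"\b(data (are|is) available|deposited|can be accessed|uploaded to)\b", re.IGNORECASE
)

def detect_open_data(sentences):
    """Flag sentences that combine an availability statement with a repository name."""
    hits = []
    for sentence in sentences:
        if AVAILABILITY_PATTERN.search(sentence) and REPOSITORY_PATTERN.search(sentence):
            hits.append(sentence)
    return hits

if __name__ == "__main__":
    example = [
        "The raw sequencing data are available in the Gene Expression Omnibus.",
        "Further details can be requested from the corresponding author.",
    ]
    print(detect_open_data(example))  # only the first sentence is flagged
```

Requiring both pattern families to match in the same sentence is one simple way to reduce false positives from generic mentions of data or of repositories alone.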

Cited by 25 publications (46 citation statements)
References 10 publications
“…We proceeded to test the performance of algorithms automating the process of identifying indicators of transparency. Three were developed from scratch (COI disclosure, Funding disclosure and Protocol registration) and two were adopted from an already existing library [28] (Data and Code sharing) and customised to enhance efficiency (see Materials and Methods). Note that, even though in the manual assessment, Data sharing and Code sharing indicators capture any statement about data or code availability, in the automated assessment we only capture newly generated open raw data or code.…”
Section: Results
confidence: 99%
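The cited statement describes adapting keyword-based detection to additional transparency indicators (COI disclosure, funding disclosure, protocol registration). A minimal sketch of that general approach, with hypothetical patterns that are not taken from the cited study:

```python
import re

# Hypothetical example patterns; the regular expressions used in the cited work differ.
INDICATOR_PATTERNS = {
    "coi_disclosure": re.compile(r"\b(conflict of interest|competing interests?)\b", re.I),
    "funding_disclosure": re.compile(r"\b(funded by|supported by a? ?grant)\b", re.I),
    "protocol_registration": re.compile(r"\b(clinicaltrials\.gov|prospero|preregistered)\b", re.I),
}

def detect_indicators(full_text):
    """Return which transparency indicators have at least one matching statement."""
    return {name: bool(pattern.search(full_text)) for name, pattern in INDICATOR_PATTERNS.items()}
```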
“…However, with the vast majority of articles not sharing data or code (5197/6017), this 1 article was dramatically overweighted. This is reflected by the large confidence interval (34-94%), which includes the estimated sensitivity of 73% from a random sample of 800 PMC research articles of 2018 calculated by the original authors of this algorithm [29].…”
Section: Results
confidence: 99%
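The wide interval reflects how few data-sharing articles contribute to the sensitivity estimate. A small sketch of a binomial (Wilson) confidence interval for a sensitivity computed from a handful of positives, with illustrative counts only (the exact method behind the cited 34-94% interval is not reproduced here):

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% level by default)."""
    if n == 0:
        return (0.0, 0.0)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (centre - half_width, centre + half_width)

# With only a handful of true positives the interval is very wide,
# e.g. 5 correctly detected out of 7 data-sharing articles:
print(wilson_interval(5, 7))  # roughly (0.36, 0.92)
```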