2017
DOI: 10.1371/journal.pbio.2002477

Wide-Open: Accelerating public data release by automating detection of overdue datasets

Abstract: Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from reposito…

Cited by 13 publications
(18 citation statements)
References 6 publications
“…Novel technologies have led to an increase in the volume and diversity of large-scale public data ( 43 ), which are a vital pillar of open science and a key enabler of reproducibility and novel discoveries ( 44 ). Reuse of public data may potentially answer questions beyond those originally envisioned ( 45 ), and provide a systems-level approach to predicting treatment response and disease progression, and to developing precision therapies ( 43 , 46 ).…”
Section: Discussion (mentioning)
confidence: 99%
“…Drivers Inhibitors Arza and Fressoli [4] Systematizing benefits of open science practices X X X Arzberger, Schroeder [50] Promoting access to public research data for scientific, economic, and social development X X X X Bezuidenhout [51] Technology Transfer and True Transformation: Implications for Open Data X X Campbell [2] Access to scientific data in the 21st century: Rationale and illustrative usage rights review X X X X da Costa and Leite [47] Factors influencing research data communication on Zika virus: a grounded theory X X X X Cragin, Palmer [52] Data sharing, small science and institutional repositories X X Curty, Crowston [40] Attitudes and norms affecting scientists' data reuse X X X Enke, Thessen [10] The user's view on biodiversity data sharing-Investigating facts of acceptance and requirements to realize a sustainable use of research data X X X X Fecher, Friesike [11] What drives academic data sharing? X X X Ganzevoort, van den Born [53] Sharing biodiversity data: citizen scientists' concerns and motivations X X Grechkin, Poon [6] Wide-Open: Accelerating public data release by automating detection of overdue datasets X X X…”
Section: Drivers Inhibitors (mentioning)
confidence: 99%
“…Social responsiveness [4] and standard social norms [41] The culture of open sharing (promotion for academe is tied to publication and not data) [49] Perceived social pressure to share data with others [45] Code of conduct and related normative standards of professional scientists and their communities [50] Subjective norm [41] Perceived normative pressure [42] Peer pressure to share data [8] Attitudes toward data sharing [17,42] World-wide attention to the need to share and preserve data [56] Effort The expectation that data will be reused [40] (Perceived) effort [11,41,42,47,49] Avoidance of duplication of work [2,41,48,57] Required manual efforts [6] Increase efficient use of funding and population resources by avoiding duplicate data collection [8,9] Individual investment needed to preserve and manage data [57] Efficient and optimized use of resources [1,48,56] Time investment (the amount of time they would have to invest to get the data ready to share) [8,10,11,47,49,52] A source for researchers to consult when considering how to build upon existing studies [42] Large amount of work [52] Saving time involved in data collection [41,48] Making data from the long tail discoverable and reusable is emerging as a major challenge …”
Section: Themes (mentioning)
confidence: 99%
“…The challenge discussed here exists within the larger context of improving the timely public release of all published data. Recently the development of Wide-Open, a programmatic approach using text mining to detect published but unreleased data was described (Grechkin et al 2017). The first run of this approach focused on records in the Gene Expression Omnibus (GEO) repository and the Sequence Read Archive (SRA) at NCBI.…”
Section: Discussion (mentioning)
confidence: 99%
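
The text-mining step described above (finding dataset references in article text, then checking them against a repository) can be illustrated with a minimal sketch. This is not the Wide-Open implementation. The accession patterns are assumptions based on the common GEO series format (`GSE` plus digits) and SRA study format (`SRP`, `ERP`, or `DRP` plus digits), and `is_public` is a hypothetical lookup standing in for a real repository query.

```python
import re

# Assumed accession formats (not taken from the source): GEO series IDs look
# like "GSE12345"; SRA study IDs look like "SRP012345" (also ERP/DRP).
ACCESSION_RE = re.compile(r"\b(GSE\d+|[SED]RP\d{6,})\b")


def find_accessions(text):
    """Return the unique accession-like tokens in text, in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        token = match.group(1)
        if token not in seen:
            seen.append(token)
    return seen


def overdue(accessions, is_public):
    """Keep accessions that the hypothetical is_public lookup reports as not released."""
    return [acc for acc in accessions if not is_public(acc)]
```

In practice `is_public` would wrap a query to the repository in question; it is kept as a plain callable here so the sketch makes no claim about any particular repository API.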