2019
DOI: 10.1007/s13222-019-00317-8

A Link is not Enough – Reproducibility of Data

Abstract: Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process…

Cited by 16 publications (19 citation statements)
References 18 publications
“…We assign meaning to the term as data as an output (result) in a computational context, which was generated when reproducing computational experiments. Although the term “computationally reproducible data” is not officially defined, other sources and studies have referred to the concept of data that contributes to computational reproducibility (Baranyi and Greilhuber, 1999; Weinländer et al, 2009; de Ruiter, 2017; Perkel, 2017; Tait, 2017; Pawlik et al, 2019). From the unsure responses (n = 30), we categorised those that gave free-text responses (70%, n = 21, see Supplementary section 2, free responses) into whether they did actually understand the term, those that did not understand the term, and those that did not give any free text.…”
Section: Results (mentioning)
confidence: 99%
“…Even with current policies mandating data openness [59,60], authors still fail to include their data alongside their publication, and this can not only be attributed to technical complications, but also fear of being scooped, fear of mistakes being found in data or analyses, and fear of others using their data for their own research papers [64,73]. Making FAIR data practices standard, through public data deposition and subsequent publication and citation, could encourage individual researchers and communities to share and reuse data considering their individual requirements and needs [68]. Data accessibility issues are also compounded by data becoming less retrievable with every year passing after the publication [85].…”
Section: Discussion (mentioning)
confidence: 99%
“…This raises complex questions around large data generation projects that also need to be studied extensively for future impact, especially with respect to reproducibility within publications. Moreover, access to the raw data might not be enough, if the steps and other artefacts involved in producing the processed data that was used in the analysis are not provided [68]. In addition, corresponding authors often move on from projects and institutions or the authors themselves can no longer access the data, meaning "data available on request" ceases to be a viable option to source data or explanations of methods.…”
Section: Discussion (mentioning)
confidence: 99%
“…But why is it so important for reproducibility and archival access to access materials in the original computational environment? Given how much modern research practices rely on unique toolkits, the output from any analysis is highly dependent on the actual software in which the research happens (Pawlik et al, 2019). Results of research change depending on versions of software, on operating system changes, and other custom configurations (Gronenschild et al, 2012).…”
Section: Introduction (mentioning)
confidence: 99%