2021
DOI: 10.5334/dsj-2021-012
|View full text |Cite
|
Sign up to set email alerts
|

Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles

Abstract: A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (e.g., as part of a time series), etc. In addition, datasets might be bundled into collections, distributed in different encodings or mirrored onto different platforms. All these differences between versions of datasets need to be understood by researchers who want to cite the exact version of the dataset that was used to underpin their research. Failing to do so reduces the reproducibility of research results. A… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

1
4
0

Year Published

2022
2022
2025
2025

Publication Types

Select...
6
3
1

Relationship

0
10

Authors

Journals

citations
Cited by 11 publications
(5 citation statements)
references
References 18 publications
1
4
0
Order By: Relevance
“…This versioning approach is consistent with recommendations from the Research Data Alliance (RDA) Data Citation Working Group (Rauber et al, 2015) and the RDA Data Versioning Working Group (Principles 1, 5, and 6 of Klump et al, 2021). As new standards and best practices emerge from the research data community, ONC will continue to improve these frameworks.…”
Section: Data Versioningsupporting
confidence: 56%
“…This versioning approach is consistent with recommendations from the Research Data Alliance (RDA) Data Citation Working Group (Rauber et al, 2015) and the RDA Data Versioning Working Group (Principles 1, 5, and 6 of Klump et al, 2021). As new standards and best practices emerge from the research data community, ONC will continue to improve these frameworks.…”
Section: Data Versioningsupporting
confidence: 56%
“…It is a communication protocol with many benefits, including software reuse, information extraction, data summarization, etc. Another noteworthy task is data version control [17], which records the changes to a dataset during its lifecycle to achieve traceable error reporting and enable reproducible results.…”
Section: Data Maintenancementioning
confidence: 99%
“…Despite clear mutability, metadata tables are often treated as static. Version control is well established for code and has a diverse and multifaceted history for data as well ( Klump et al, 2021 ), but the question of versioning metadata specifically is distinct. A common strategy for versioning sample metadata is to use tools built for code versioning, such as git.…”
Section: Challenges To Sharing Genomic Metadatamentioning
confidence: 99%