2019
DOI: 10.1093/gigascience/giz095

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv

Abstract: Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (softwa…

Cited by 70 publications (64 citation statements)
References 96 publications
“…However, a balance is needed between the detail level and its generation cost. Depending on whether research software is considered as an individual product or as part of an ecosystem, the associated metadata might differ [28,56,57], with workflows having specific mechanisms to capture it through their specifications, e.g., using Common Workflow Language (CWL) [58,59] and/or Workflow Description Language (WDL) [60], among others. This metadata should include software version, dependencies (including which version), input and output data types and formats (preferably using a controlled vocabulary), communication interfaces (specified using standards like OpenAPI), and/or deployment options.…”
Section: Interoperability (mentioning)
confidence: 99%
“…Other software such as pepkit [18], basejump [19], Refgenie [15], CRAM [10], refget [11], and CWLProv [22] are not particularly designed for RNA-seq data import, and so are less directly comparable to tximeta. Pepkit, basejump, Refgenie, and CWLProv are generic workflow or resource management tools, some of which allow for the possibility of post hoc identification of annotation metadata.…”
Section: Comparison To Related Software (mentioning)
confidence: 99%
“…Belhajjame et al [20] summarized literature in the field of computational reproducibility and efforts toward extensive provenance tracking. The developers of the Common Workflow Language (CWL) [21] have defined a profile, CWLProv, for recording provenance through a workflow run, and have a number of implementations, including within cwltool [22]. The developers of CWLProv emphasized the importance of tracking versions of input data, such as reference genomes or variant databases in a scientific workflow, and they suggested to use and store stable identifiers of all data and software, as well as the workflow itself.…”
Section: Introduction (mentioning)
confidence: 99%
“…Prospective provenance refers to the specifications or ‘recipes’ that describe the workflow steps and their execution order, typically as an abstract representation of these steps (protocols), as well as expected inputs and outputs (Cohen-Boulakia et al, 2017). Retrospective provenance refers to the information about actual workflow executions that happened in the past, including the concrete activities that consumed inputs and produced outputs, as well as information about the execution environment (Khan et al, 2019). Workflow evolution provenance refers to tracking the versions of workflow specifications and the respective data, as the workflow specification is changed and improved over time.…”
Section: Workflow Systems (mentioning)
confidence: 99%