2023
DOI: 10.1109/tpds.2022.3220539

Building Trust in Earth Science Findings through Data Traceability and Results Explainability

Abstract: To trust findings in computational science, scientists need workflows that trace the data provenance and support results explainability. As workflows become more complex, tracing data provenance and explaining results become harder to achieve. In this paper, we propose a computational environment that automatically creates a workflow execution's record trail and invisibly attaches it to the workflow's output, enabling data traceability and results explainability. Our solution transforms existing container tech…

Cited by 11 publications (5 citation statements)
References 31 publications
“…For example, code typically depends on programming language and operating system versions, and system-level library code (as in Figure 1). Recently, some have sought to solve these broader dependency issues using virtualization, a well-studied software engineering solution to dependency problems (Nüst et al 2020; Olaya et al 2020). Virtualization encapsulates code and all of its dependencies into a virtual computing environment that can be easily disseminated.…”
Section: Containerizing Analyses (mentioning)
confidence: 99%
“…Containerization has been an increasingly adopted tool for reproducibility widely across the scientific community including areas such as geography, psychology, environmental science, metagenomics and many others (Knoth and Nüst 2017; Wiebels and Moreau 2021; Essawy et al 2020; Visconti et al 2018; Nüst et al 2020; Olaya et al 2020). To set the stage for a review of containerization technology we will first illustrate how containerization is used in practice.…”
Section: Containerization In Practice (mentioning)
confidence: 99%
“…The use of containers is common in the area of HPC. For example, in Olaya et al (2022), the authors present an environment based on fine-grained containerization of both data and applications which automatically creates data lineage and record trail of workflow executions, enabling traceability of data and explainability of results. However, very few approaches to automating its deployment can be found in the literature.…”
Section: State Of the Art (mentioning)
confidence: 99%
“…There is a growing need for developing persistent scientific workflows to seamlessly connect and integrate software stacks and data services across cloud platforms supported by virtualization and data provenance (Bhatia et al, 2021). Containerization of scientific workflow enables reusability, portability, and reproducibility of results (Olaya et al, 2023) and ease of system maintenance efforts (Dusia et al, 2015; McDaniel et al, 2015; Monsalve et al, 2015; Herbein et al, 2016). Future directions point to the need for automatic containerization of complete working environments that include software dependencies (e.g., Python programs/modules) at all stages and the components of the Dask cluster, which are currently running natively in our workflow to maximize performance.…”
Section: Aspects Of Novelty and Future Directions (mentioning)
confidence: 99%