2005
DOI: 10.1145/1084805.1084812

A survey of data provenance in e-science

Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance character…

Cited by 956 publications
(658 citation statements)
References 18 publications
“…The final phase of the workflow lifecycle, namely analysis of the results, is increasingly perceived as of great importance within the e-science community [15]. The provenance of a piece of data produced by an arbitrary process is a complete account of how that piece of data was computed, starting from user input and taking into account intermediate results produced by the processors involved in the computation.…”
Section: Workflow Results Analysis (mentioning)
confidence: 99%
“…For a comprehensive overview of the field, we refer the reader to Moreau [24]. Furthermore, Cheney et al [5] and Simmhan et al [30] provide specialized reviews for databases and e-science respectively. Here, we focus on systems for provenance capture.…”
Section: Capturing Provenance (mentioning)
confidence: 99%
“…One form of provenance is "workflow" or "coarse-grained" provenance: information describing how derived data has been calculated from raw observations [3,10,14,21]. Workflow provenance is important in scientific computation, but is not a major concern in curated databases.…”
Section: The Problem (mentioning)
confidence: 99%
“…As mentioned in the introduction, "workflow" or "coarse-grained" provenance has been studied extensively in the context of scientific computation [10,14,26]; Bose and Frew [3] and Simmhan et al [21] survey most existing research on such systems. These approaches record the process used to derive processed data products from raw data.…”
Section: Related Work (mentioning)
confidence: 99%