2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013
DOI: 10.1109/icde.2013.6544881

SubZero: A fine-grained lineage system for scientific databases

Abstract: Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at an array-cell level is impractical due to the lack of support for user defined operators and the high runtime and storage overhead to store such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged t…

Citations: cited by 30 publications (37 citation statements)
References: 12 publications
“…There exists a line of research in providing fine-grain data lineage for database systems. Trio [2] and SubZero [25] introduce new features to manage fine-grain lineage along with data. They track provenance by transforming/reversing queries.…”
Section: Related Work (mentioning)
confidence: 99%
“…Data provenance concerns with the problem of detecting the origin, the creation and the propagation process of data within a data-intensive system. In other words, data provenance consists in the lineage (e.g., [25]) and derivation (e.g., [21]) of data and data objects, and it puts its conceptual roots in extensively studies performed in the past in the contexts of arts, literary works, manuscripts, sculptures, and so forth. Another concept that is close to the "data provenance" one is represented by the so-called ownership of data (e.g., [20]), which refers to the issue of defining and providing information about the rightful owner of data assets, and to the acquisition, use and distribution policy implemented by the data owner.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hence, whenever a result is calculated, a user should be able to trace the derivation of data, if he believes the result to be suspect. We have built an elaborate system that does exactly that, exploiting the semantics of relational and array operators to be able to efficiently work backwards, using a notion of fine-grained provenance [7]. We are also currently investigating visualization and other tools to help users understand data quality [20].…”
Section: Seamless On-line Reprovisioning (mentioning)
confidence: 99%