2018
DOI: 10.14778/3199517.3199522

Smoke

Abstract: Data lineage describes the relationship between individual input and output data items of a workflow and is an integral ingredient for both traditional (e.g., debugging or auditing) and emergent (e.g., explanations or cleaning) applications. The core, long-standing problem that lineage systems need to address---and the main focus of this paper---is to quickly capture lineage across a workflow in order to speed up future queries over lineage. Current lineage systems, however, either incur high lineage capture o…

Cited by 19 publications (8 citation statements)
References 46 publications
“…, $d_{in}$) where $i$ is the unique index of the row and each element $d_{ij}$ (for $1 \le j \le n$) is either a value in the domain of the feature $a_j$ or the special symbol $\bot$, denoting a missing value. Row indexes can be implemented in different ways (e.g., with RID annotations [15]). We only assume here that a row of any dataset can be uniquely identified.…”
Section: Data Model (mentioning)
confidence: 99%
“…one-hot and other kinds of categorical data encodings. We also note that tools that operate on a database back-end, like GProm [44], Smoke [15] and older ones like Post-it [47] for provenance capture cannot be used in our setting. Interestingly, extensions to the polynomials approach have been proposed to describe the provenance of certain linear algebra operations, such as matrix decomposition and tensor-product construction [48].…”
Section: Related Work (mentioning)
confidence: 99%
“…RAMP [23] and Newt [30] added data provenance support to DISC systems; both are capable of performing backward tracing of faults to failure-inducing inputs. Wu et al design a new database engine, Smoke, that incorporates lineage logic within the dataflow operators and constructs a lineage query as the database query is being developed [36]. Ikeda et al present provenance properties such as minimality and precision for individual transformation operators to support data provenance [22,24].…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, as summarized data provide an abstract or aggregate view, there is a need for data transparency, meaning that experts should be able to trace individual data points, which contributed to the aggregate summary. This involves incorporating ideas from provenance systems such as Smoke [182] and Scorpion [183], which provide fast data lineage tracking. Finally, for each application, empirical studies are needed to see what and how information should be presented or summarized because too much transparency can overwhelm and negatively impact the expert [13].…”
Section: Human Decisions (mentioning)
confidence: 99%