Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403205

Vamsa: Automated Provenance Tracking in Data Science Scripts

citations
Cited by 33 publications
(11 citation statements)
references
References 16 publications
“…Metadata generated by the model during training or testing are extremely important for model debugging. Auxiliary debugging tools [21][22][23] or data provenance studies [30][31][32] extract metadata using a data collection API during training. However, in the federated learning setting, manual extraction and analysis of training metadata are not allowed because of the data protection requirement.…”
Section: Metadata Capture
Type: mentioning
confidence: 99%
“…By contrast, our approach automatically refreshes dependent cells. Vamsa [6] also employs static dataflow analysis to analyze provenance of Python ML pipelines. Dataflow notebooks [4] extend Jupyter with immutable identifiers for cells and the capability to reference the results of a cell by its identifier.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…The interest in workflow provenance management has increased in the recent years, driven by a major effort by the provenance community [31, 35, 50-62], particularly to explore possibilities of optimizing workflows with the data captured by provenance tools and as a response to the urgent need for reproducible science, which is critical in scientific ML [63]. To exemplify, Thavasimani et al. [14] investigate provenance traces recorded during workflow executions to observe differences in results with minor workflow configuration differences.…”
Section: Related Work
Type: mentioning
confidence: 99%