Proceedings of the 2017 Symposium on Cloud Computing 2017
DOI: 10.1145/3127479.3131624

Automated debugging in data-intensive scalable computing

Abstract: Developing Big Data Analytics workloads often involves trial-and-error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g., program crashes or outlier results) arise, developers are often interested in identifying a subset of the input data that is able to reproduce the problem. BigSift is a new faulty data localization approach that combines insights from automated fault isolation in software engineering and data provenance in database systems to find a m…
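The abstract is truncated before it describes how BigSift works, so the following is only an illustrative sketch of the general idea of faulty data localization, not BigSift's implementation. It uses a simplified form of delta debugging, a well-known automated fault isolation technique, to shrink a failing input to a subset that still reproduces the problem. The function name `ddmin`, the `fails` predicate, and the example records are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `records` to a 1-minimal subset on which `fails` still returns True.

    Simplified delta debugging: repeatedly split the input into chunks and keep
    either a single chunk or the complement of a chunk if it still fails.
    `fails` is a hypothetical test oracle supplied by the caller.
    """
    current = list(records)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")

    n = 2  # target number of chunks at the current granularity
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False

        for i, subset in enumerate(subsets):
            # Case 1: one chunk alone reproduces the failure.
            if fails(subset):
                current, n, reduced = subset, 2, True
                break
            # Case 2: the failure survives removing one chunk.
            # With exactly two chunks the complement is the other chunk,
            # which case 1 already covers, so it is skipped.
            if len(subsets) > 2:
                complement = [r for j, s in enumerate(subsets) if j != i for r in s]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break

        if not reduced:
            if n >= len(current):
                break  # already at single-record granularity: 1-minimal
            n = min(len(current), n * 2)  # refine granularity and retry

    return current


# Hypothetical job: the failure appears only when both bad records are present.
records = list(range(100))
culprits = ddmin(records, lambda subset: 17 in subset and 42 in subset)
print(culprits)
```

Each call to `fails` stands in for re-running the analytics job on a subset of the input, which is the expensive step in a data-intensive setting; the sketch makes no attempt to reduce that cost.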

Cited by 21 publications (14 citation statements)
References 44 publications (24 reference statements)

Citation statements
“…This section discusses two examples of Apache Spark applications, inspired by the motivating example presented elsewhere [18], to show the benefit of FLOWDEBUG. FLOWDEBUG targets commonly used big data analytics running on top of Apache Spark, but its key idea generalizes to any big data analytics running on data intensive scalable computing (DISC) frameworks.…”
Section: Motivating Example (mentioning)
confidence: 99%
“…An alternative approach would be to isolate a subset of input records contributing to each suspicious output by using search-based debugging [18] or data provenance [25], both of which have limitations related to inefficiency and imprecision, discussed below. Imprecision of Data Provenance.…”
Section: Running Example (mentioning)
confidence: 99%