Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019
DOI: 10.1145/3338906.3338953

White-box testing of big data analytics with complex user-defined functions

Abstract: Data-intensive scalable computing (DISC) systems such as Google's MapReduce, Apache Hadoop, and Apache Spark are being leveraged to process massive quantities of data in the cloud. Modern DISC applications pose new challenges for exhaustive, automatic testing because they consist of dataflow operators and, unlike SQL queries, complex user-defined functions (UDFs) are prevalent. We design a new white-box testing approach, called BigTest, to reason about the internal semantics of UDFs in tandem with the equivalence …

Cited by 22 publications (8 citation statements)
References 37 publications (39 reference statements)
“…JDU path coverage. JDU (Joint Dataflow and UDF) path coverage is introduced by Gulzar et al [13], which consider the paths thoroughly along with the operators and the internal paths in their UDFs. Their experimental results demonstrate that JDU path coverage is directly related to improvement in fault detection.…”
Section: DSP and Apache Flink (mentioning)
confidence: 99%
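
The idea of joint dataflow and UDF paths in the statement above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the BigTest implementation: the two-stage pipeline (a filter followed by a map) and the names `keep`, `label`, `joint_path`, `ALL_PATHS`, and `jdu_coverage` are all hypothetical. It treats a joint path as the outcome at the filter operator combined with the branch taken inside the map UDF, and reports the fraction of such paths that a set of test inputs exercises.

```python
# Hypothetical toy pipeline: filter(keep) followed by map(label).

def keep(x):
    """Filter UDF: a record either passes the filter or is dropped."""
    return x > 0

def label(x):
    """Map UDF with two internal branches."""
    if x > 100:
        return "high"
    return "normal"

def joint_path(x):
    """Return the joint (operator outcome, UDF branch) path that input x takes."""
    if not keep(x):
        # A dropped record never reaches the map UDF.
        return ("filter:dropped",)
    return ("filter:kept", "map:" + label(x))

# Every joint path of this toy pipeline, enumerated by hand.
ALL_PATHS = {
    ("filter:dropped",),
    ("filter:kept", "map:high"),
    ("filter:kept", "map:normal"),
}

def jdu_coverage(inputs):
    """Fraction of joint paths exercised by the given test inputs."""
    covered = {joint_path(x) for x in inputs}
    return len(covered & ALL_PATHS) / len(ALL_PATHS)
```

In this sketch, two positive readings below the threshold exercise only one of the three joint paths, whereas one input per path, for example `[-1, 5, 500]`, reaches full coverage.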
“…As discussed in [13], the common industry practice to test big data programs is running locally with randomly sampled data. An empirical study presented by Vianna et al [11] demonstrates that difficulties in generating test data are one of the most frequent problems when designing DSP programs.…”
Section: Introduction (mentioning)
confidence: 99%