2013 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc.2013.6704673

Semantic characterization of MapReduce workloads

Abstract: MapReduce is a platform for analyzing large amounts of data on clusters of commodity machines. MapReduce is popular, in part thanks to its apparent simplicity. However, there are unstated requirements for the semantics of MapReduce applications that can affect their correctness and performance. MapReduce implementations do not check whether user code satisfies these requirements, leading to time-consuming debugging sessions, performance problems, and, worst of all, silently corrupt results. This paper…

Cited by 10 publications (9 citation statements)
References 24 publications
“…Our study shows that certain data properties (functional dependencies) in the data can render non-commutative reducers harmless, making a case for checking data properties. In addition, 11 out of 23 reducers in [13] are non-commutative and exhibit the StrConcat pattern according to their study. The distribution of non-commutativity patterns is not consistent with our results, probably because open-sourced MapReduce programs in their study process simpler data with fewer cross-column relations than our samples from the production environment as discussed in Section 3.2.…”
Section: Related Work
confidence: 94%
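The StrConcat pattern referred to in the citation above can be sketched with a minimal hypothetical reducer (not taken from either cited study): a reducer that concatenates its input values is non-commutative, because its output depends on the order in which values arrive.

```python
# Minimal sketch of the StrConcat non-commutativity pattern:
# a reducer that concatenates values produces different output
# depending on the order of its input sequence.
def str_concat_reducer(key, values):
    """Hypothetical reducer: joins all values for a key into one string."""
    return key, ",".join(values)

# The same multiset of values, presented in two different orders:
_, out1 = str_concat_reducer("user1", ["a", "b", "c"])
_, out2 = str_concat_reducer("user1", ["c", "b", "a"])
assert out1 != out2  # "a,b,c" vs. "c,b,a": the reducer is order-dependent
```

As the citing authors note, such a reducer may still be harmless in practice when a functional dependency in the data fixes the value (or the order of values) for each key.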
“…Csallner et al [3] proposed a white-box symbolic execution approach to test commutativity. Xu et al [13] studied 23 reducers in open-sourced MapReduce programs and used a blackbox approach to test commutativity. Both of them check code and generate different input sequences with the same data set, on which the target reducer outputs different results.…”
Section: Related Work
confidence: 99%
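The black-box idea described in this citation — feed the same data set to the reducer in different input sequences and flag it if the outputs differ — can be sketched as follows. This is an illustration of the general approach, not the implementation of the cited tools:

```python
from itertools import permutations

def is_commutative(reducer, key, values, max_perms=24):
    """Black-box commutativity check (sketch): run the reducer on
    permutations of the same multiset of values and compare outputs.
    Returns False as soon as an order-dependent result is witnessed."""
    baseline = reducer(key, list(values))
    for i, perm in enumerate(permutations(values)):
        if i >= max_perms:  # bound the search; permutations grow factorially
            break
        if reducer(key, list(perm)) != baseline:
            return False
    return True

# A commutative reducer (summation) passes; concatenation fails.
assert is_commutative(lambda k, vs: sum(vs), "k", [1, 2, 3])
assert not is_commutative(lambda k, vs: ",".join(vs), "k", ["a", "b"])
```

Such a check is sound but incomplete: a differing output proves non-commutativity, while agreement on the sampled permutations only fails to find a counterexample.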
“…In follow-up work, we adapted our approach to MapReduce, and applied it to characterize semantic properties of MapReduce workloads [25]. As future work, we intend to consider the next stages of dataflow program validation, in which testing approaches analogous to integration and system testing are needed.…”
Section: Discussion
confidence: 99%
“…Verifying the correctness of a MR program involves checking the commutativity and associativity of the reduce function. Xu et al propose various semantic criteria to model commonly held assumptions on MR programs [29], including determinism, partition isolation, commutativity, and associativity of map/reduce combinators. Their empirical survey shows that these criteria are often overlooked by programmers and violated in practice.…”
Section: Related Work
confidence: 99%
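Among the semantic criteria this citation lists, associativity is easy to illustrate: a combine function used as a MapReduce combiner must be associative, or partial aggregation on different machines can change the final result. A minimal sketch of such a property check (hypothetical helper, not from the cited work):

```python
def check_associativity(combine, a, b, c):
    """Sketch of an associativity property check for a combiner:
    (a . b) . c must equal a . (b . c) for the grouping of partial
    aggregates not to affect the final result."""
    return combine(combine(a, b), c) == combine(a, combine(b, c))

assert check_associativity(lambda x, y: x + y, 1, 2, 3)      # sum: associative
assert not check_associativity(lambda x, y: x - y, 1, 2, 3)  # subtraction: not
```

In practice such checks are run over many sampled triples, since one passing triple does not establish the property in general.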