2012
DOI: 10.1007/978-3-642-29344-3_37

The Efficiency of MapReduce in Parallel External Memory

Abstract: Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, theoretical footing is still limited and only little work has been done yet to put MapReduce on a par with the major computational models. Following pioneer work that relates the MapReduce framework with PRAM and BSP in their macroscopic structure, we focus on the functionality provided by the framework itself, considere…

Cited by 6 publications (3 citation statements)
References 14 publications
“…Here we generalize this idea to the processing of any conjunctive query in a rigorous way. We should also note that previous work [9] has studied the simulation of MapReduce algorithms on a parallel external memory model.…”
Section: Simulating An MPC Algorithm (mentioning)
confidence: 99%
“…Theoretical consideration was given in [23], where the authors present upper and lower bounds on the parallel I/O complexity of the shuffle phase, bounding the worst-case performance loss of the MapReduce approach in terms of I/O-efficiency. Shared environment optimizations for Hadoop MapReduce based on pre-fetching and pre-shuffling were explored in [24].…”
Section: Related Work (mentioning)
confidence: 99%
“…With respect to data shuffling itself, the problem has been explored from multiple perspectives. Theoretical consideration was given in [11], where the authors present upper and lower bounds on the parallel I/O complexity of the shuffle phase. Low-level optimizations of the networking layer where data shuffling is explored in the context of high performance interconnects such as InfiniBand exist both for MapReduce [12] and Spark [13].…”
Section: Related Work (mentioning)
confidence: 99%