2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) 2010
DOI: 10.1109/icde.2010.5447919

Estimating the progress of MapReduce pipelines

Cited by 93 publications (81 citation statements)
References 9 publications
“…The elapsed time of the running task can be evaluated through its actual execution speed to provide a more accurate estimate. Existing methods break a map or reduce task into pipelines and sum the elapsed time of every pipeline as the task estimate [17] [12]. In our experiment, we adopt the finish time estimate method of [17], which estimates the time left for a task based on the progress score provided by Hadoop as (1 − ProgressScore)/ProgressRate, where ProgressRate = ProgressScore/t and t is the elapsed time.…”
Section: Estimating the Progress (mentioning)
confidence: 99%
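The formula quoted in the statement above can be illustrated with a minimal Python sketch. The function name `estimate_time_left` and its parameters are hypothetical and are not part of Hadoop or of the cited papers. The sketch assumes the progress score is a fraction in [0, 1] and that elapsed time is measured in seconds.

```python
import math


def estimate_time_left(progress_score: float, elapsed_seconds: float) -> float:
    """Estimate remaining task time as (1 - ProgressScore) / ProgressRate.

    ProgressRate is ProgressScore / elapsed time, as in the quoted statement.
    The name and signature are illustrative assumptions, not a Hadoop API.
    """
    if not 0.0 <= progress_score <= 1.0:
        raise ValueError("progress_score must lie in [0, 1]")
    if elapsed_seconds < 0.0:
        raise ValueError("elapsed_seconds must be non-negative")
    # With no progress or no elapsed time the rate is zero or undefined,
    # so no finite estimate can be given.
    if progress_score == 0.0 or elapsed_seconds == 0.0:
        return math.inf
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate


# A task that is 25% complete after 30 seconds is estimated
# to need roughly 90 more seconds.
remaining = estimate_time_left(0.25, 30.0)
print(f"estimated time left: {remaining:.1f} s")
```

The estimate assumes the task keeps progressing at its average rate so far, which is the simplification built into the quoted formula.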
“…[17] provides a method to estimate the progress of a MapReduce task; however, there are also several challenges in estimating the progress of MapReduce jobs and MapReduce DAGs. Parallax [12] estimates the progress of queries translated into sequences of MapReduce jobs. It breaks a MapReduce job into pipelines, which are groups of interconnected operators that execute simultaneously.…”
Section: Related Work (mentioning)
confidence: 99%
“…While there are several frameworks that generate pipeline MapReduce applications, few works focus on optimizing the actual execution of this type of application. In [11], the authors propose a tool for estimating the progress of MapReduce pipelines generated by Pig queries. The Hadoop Online Prototype (HOP) [7] is a modified version of the Hadoop MapReduce framework that supports online aggregation, allowing users to get snapshots from a job as it is being computed.…”
Section: Pipeline MapReduce Applications: Overview and Related Work (mentioning)
confidence: 99%
“…However, it consumes substantial resources when constructing virtual machines and wastes allocated resources when they are not activated. The cloud platform offers an option when startups select platforms on which to deploy their development and operational environments [1], [2]. When prototyping a distributed application like MapReduce, a developer needs to ensure that the application execution corresponds to the specification while its performance is not impacted by the number of nodes or by some failure scenarios [1]-[3]. Indeed, MapReduce relies on successive computing-communication steps that, if not coordinated with care, lead to performance bottlenecks and poor scalability.…”
Section: Introduction (mentioning)
confidence: 99%