A study on using uncertain time series matching algorithms for MapReduce applications

2012
DOI: 10.1002/cpe.2895

Abstract: In this paper, we study the CPU utilization time patterns of several MapReduce applications. After extracting the running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new applications along with their statistical information are compared with the already known ones in the reference database to…
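A minimal sketch of the workflow the abstract describes, assuming a plain normalized Euclidean distance in place of the uncertain time-series matching algorithms the paper actually studies; the reference-database layout and function names below are illustrative, not taken from the paper.

import math

def z_normalize(series):
    # Scale a CPU-utilization trace to zero mean and unit variance so that
    # applications with different absolute loads can still be compared.
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mean) / std for x in series]

def distance(a, b):
    # Euclidean distance between two equal-length normalized traces
    # (a stand-in for the paper's uncertain-matching measure).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend_parameters(new_trace, reference_db):
    # Return the previously tuned configuration of the closest known application.
    query = z_normalize(new_trace)
    best = min(reference_db,
               key=lambda entry: distance(query, z_normalize(entry["trace"])))
    return best["config"]

# Hypothetical reference database: one CPU pattern plus tuned parameters per app.
reference_db = [
    {"app": "WordCount", "trace": [10, 80, 85, 90, 40, 20], "config": {"io.sort.mb": 200}},
    {"app": "TeraSort",  "trace": [60, 60, 65, 70, 75, 80], "config": {"io.sort.mb": 400}},
]

print(recommend_parameters([12, 78, 88, 92, 45, 18], reference_db))  # -> {'io.sort.mb': 200}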

Cited by 19 publications (15 citation statements). References 21 publications.
“…MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a computing cluster of low-cost commodity computers. A MapReduce application typically consists of two phases (or operations), "map" and "reduce", with many tasks in each phase [83]. A map/reduce task deals with a chunk of data independently; thus, tasks in a given phase can be easily parallelized and effectively processed in a large-scale computing environment (i.e., a cloud platform).…”
Section: MapReduce Parallel Processing
confidence: 99%
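The two-phase model quoted above can be illustrated with a toy, single-process sketch; the function and variable names are ours, not from any cited work, and a real framework such as Hadoop would schedule these map and reduce tasks across a cluster.

from collections import defaultdict

def map_task(chunk):
    # A map task handles one chunk of data independently of all other chunks,
    # emitting intermediate (word, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_task(pairs):
    # A reduce task aggregates the intermediate pairs, summing counts per word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["the cat sat", "the dog sat"]  # independent chunks -> parallelizable maps
intermediate = [kv for chunk in chunks for kv in map_task(chunk)]
print(reduce_task(intermediate))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}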
“…Similarly, (Palanisamy et al., 2015) deals with optimizing the allocation of VMs executing MapReduce jobs in order to minimize the infrastructure cost in a cloud datacenter. In the same single-cloud scenario, the work of (Rizvandi et al., 2013) focuses on the automatic estimation of MapReduce configuration parameters, while (Verma et al., 2011) proposes a resource allocation algorithm able to estimate the amount of resources required to meet MapReduce-specific performance goals. However, these models were not intended to address the challenges of the hybrid cloud scenario, which is a possible target environment for the provisioning of additional VMs in our system thanks to the underlying HyIaaS layer.…”
Section: Related Work
confidence: 99%
“…Then a fingerprint-based method is utilized to predict the performance of a new MapReduce application based on the studied applications. The idea of pattern matching was used in [12] to find the similarity between the CPU time patterns of a new application and those of the applications in the database. It was then concluded that if two applications show high similarity for several settings of the configuration parameters, it is very likely that their optimal configuration values are also the same.…”
Section: Related Work
confidence: 99%
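A hedged sketch of the matching idea attributed to [12]: if a new application's CPU patterns correlate strongly with a known application's patterns under several configuration settings, the known application's optimal configuration is assumed to transfer. Pearson correlation and the 0.9 threshold below are our illustrative choices, not the similarity measure used in the cited work.

import math

def pearson(a, b):
    # Pearson correlation between two equal-length CPU-utilization traces.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                     sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0

def similar_under_all_settings(new_patterns, known_patterns, threshold=0.9):
    # Patterns map a configuration setting to the CPU trace observed under it;
    # similarity must hold for every tested setting before the config transfers.
    return all(pearson(new_patterns[s], known_patterns[s]) >= threshold
               for s in new_patterns)

known = {"setting_a": [10, 50, 90, 40], "setting_b": [20, 60, 80, 30]}
new = {"setting_a": [12, 52, 88, 41], "setting_b": [22, 58, 79, 28]}

if similar_under_all_settings(new, known):
    print("High similarity across settings: reuse the known optimal configuration.")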
“…Our benchmark applications are WordCount (used by leading researchers at Intel [21], IBM [6], MIT [22], and UC-Berkeley [7]), TeraSort (a standard benchmark in the international TeraByte sort competition [23,24], as well as by many researchers at IBM [25,26], Intel [21], INRIA [27], and UC-Berkeley [28]), and Exim Mainlog parsing [12,29]. These benchmarks are chosen due to their striking differences as well as their popularity among MapReduce applications.…”
Section: Experimental Setting
confidence: 99%