2012 41st International Conference on Parallel Processing Workshops 2012
DOI: 10.1109/icppw.2012.57

Characterizing Machines and Workloads on a Google Cluster

Abstract: Cloud computing offers high scalability, flexibility and cost-effectiveness to meet emerging computing requirements. Understanding the characteristics of real workloads on a large production cloud cluster benefits not only cloud service providers but also researchers and daily users. This paper studies a large-scale Google cluster usage trace dataset and characterizes how the machines in the cluster are managed and how the workloads submitted during a 29-day period behave. We focus on the frequency and pattern of m…

Cited by 129 publications (73 citation statements)
References 9 publications
“…There has been some limited uptake in analyzing these trace logs, with work focusing on different research objectives including job behavior [6], statistical properties of workload [7][8], as well as machine events and job behavior [9]. However, as stated in [9], due to the massive dataset sizes as well as the required computation and storage power necessary to perform comprehensive analysis, until now it has only been possible to perform analysis at a coarse-grain or perform an in-depth analysis on a small time frame that represents a fraction of the entire trace log.…”
Section: Introduction (mentioning)
confidence: 99%
“…Liu and Cho reported in their paper Characterizing machines and workloads on a Google cluster [41] that the majority (93%) of the machines monitored in the Google cluster dataset have a capacity set to 0.5, which supports AGILE's argument for setting overload at a capacity of 0.7 and higher.…”
Section: Setup (mentioning)
confidence: 86%
“…Data was recorded every 5 minutes (300 seconds) over a period of 29 days. According to Liu and Cho [41] the dataset has been sanitised to obfuscate confidential information, but still gives useful and accurate information on cluster usage and load. This is important for the evaluations performed in this thesis.…”
Section: Datasets (mentioning)
confidence: 99%
“…Several recent comprehensive analyses (e.g., [28,23]) of the workload characteristics derived from Google cloud tracelogs, featuring over 900 users submitting approximately 25 million tasks over a month, yielded significant data on the characteristics of submitted workloads and the management of cluster machines. These studies enable further work on important issues in the domain of resource optimization and energy efficiency improvement.…”
Section: Workloads Based On Google Cloud Tracelogs (mentioning)
confidence: 99%