2014 IEEE 25th International Symposium on Software Reliability Engineering 2014
DOI: 10.1109/issre.2014.34

Failure Analysis of Jobs in Compute Clouds: A Google Cluster Case Study

Cited by 77 publications (47 citation statements)
References 24 publications
“…Chen et al [4] presented a study about failures in Cloud environment, using measured data from Google cluster [20]. These studies show the increasing failure rates in HPC clusters and Cloud clusters.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Cloud systems experience frequent failures due to their large-scale and distributed nature [16]. Failures of any components in the cloud may cause the jobs to be interrupted.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Failures of any components in the cloud may cause the jobs to be interrupted. Jobs may span thousands of cloud components and run for a long time before being interrupted, which leads to the wastage of energy and other resources [16]. Thus, one of the main challenges in cloud systems is to assure the reliability of job execution in the presence of failures.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Cloud applications may span thousands of nodes and run for a long time before being aborted, which leads to the wastage of energy and other resources. [3,4,5] In order to minimize failed execution and thus the multiple re-executions of the same workflow fault tolerance techniques must be investigated and supported. Since the numbers of failures are high and the types of them vary, general methods can hardly exist.…”
Section: Relationship To Cloud-based Solutions
Citation type: mentioning
confidence: 99%