2014 IEEE International Symposium on Software Reliability Engineering Workshops 2014
DOI: 10.1109/issrew.2014.105

Failure Prediction of Jobs in Compute Clouds: A Google Cluster Case Study

Cited by 52 publications (16 citation statements)
References 14 publications
“…Higher priority indicates higher preference for resources. According to [1], 12 priorities can be grouped into five classes: gratis (0-1), batch (2-8), normal production (9), monitoring (10), and infrastructure (11). The number of killer tasks at "normal production" priority is 1,146, which coincides with the description that priority 9 is dominant in production priorities in [17].…”
Section: A Failure Frequency Analysis (supporting)
confidence: 53%
“…In contrast, we discover the resource usage pattern to recognize killer tasks and avoid resource wasting. In their recent work, they convert task attributes and mean resource usage as features, and apply recurrent neural network to predict task failures [11]. However, only average resource usage instead of time series data is used in their model.…”
Section: A Google Trace Analysis (mentioning)
confidence: 99%
“…There are existing research works in the literature that applied statistical, machine and deep learning methods using Google dataset for different prediction purposes such as workload, scheduling, and job/task failure prediction. Chen et al [12] studied main features of application job and task failures in cloud computing. Authors analyzed events and resource usages of the jobs and tasks to determine features related to the failures.…”
Section: A Task Failure Prediction (mentioning)
confidence: 99%
“…However, they don't leverage a specific technique to conduct failure prediction. In their later work, they convert job attributes and mean resource usage as features, and apply recurrent neural network to predict job failures [15]. El-Sayed et al [16] characterize unsuccessful jobs and employ classification techniques to predict job failures.…”
Section: Related Work (mentioning)
confidence: 99%