2022 IEEE 20th Jubilee World Symposium on Applied Machine Intelligence and Informatics (SAMI) 2022
DOI: 10.1109/sami54271.2022.9780804

Fault detection in GPU-enabled Cloud Systems – An Overview

Abstract: Fault detection and handling are crucial tasks in cloud systems. As these infrastructures are growing and evolving, manual monitoring and interaction have become less feasible. To deal with this issue, monitoring systems are developed to track the behavior of the various components (e.g., nodes) in cloud systems, as well as the served applications in the virtual environment. Nowadays, most cloud environments provide graphics accelerators for their users, leading to different problems. However, the application of…

Citations: cited by 1 publication (1 citation statement)
References: 52 publications (60 reference statements)

“…Our approach extends the results of the major related works summarized (partly leveraging [28]) in several areas including the application of various deep learning methods (autoencoders, LSTMs and GNNs), the extensive use of formal modelling, and the active steering towards the suspicious situations of a cloud debugger.…”
Section: Related Work (citation type: mentioning)
confidence: 59%