2012 International Conference for High Performance Computing, Networking, Storage and Analysis 2012
DOI: 10.1109/sc.2012.57

Fault prediction under the microscope: A closer look into HPC systems

citations
Cited by 75 publications
(61 citation statements)
references
References 21 publications
“…Providers continuously attempt to improve datacenter energy-efficiency; as stated in [2], a substantial amount of energy-waste is due to failures. First, it is ideal that dependability mechanisms deployed in datacenters do not significantly degrade the energy-efficiency of the system.…”
Section: Application Of Work (mentioning)
confidence: 99%
“…According to the National Institute of Standards and Technology (NIST) [1], Cloud computing is defined as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction". The failure characteristics of such environments are of particular concern as failures can result in degradation of Quality of Service (QoS), availability, reliability and energy-waste [2] that can ultimately lead to economic loss for both Cloud consumers and providers.…”
Section: Introduction (mentioning)
confidence: 99%
“…The prediction of failures itself, however, was still an open issue. Recent results from the University of Illinois at Urbana-Champaign [54][55][56] and the Illinois Institute of Technology [101,102] clearly demonstrate the feasibility of error prediction for different systems: the Blue Waters CRAY system based on AMD processors and NVIDIA GPUs and the Blue Gene system based on IBM proprietary components. Failure prediction techniques have progressed by combining data mining with signal analysis and methods to spot outliers.…”
Section: Failure Prediction (mentioning)
confidence: 99%
“…To develop better solutions and reliability techniques, it is important to understand and characterize failure behavior. Indeed, failure characteristics can be used to inform failure predictors [8,2], or to improve fault-tolerance techniques [11,10,22].…”
Section: Related Work (mentioning)
confidence: 99%