2010
DOI: 10.1007/s10586-010-0120-0

Ranking the importance of alerts for problem determination in large computer systems

Abstract: The complexity of large computer systems has raised unprecedented challenges for system management. In practice, operators often collect large volumes of monitoring data from system components and set up many rules to check the data and trigger alerts. However, alerts from different rules usually differ in problem-reporting accuracy because their thresholds are often set manually, based on operators' experience and intuition. Meanwhile, due to system dependencies, a single problem may trigger many alerts at th…

Cited by 23 publications (6 citation statements)
References 10 publications

“…Correlation-based techniques can be used to analyze data and automatically discover causal relationships among pairs of metrics [24], [77]-[79]. Perturbation in the learned correlations may indicate faults.…”
Section: A Statistical Techniques | mentioning
confidence: 99%
“…Their correlation-like "rules" for inferring system problems are not general, unlike the LFD hypothesis. Some diagnostic approaches have used regression to automatically discover correlations between metric pairs [10,11]. However, they do not scale well to large numbers of nodes/metrics as they search for metric correlations locally and remotely between nodes.…”
Section: Related Work | mentioning
confidence: 99%
“…Several other papers [5,11] took similar statistical analysis approaches to diagnose problems from system monitoring data. Jiang et al. [14] developed an invariant-based approach to profile large systems for system management. But none of these works require real-time guarantee in diagnosis.…”
Section: Related Work | mentioning
confidence: 99%