2009
DOI: 10.1109/icde.2009.128

Online Anomaly Prediction for Robust Cluster Systems

Abstract: In this paper, we present a stream-based mining algorithm for online anomaly prediction. Many real-world applications such as data stream analysis require continuous cluster operation. Unfortunately, today's large-scale cluster systems are still vulnerable to various software and hardware problems. System administrators are often overwhelmed by the tasks of correcting various system anomalies such as processing bottlenecks (i.e., full stream buffers), resource hot spots, and service level objective (…

Cited by 73 publications (33 citation statements)
References 25 publications
“…We provide a path computation algorithm that takes into account such failure probabilities towards choosing the most resilient combination of parallel paths. The failure probabilities of brokers and links are assumed to be known in advance, while our algorithm can accommodate various definitions of resiliency such as [11] or using historic information. For example, the percentage of time that a broker is available in a specific operational period of time can be extracted from traces such as the all-pairs-pings service.…”
Section: Overlay Routing; citation type: mentioning
confidence: 99%
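The quantities named in this statement can be illustrated with a minimal sketch. It is not the citing authors' path computation algorithm. It assumes that failures are independent, that the parallel paths share no brokers or links, and that a trace is simply a list of ping outcomes. The function names `availability`, `path_survival`, and `parallel_survival` are hypothetical.

```python
from math import prod


def availability(ping_ok):
    """Fraction of probes in a trace that succeeded (True = reachable).

    Assumption: the trace is a plain list of booleans, one per ping.
    """
    if not ping_ok:
        raise ValueError("empty trace")
    return sum(ping_ok) / len(ping_ok)


def path_survival(failure_probs):
    """Probability that every broker and link on one path stays up,
    assuming the elements fail independently."""
    return prod(1.0 - p for p in failure_probs)


def parallel_survival(paths):
    """Probability that at least one of several disjoint paths stays up.

    Each path is a list of per-element failure probabilities.
    """
    return 1.0 - prod(1.0 - path_survival(p) for p in paths)


if __name__ == "__main__":
    # A broker that answered 3 of 4 pings has an estimated availability of 0.75.
    print(availability([True, True, False, True]))
    # Two disjoint paths: one with two elements, one with a single element.
    print(parallel_survival([[0.1, 0.2], [0.5]]))
```

Under these assumptions, comparing `parallel_survival` across candidate path sets gives one simple way to rank them by resiliency.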
“…To achieve generality, the ALERT system is implemented based on standard Linux APIs, which allows us to port the ALERT system to different hosting infrastructure easily. We collect about 20 metrics on each host in IBM System S [23,24] and about 66 metrics on each host in PlanetLab [4]. Table 1 lists a subset of key metrics collected by ALERT on System S and PlanetLab.…”
Section: System Implementation; citation type: mentioning
confidence: 99%
“…For example, we collect about 20 metrics on each host in IBM System S [23,24] and about 66 metrics on each host in PlanetLab [4]. The monitoring sensor periodically samples each metric value at a certain rate (e.g., one sample every 10 seconds) to form a measurement stream.…”
Section: Baseline Anomaly Prediction Model; citation type: mentioning
confidence: 99%
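The periodic sampling described in this statement can be sketched as follows. This is an illustration only, not the ALERT monitoring sensor. The function `sample_stream`, its parameters, and the bounded window are assumptions. The `sleep` and `clock` arguments are injectable so the loop can be exercised without waiting.

```python
import time
from collections import deque


def sample_stream(read_metric, n_samples, interval_s=10.0, window=360,
                  sleep=time.sleep, clock=time.time):
    """Sample one metric at a fixed rate into a bounded measurement stream.

    read_metric: hypothetical zero-argument callable returning the current value.
    window: assumed cap on retained samples; older samples are dropped.
    Returns a deque of (timestamp, value) pairs, oldest first.
    """
    stream = deque(maxlen=window)
    for i in range(n_samples):
        stream.append((clock(), read_metric()))
        if i < n_samples - 1:
            sleep(interval_s)  # e.g. one sample every 10 seconds
    return stream


if __name__ == "__main__":
    # Demo with a fake metric source and no real waiting.
    values = iter([0.42, 0.47, 0.51])
    stream = sample_stream(lambda: next(values), 3, sleep=lambda _: None)
    for ts, value in stream:
        print(ts, value)
```

A real sensor would run one such loop per metric (about 20 per host on System S and about 66 on PlanetLab, according to the statement above).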
“…Although previous work (e.g., [10], [11], [12]) has addressed the anomaly detection problem, anomaly prediction needs to capture pre-anomaly symptoms to raise advance anomaly alert before the anomaly happens. In [13], we presented the initial design of our online anomaly prediction scheme. However, one big question is whether real system anomalies do exhibit certain predictability and whether our anomaly prediction scheme can efficiently capture the predictability.…”
Section: Introduction; citation type: mentioning
confidence: 99%