Online Anomaly Prediction for Robust Cluster Systems

Gu, Xiaohui; Wang, Haixun

doi:10.1109/icde.2009.128

Cited by 73 publications

(33 citation statements)

References 25 publications


“…We provide a path computation algorithm that takes into account such failure probabilities towards choosing the most resilient combination of parallel paths. The failure probabilities of brokers and links are assumed to be known in advance, while our algorithm can accommodate various definitions of resiliency such as [11] or using historic information. For example, the percentage of time that a broker is available in a specific operational period of time can be extracted from traces such as the all-pairs-pings service.…”

Section: Overlay Routing (mentioning)

confidence: 99%

Message-Oriented Middleware with QoS Awareness

Yang, Kim, Karenos et al. 2009

Service-Oriented Computing – ICSOC 2007


Abstract. Publish/subscribe messaging is a fundamental mechanism for interconnecting disparate services and systems in the service-oriented computing architecture. The quality of services (QoS) of the messaging substrate plays a critical role in the overall system performance as perceived by the end users. In this paper, we present the design and implementation of Harmony, an overlay-based messaging system that can manage the end-to-end QoS in wide-area publish/subscribe communications based on the application requirements. This is achieved through a holistic set of overlay route establishment and maintenance mechanisms, which actively exploit the diversity in the network paths and redirect the traffic over links with good quality, e.g., low latency and high availability. In order to cope with network dynamics and failures, Harmony continuously monitors the link quality and adapts the routes whenever their quality deteriorates below the application requirements. Harmony can operate on top of different data transport layers. When the transport layer has built-in message scheduling capability, Harmony takes advantage of it and utilizes a novel budget allocation scheme to control the scheduling behavior. We have fully implemented the Harmony messaging system, and our empirical experience has confirmed its effectiveness in providing end-to-end QoS in dynamic wide-area network environments.



“…To achieve generality, the ALERT system is implemented based on standard Linux APIs, which allows us to port the ALERT system to different hosting infrastructure easily. We collect about 20 metrics on each host in IBM System S [23,24] and about 66 metrics on each host in PlanetLab [4]. Table 1 lists a subset of key metrics collected by ALERT on System S and PlanetLab.…”

Section: System Implementation (mentioning)

confidence: 99%

“…For example, we collect about 20 metrics on each host in IBM System S [23,24] and about 66 metrics on each host in PlanetLab [4]. The monitoring sensor periodically samples each metric value at a certain rate (e.g., one sample every 10 seconds) to form a measurement stream.…”

Section: Baseline Anomaly Prediction Model (mentioning)

confidence: 99%


Adaptive system anomaly prediction for large-scale hosting infrastructures

Tan, Wang 2010

Proceedings of the 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing

Self Cite


Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims at raising advance anomaly alerts to achieve just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures such as IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead to the hosting infrastructure.


“…Although previous work (e.g., [10], [11], [12]) has addressed the anomaly detection problem, anomaly prediction needs to capture pre-anomaly symptoms to raise advance anomaly alert before the anomaly happens. In [13], we presented the initial design of our online anomaly prediction scheme. However, one big question is whether real system anomalies do exhibit certain predictability and whether our anomaly prediction scheme can efficiently capture the predictability.…”

Section: Introduction (mentioning)

confidence: 99%

On Predictability of System Anomalies in Real World

Tan 2010

2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

Self Cite


Abstract. As computer systems become increasingly complex, system anomalies have become major concerns in system management. In this paper, we present a comprehensive measurement study to quantify the predictability of different system anomalies. Online anomaly prediction allows the system to foresee impending anomalies so as to take proper actions to mitigate anomaly impact. Our anomaly prediction approach combines feature value prediction with statistical classification methods. We conduct extensive measurement study to investigate anomalous behavior of three systems in the real world: PlanetLab, SMART hard drive data, and IBM System S. We observe that real world system anomalies do exhibit predictability, which can be predicted with high accuracy and significant lead time.
