Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims at raising advance anomaly alerts to achieve just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures such as IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead to the hosting infrastructure.
Abstract-Distributed applications running inside cloud systems are prone to performance anomalies due to various reasons such as resource contentions, software bugs, and hardware failures. One big challenge for diagnosing an abnormal distributed application is to pinpoint the faulty components. In this paper, we present a black-box online fault localization system called FChain that can pinpoint faulty components immediately after a performance anomaly is detected. FChain first discovers the onset time of abnormal behaviors at different components by distinguishing the abnormal change point from many change points caused by normal workload fluctuations. Faulty components are then pinpointed based on the abnormal change propagation patterns and inter-component dependency relationships. FChain performs runtime validation to further filter out false alarms. We have implemented FChain on top of the Xen platform and tested it using several benchmark applications (RUBiS, Hadoop, and IBM System S). Our experimental results show that FChain can quickly pinpoint the faulty components with high accuracy within a few seconds. FChain can achieve up to 90% higher precision and 20% higher recall than existing schemes. FChain is nonintrusive and light-weight, which imposes less than 1% overhead to the cloud system.
Automatic management of large-scale production systems requires a continuous monitoring service to keep track of the states of the managed system. However, it is challenging to achieve both scalability and high information precision while continuously monitoring a large amount of distributed and time-varying metrics in large-scale production systems. In this paper, we present a new self-correlating, predictive information tracking system called InfoTrack, which employs lightweight temporal and spatial correlation discovery methods to minimize continuous monitoring cost. InfoTrack combines both metric value prediction within individual nodes and adaptive clustering among distributed nodes to suppress remote information update in distributed system monitoring. We have implemented a prototype of the InfoTrack system and deployed the system on the PlanetLab. We evaluated the performance of the InfoTrack system using both real system traces and micro-benchmark prototype experiments. The experimental results show that InfoTrack can reduce the continuous monitoring cost by 50-90% while maintaining high information precision (i.e., within 0.01-0.05 error bound).
We present an efficient rate control strategy for H.264 in order to maximize the video quality by appropriately determining the quantization parameter (QP) for each macroblock. To break the chicken-and-egg dilemma resulting from QP-dependent ratedistortion optimization (RDO) in H.264, a preanalysis phase is conducted to gain the necessary source information, and then the coarse QP is decided for rate-distortion (RD) estimation. After motion estimation, we further refine the QP of each mode using the obtained actual standard deviation of motion-compensated residues. In the encoding process, RDO only performs once for each macroblock, thus one-pass, while QP determination is conducted twice. Therefore, the increase of computational complexity is small compared to that of the JM 9.3 software. Experimental results indicate that our rate control scheme with two-step QP determination but single-pass encoding not only effectively improves the average PSNR but also controls the target bit rates well.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.