Yongmin Tan scite author profile

Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly prediction system, called ALERT, to achieve robust hosting infrastructures. In contrast to traditional anomaly detection schemes, ALERT aims to raise advance anomaly alerts that enable just-in-time anomaly prevention. We propose a novel context-aware anomaly prediction scheme to improve prediction accuracy in dynamic hosting infrastructures. We have implemented the ALERT system and deployed it on several production hosting infrastructures, such as the IBM System S stream processing cluster and PlanetLab. Our experiments show that ALERT can achieve high prediction accuracy for a range of system anomalies and impose low overhead on the hosting infrastructure.

FChain: Toward Black-Box Online Fault Localization for Cloud Systems

Nguyen, Shen, Tan, et al. 2013

Distributed applications running inside cloud systems are prone to performance anomalies for various reasons, such as resource contention, software bugs, and hardware failures. One major challenge in diagnosing an abnormal distributed application is pinpointing the faulty components. In this paper, we present a black-box online fault localization system called FChain that can pinpoint faulty components immediately after a performance anomaly is detected. FChain first discovers the onset time of abnormal behaviors at different components by distinguishing the abnormal change point from the many change points caused by normal workload fluctuations. Faulty components are then pinpointed based on the abnormal change propagation patterns and inter-component dependency relationships. FChain performs runtime validation to further filter out false alarms. We have implemented FChain on top of the Xen platform and tested it using several benchmark applications (RUBiS, Hadoop, and IBM System S). Our experimental results show that FChain can pinpoint the faulty components with high accuracy within a few seconds. FChain can achieve up to 90% higher precision and 20% higher recall than existing schemes. FChain is nonintrusive and lightweight, imposing less than 1% overhead on the cloud system.

Self-correlating predictive information tracking for large-scale production systems

Zhao, Tan, Gong, et al. 2009

Automatic management of large-scale production systems requires a continuous monitoring service to keep track of the states of the managed system. However, it is challenging to achieve both scalability and high information precision while continuously monitoring a large number of distributed, time-varying metrics in large-scale production systems. In this paper, we present a new self-correlating, predictive information tracking system called InfoTrack, which employs lightweight temporal and spatial correlation discovery methods to minimize continuous monitoring cost. InfoTrack combines metric value prediction within individual nodes with adaptive clustering among distributed nodes to suppress remote information updates in distributed system monitoring. We have implemented a prototype of the InfoTrack system and deployed it on PlanetLab. We evaluated the performance of the InfoTrack system using both real system traces and micro-benchmark prototype experiments. The experimental results show that InfoTrack can reduce the continuous monitoring cost by 50-90% while maintaining high information precision (i.e., within a 0.01-0.05 error bound).
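
The idea of suppressing remote updates through per-node metric prediction can be illustrated with a minimal sketch. This is not InfoTrack's actual implementation: the last-value predictor, the class names (`LastValuePredictor`, `MonitoredNode`), and the `run_trace` helper are all hypothetical and chosen only for illustration. The sketch assumes the node and the monitor share the same predictor, so the node sends an update only when the real value drifts from the prediction by more than the error bound.

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Hypothetical predictor: assumes the metric stays at its last reported value."""
    last_reported: float = 0.0

    def predict(self) -> float:
        return self.last_reported

    def observe(self, value: float) -> None:
        self.last_reported = value


@dataclass
class MonitoredNode:
    """Node-side logic: report a sample only when the prediction is too far off."""
    error_bound: float
    predictor: LastValuePredictor = field(default_factory=LastValuePredictor)
    updates_sent: int = 0

    def sample(self, actual: float):
        """Return the value to send, or None if the update is suppressed."""
        if abs(actual - self.predictor.predict()) <= self.error_bound:
            return None  # the monitor's prediction is already within the bound
        self.predictor.observe(actual)
        self.updates_sent += 1
        return actual


def run_trace(trace, error_bound):
    """Replay a metric trace; return (updates sent, worst error seen by the monitor)."""
    node = MonitoredNode(error_bound)
    monitor_view = LastValuePredictor()  # monitor-side copy of the same predictor
    max_error = 0.0
    for actual in trace:
        update = node.sample(actual)
        if update is not None:
            monitor_view.observe(update)
        max_error = max(max_error, abs(actual - monitor_view.predict()))
    return node.updates_sent, max_error


if __name__ == "__main__":
    sent, err = run_trace([0.50, 0.51, 0.52, 0.60, 0.61, 0.59, 0.80], 0.05)
    print(f"updates sent: {sent} of 7, max monitor error: {err:.3f}")
```

In this sketch the monitor's view never deviates from the true value by more than the error bound, while samples that the predictor already explains are never transmitted. A real system would likely use a richer predictor and, as the abstract describes, additionally exploit correlations across nodes through clustering.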

Rate Control for H.264 with Two-Step Quantization Parameter Determination but Single-Pass Encoding

Yang, Tan, Ling 2006

EURASIP J. Adv. Signal Process.

We present an efficient rate control strategy for H.264 that maximizes video quality by appropriately determining the quantization parameter (QP) for each macroblock. To break the chicken-and-egg dilemma resulting from QP-dependent rate-distortion optimization (RDO) in H.264, a preanalysis phase is conducted to gain the necessary source information, and then a coarse QP is decided for rate-distortion (RD) estimation. After motion estimation, we further refine the QP of each mode using the actual standard deviation of the motion-compensated residues. In the encoding process, RDO is performed only once for each macroblock (hence single-pass), while QP determination is conducted twice. Therefore, the increase in computational complexity is small compared to that of the JM 9.3 software. Experimental results indicate that our rate control scheme with two-step QP determination but single-pass encoding not only effectively improves the average PSNR but also controls the target bit rates well.


Yongmin Tan

PREPARE: Predictive Performance Anomaly Prevention for Virtualized Cloud Systems

Adaptive system anomaly prediction for large-scale hosting infrastructures

