Guoli Cheng scite author profile

Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal operation (normal logs) differs greatly from the number that indicate anomalies (abnormal logs), and the two types of logs differ in certain respects. Using the K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to detect faults automatically from logs. However, logs are large in scale and their samples are highly imbalanced, which degrades the results of the KNN algorithm on log-based anomaly detection. We therefore propose a method based on an improved KNN algorithm, which uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the unbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be effectively applied to log-based anomaly detection, and that its accuracy, recall, and F-measure are higher than those of the traditional keyword search method.
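The idea of weighting neighbors by distance can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the inverse-distance weight `1 / (d + eps)` below is an assumption, and the function name `weighted_knn_predict` and the toy data are hypothetical. The sketch only shows how a close minority-class neighbor can outvote more numerous but distant majority-class neighbors.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting among the k nearest samples.

    `train` is a list of (vector, label) pairs. The 1 / (d + eps) weight is an
    illustrative assumption, not the weighting used in the paper.
    """
    # Pair each training sample with its Euclidean distance to the query,
    # then keep the k closest ones.
    neighbors = sorted(
        ((math.dist(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Hypothetical imbalanced toy data: four normal samples, one abnormal sample.
train = [
    ((0.0, 0.0), "normal"),
    ((0.0, 1.0), "normal"),
    ((1.0, 0.0), "normal"),
    ((1.0, 1.0), "normal"),
    ((5.0, 5.0), "abnormal"),
]

print(weighted_knn_predict(train, (4.8, 4.9), k=3))
print(weighted_knn_predict(train, (0.2, 0.1), k=3))
```

With unweighted majority voting and k = 3, the first query would have two normal neighbors and one abnormal neighbor; the distance weights are what let the single nearby abnormal sample dominate.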

An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

Song

Wang

et al. 2021

ACM Trans. Knowl. Discov. Data

Logs that record abnormal system states (anomaly logs) can be regarded as outliers, and the k-Nearest Neighbor (kNN) algorithm has relatively high accuracy among outlier detection methods. Therefore, we use the kNN algorithm to detect anomalies in log data. However, using the kNN algorithm for this purpose raises three problems: an excessive vector dimension makes the kNN algorithm inefficient, unlabeled log data cannot support the kNN algorithm, and the imbalance in the number of log samples distorts the classification decisions of the kNN algorithm. To solve these three problems, we propose an efficient log anomaly detection method based on an improved kNN algorithm with an automatically labeled sample set. The method first introduces a log parsing method based on N-grams and frequent pattern mining (FPM), which reduces the dimension of the log vectors produced with Term Frequency-Inverse Document Frequency (TF-IDF). We then use clustering and self-training to obtain a labeled log sample set from historical logs automatically. Finally, we improve the kNN algorithm with an average weighting technique, which improves its accuracy on unbalanced samples. The method is validated on six log datasets of different types.
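To see why vector dimension matters here, the following minimal sketch converts a few log lines into TF-IDF vectors. It is not the parsing pipeline from the paper: the log lines are hypothetical, whitespace tokenization stands in for the N-gram and FPM parsing step, and `idf = log(N / df)` is one common TF-IDF variant chosen as an assumption. Each vector has one entry per vocabulary token, so fewer distinct tokens after parsing means shorter vectors.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with tf = count / length and idf = log(N / df).

    Whitespace tokenization and this idf variant are illustrative assumptions.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[tok] / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical log lines, for illustration only.
logs = [
    "connection established to node",
    "connection closed by node",
    "kernel panic on node",
]
vocabulary, vectors = tfidf_vectors(logs)
print(len(vocabulary), "dimensions per log vector")
```

A token that occurs in every line, such as `node` here, gets a weight of zero, while a token unique to one line, such as `panic`, gets a positive weight only in that line's vector.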

State-Degradation-Oriented Fault Diagnosis for High-Speed Train Running Gears System

Cheng

Wang

Luo

et al. 2020

Sensors

As one of the critical components of high-speed trains, the running gears system directly affects the operation performance of the train. This paper proposes a state-degradation-oriented method for fault diagnosis of an actual running gears system based on the Wiener state degradation process and multi-sensor filtering. First, for the given measurements of the high-speed train, this paper considers the information acquisition and transfer characteristics of composite sensors, which establish a distributed topology for the axle box bearing. Second, a distributed filter is built based on the bilinear system model, and the gain parameters of the filter are designed to minimize the mean square error. To better represent the degradation characteristics in actual operation, this paper constructs an improved nonlinear model. Finally, the threshold is determined based on Chebyshev's inequality for reliable fault diagnosis. Open datasets of rotating machinery bearings and real measurements are used in the case studies to demonstrate the effectiveness of the proposed method. The results are consistent with the actual situation, which validates the proposed method.
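A threshold based on Chebyshev's inequality can be sketched generically as follows. The abstract does not describe how the paper constructs its threshold, so this is only the textbook bound P(|X - mean| >= k * std) <= 1 / k^2 applied to hypothetical residual values; the function names `chebyshev_threshold` and `is_fault` and the false alarm rate are assumptions for illustration.

```python
import math
import statistics


def chebyshev_threshold(residuals, false_alarm_rate=0.05):
    """Return (low, high) bounds so that, by Chebyshev's inequality, at most
    `false_alarm_rate` of the probability mass lies outside them.

    Chebyshev: P(|X - mean| >= k * std) <= 1 / k**2, so k = 1 / sqrt(rate).
    Mean and standard deviation are estimated from fault-free residuals.
    """
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    k = 1.0 / math.sqrt(false_alarm_rate)
    return mean - k * std, mean + k * std


def is_fault(value, bounds):
    """Flag a residual that falls outside the Chebyshev bounds."""
    low, high = bounds
    return value < low or value > high


# Hypothetical fault-free residuals, for illustration only.
healthy = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]
bounds = chebyshev_threshold(healthy, false_alarm_rate=0.05)
print(bounds, is_fault(0.1, bounds), is_fault(5.0, bounds))
```

Because Chebyshev's inequality holds for any distribution with finite variance, the bound is conservative: it needs no Gaussian assumption on the residuals, at the cost of a wider threshold.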

EH-Recommender: Recommending Exception Handling Strategies Based on Program Context

Song

Jia

et al. 2018

Guoli Cheng

Efficient Performance Prediction for Apache Spark

Log-Based Anomaly Detection with the Improved K-Nearest Neighbor

An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

State-Degradation-Oriented Fault Diagnosis for High-Speed Train Running Gears System

EH-Recommender: Recommending Exception Handling Strategies Based on Program Context
