2020
DOI: 10.1007/978-3-030-54994-7_15
Anomaly Detection from Log Files Using Unsupervised Deep Learning

Cited by 19 publications (17 citation statements)
References 8 publications
“…The most common loss function in the reviewed publications is the Cross-Entropy (CE), in particular, the categorical cross-entropy for multi-class prediction [20], [57] or binary cross-entropy that only differentiates between the normal and anomalous class [61]. Other common loss functions include the Hyper-Sphere Objective Function (HS) where the distance to the center of a hyper-sphere represents the anomaly score [24], [39], [41], [62], the Mean Squared Error (MSE) that is used for regression [20], [27], [28], [47], [50], [53], [68], and the Kullback-Leibler Divergence (KL) and Marginal Likelihood (ML) that are useful to measure loss in probability distributions [49], [58].…”
Section: B. Deep Learning Techniques (mentioning)
confidence: 99%
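The loss functions named in this excerpt are all standard, so a compact sketch may help. Below is a minimal PyTorch illustration of each; the tensor shapes, the fixed hypersphere center, and the Deep-SVDD-style formulation of the HS objective are assumptions made for illustration, not details taken from the cited papers.

```python
# Minimal PyTorch sketch of the loss functions named above; shapes and
# the hypersphere center are illustrative assumptions.
import torch
import torch.nn.functional as F

batch, num_classes, embed_dim = 32, 10, 64

logits = torch.randn(batch, num_classes)           # multi-class scores
targets = torch.randint(0, num_classes, (batch,))  # e.g. next-log-key labels
ce = F.cross_entropy(logits, targets)              # categorical cross-entropy

scores = torch.randn(batch)                        # normal-vs-anomalous scores
labels = torch.randint(0, 2, (batch,)).float()
bce = F.binary_cross_entropy_with_logits(scores, labels)

recon = torch.randn(batch, embed_dim)              # regression / reconstruction
target_vals = torch.randn(batch, embed_dim)
mse = F.mse_loss(recon, target_vals)               # mean squared error

# Hypersphere objective (Deep-SVDD-style assumption): mean squared distance
# of embeddings to a fixed center c; the distance is the anomaly score.
z = torch.randn(batch, embed_dim)                  # network embeddings
c = torch.zeros(embed_dim)                         # assumed hypersphere center
hs = ((z - c) ** 2).sum(dim=1).mean()

# KL divergence between two distributions over log keys.
p_log = F.log_softmax(torch.randn(batch, num_classes), dim=1)
q = F.softmax(torch.randn(batch, num_classes), dim=1)
kl = F.kl_div(p_log, q, reduction="batchmean")
```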
“…The time stamp of log events is a special parameter, as it puts the other parameters into temporal context, which is required for time-series analysis. However, the time stamps themselves may also be used for Time Embedding (TE) and serve as input to neural networks [27]. For this purpose, Li et al. [46] generate vectors for sequences of time differences between event occurrences by applying soft one-hot encoding.…”
Section: Log Data Preparation (mentioning)
confidence: 99%
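As a rough illustration of the soft one-hot idea mentioned in this excerpt, the sketch below buckets inter-event time differences into bins and spreads weight over neighboring bins with a Gaussian kernel. The bin edges and the Gaussian softening are hypothetical choices, not the exact scheme of Li et al. [46].

```python
# Hypothetical sketch of soft one-hot encoding for inter-event time
# differences; bin edges and Gaussian softening are assumptions.
import numpy as np

def soft_one_hot(delta_seconds, bin_edges, sigma=1.0):
    """Encode a time difference as a soft one-hot vector: every bin gets
    weight according to its distance from the bin the delta falls into."""
    idx = np.searchsorted(bin_edges, delta_seconds)   # hard bin index
    positions = np.arange(len(bin_edges) + 1)
    weights = np.exp(-0.5 * ((positions - idx) / sigma) ** 2)
    return weights / weights.sum()                    # normalize to sum to 1

# Log-scaled bins for gaps between consecutive log events (seconds).
edges = np.array([0.1, 1.0, 10.0, 60.0, 600.0])
deltas = [0.5, 45.0, 900.0]                           # example time gaps
vectors = np.stack([soft_one_hot(d, edges) for d in deltas])
print(vectors.round(3))  # one soft one-hot row per time difference
```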
“…Table 2: Description of Fields for Syslogs

    field          example                description
    timestamp      Jan 1 00:08:35         the time stamp
    host           i151-306               the node where the job ran
    system-id      kernel (linux)         ID of the system
    application    Lustre                 application name
    text message   an error occurred ...  detailed message of the event

5.1.2 Rationalized logs. Rationalized logs was a new logging framework for TACC supercomputers, introduced to replace Syslogs.…”
Section: Evaluation System and Datasets (mentioning)
confidence: 99%
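Given the field layout in Table 2, a raw line of this shape could be split into those fields with a simple regular expression. The sketch below is illustrative only: the exact line format on TACC systems is an assumption, and the system-id field is omitted because its position in the raw line is unclear from the excerpt.

```python
# Illustrative parser for the Syslog fields listed in Table 2; the exact
# TACC line layout is an assumption, so this regex is only a sketch.
import re

SYSLOG_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2})\s+"  # e.g. Jan 1 00:08:35
    r"(?P<host>\S+)\s+"                                  # e.g. i151-306
    r"(?P<application>[^:]+):\s+"                        # e.g. Lustre
    r"(?P<message>.*)"                                   # free-text message
)

line = "Jan 1 00:08:35 i151-306 Lustre: an error occurred ..."
match = SYSLOG_RE.match(line)
if match:
    print(match.groupdict())
```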
“…Although log files are nontrivial to analyze (e.g., they are often unstructured, duplicated, or even incomplete [10]), extensive research on failure-related analysis using HPC system logs has been undertaken, such as detecting anomalies, e.g., [4], [43], [6], diagnosing the root causes of failures, e.g., [10,14,17], and detecting the errors that lead to system failures, e.g., [3,42,57,81].…”
Section: Introduction (mentioning)
confidence: 99%
“…In [2], the authors performed error detection in supercomputers by combining entropy, mutual information, and PCA approaches. In general, techniques have focused on capturing anomalies in system logs; e.g., these recent works are based on anomaly detection techniques [3]-[5]. Recently, techniques based on natural language processing (NLP) and artificial intelligence (AI) have been applied to failure log analysis of these systems [6]-[8].…”
Section: Introduction (mentioning)
confidence: 99%
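As a loose illustration of the PCA strand of the techniques cited in this excerpt, the sketch below scores time windows of log event counts by their reconstruction error outside the principal subspace. The count matrix, the 95% variance cutoff, and the 3-sigma threshold are assumptions, and the entropy and mutual-information components of [2] are not reproduced here.

```python
# Hedged sketch of PCA-based anomaly detection over log event-count
# vectors; the data and thresholds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
# Rows: time windows; columns: counts of each log event type.
counts = rng.poisson(lam=5.0, size=(200, 12)).astype(float)

X = counts - counts.mean(axis=0)                    # center the data
U, S, Vt = np.linalg.svd(X, full_matrices=False)
var = (S ** 2) / (S ** 2).sum()
k = int(np.searchsorted(np.cumsum(var), 0.95)) + 1  # keep ~95% variance

P = Vt[:k].T                                        # principal subspace
residual = X - X @ P @ P.T                          # reconstruction residual
scores = np.linalg.norm(residual, axis=1)           # anomaly score per window
threshold = scores.mean() + 3 * scores.std()        # simple 3-sigma cutoff
anomalies = np.where(scores > threshold)[0]
print("anomalous windows:", anomalies)
```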