Abstract: In recent years, with the growth of online services and IoT devices, software log anomaly detection has become a significant concern for both academia and industry. However, at the time of writing, almost all contributions to the log anomaly detection task follow the same traditional architecture based on parsing, vectorizing, and classifying. This paper proposes OneLog, a new approach that uses a large deep model instead of multiple small components. OneLog utilizes a character-based convo…
“…Guo et al. [38] are the only authors to consider federated learning, where learning takes place in a distributed manner across multiple systems. Hashemi et al. [42] also go in this direction, as they combine multiple data sets to evaluate whether this affects the performance of their model. We believe that federated learning could be an interesting topic for future publications, as there exist many real-world scenarios where log data is monitored on distributed machines but orchestration of deployed detectors takes place centrally [106].…”
Section: Discussion (mentioning, confidence: 99%)
“…Some authors also use custom embedding models based on deep learning; we refer to their output as Deep Encoded Embeddings (DE). This includes a combination of character-, event-, and sequence-based embeddings [42], attention mechanisms using MLPs and CNNs [45], and token counts with label information fed into VAEs [1].…”
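In the cited works, character-based embeddings are learned end to end inside deep models. As a rough illustration of the underlying idea only, the toy sketch below maps each character of a log line to a fixed random vector and averages them; the `char_embed` function, the projection table, and the dimension are hypothetical and not taken from any surveyed approach.

```python
import numpy as np

def char_embed(line, dim=16, seed=0):
    """Toy character-based embedding: look up a fixed random vector per
    character code and average over the line. In the surveyed approaches
    this table would be a learned embedding layer, not a random one."""
    rng = np.random.default_rng(seed)
    # Fixed random projection per byte value, a stand-in for a learned
    # character embedding table.
    table = rng.standard_normal((256, dim))
    codes = [min(ord(c), 255) for c in line]
    return table[codes].mean(axis=0)

v = char_embed("ERROR: disk failure on node-3")
print(v.shape)  # (16,)
```

Because the table is fixed by the seed, identical log lines always map to identical vectors, which is the minimal property any such encoder must have before training refines it.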
“…Other metrics that are more specific to deep learning applications are the number of model parameters [38], [61] and time to train models or run the detection (ER-3) [29], [32], [37], [47], [52], [68]. Some authors also assess characteristics of their approaches that go beyond standard anomaly detection evaluations, for example, whether training on combinations of multiple data sets improves the overall performance of classification [42] or whether their approaches are robust against changes of log patterns over time [17], [42], [44].…”
Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches leveraging deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance in comparison to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, there exist many different architectures for deep learning and it is nontrivial to encode raw and unstructured log data to be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches but instead aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.
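As a minimal illustration of the self-learning principle described above, learning normal patterns and reporting unexpected event occurrences without modeling anomalies in advance, the sketch below uses a simple bigram transition model rather than the deep sequence networks the survey reviews; the class and event names are illustrative.

```python
from collections import defaultdict

class BigramLogDetector:
    """Toy self-learning sequence detector: learns which event-to-event
    transitions occur in normal logs and flags sequences containing a
    transition never observed during training."""

    def __init__(self):
        self.seen = defaultdict(set)

    def fit(self, sequences):
        # sequences: iterable of event-ID lists recorded during normal operation
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.seen[a].add(b)
        return self

    def is_anomalous(self, seq):
        # Report the sequence if any transition was never seen in training.
        return any(b not in self.seen[a] for a, b in zip(seq, seq[1:]))

normal = [["open", "read", "close"], ["open", "write", "close"]]
det = BigramLogDetector().fit(normal)
print(det.is_anomalous(["open", "read", "close"]))  # False
print(det.is_anomalous(["read", "open", "close"]))  # True
```

Deep models generalize far better over rare but valid transitions; the point here is only that training uses normal data exclusively, so no anomalous scenario has to be specified up front.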
With the advent of technology and the development of more complex software systems, the size of the logs these systems generate has risen steadily, making anomaly detection for remediating common errors more difficult than ever. The emergence of the cloud in the information technology (IT) industry has led enterprises to migrate toward it, which has extended the use of cloud management stacks such as OpenStack. Using the OpenStack platform, users can access resource infrastructure and manage virtual machines (VMs). Anomaly detection in OpenStack logs is difficult due to their substantial size, and the process needs to be automated. Since no appropriate open-source dataset of OpenStack logs exists, we generated 25,000 logs by injecting three types of anomalies. By properly parsing the OpenStack logs and analyzing them with data mining algorithms, we propose a technique that detects anomalies in OpenStack logs more efficiently, in terms of both performance and time, than recent studies. Compared to the previous research study, we improved anomaly detection performance in terms of F1 score, recall, and precision by 9%, 4%, and 14%, respectively, and decreased the running time relative to the log size by at least 30 s.
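The parsing step mentioned above typically reduces raw log lines to event templates before any mining runs. The sketch below shows the general idea with regular expressions; the prefix layout and masking rules are illustrative assumptions, not the exact OpenStack log format or the authors' parser.

```python
import re

def to_template(log_line):
    """Toy log-parsing step: strip an assumed timestamp/PID/level/module
    prefix and mask variable fields (UUIDs, IPs, numbers) so lines from
    the same underlying event collapse to one template."""
    # Drop an assumed "DATE TIME PID LEVEL MODULE " prefix.
    msg = re.sub(r"^\S+ \S+ \d+ \w+ \S+ ", "", log_line)
    msg = re.sub(r"\b[0-9a-f-]{36}\b", "<UUID>", msg)        # instance UUIDs
    msg = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<IP>", msg)  # IPv4 addresses
    msg = re.sub(r"\b\d+\b", "<NUM>", msg)                   # remaining numbers
    return msg

line = ("2023-01-01 12:00:00.000 2931 INFO nova.compute.manager "
        "Took 12 seconds to build instance")
print(to_template(line))  # Took <NUM> seconds to build instance
```

Once lines are collapsed to templates, each log window can be turned into a template-count vector, which is the usual input to the data mining algorithms the abstract refers to.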
Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with a focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
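The finding above, that high detection rates are achievable without sequence modeling, can be made concrete with an order-blind baseline: flag any trace containing an event type never seen during normal training, ignoring ordering entirely. The sketch below is a hypothetical illustration of such a simple technique, not the exact baseline from the paper.

```python
def train_event_set(normal_sequences):
    """Collect the set of event types observed during normal operation."""
    return {e for seq in normal_sequences for e in seq}

def detect(seq, known_events):
    """Flag a trace if it contains any event absent from training.
    Event order plays no role, so purely sequential anomalies pass."""
    return any(e not in known_events for e in seq)

known = train_event_set([["E1", "E2", "E3"], ["E1", "E3"]])
print(detect(["E3", "E1", "E2"], known))  # False: reordered but all events known
print(detect(["E1", "E9"], known))        # True: E9 never seen in training
```

If such a baseline already scores highly on a data set, its anomalies are dominated by unseen event types rather than sequential pattern changes, which is exactly the appropriateness concern the paper raises.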