Abstract: Enterprise systems typically produce a large number of logs to record runtime states and important events. Log anomaly detection is valuable for business management and system maintenance. Most existing log-based anomaly detection methods use a log parser to obtain log event indexes or event templates and then apply machine learning methods to detect anomalies. However, these methods cannot handle unknown log types and do not take advantage of log semantic information. In this article, we propose ConAnomaly,…
“…It turns out that data instability, i.e., the appearance of previously unknown events, is one of the main issues addressed by the reviewed approaches. The key idea to resolving this problem is currently to represent logs as semantic vectors so that new or changed events can still be compared to known events by measuring their similarities [17], [28], [41], [46], [48], [50], [57], [65], [69], [73]. There are many techniques for generating numeric vectors to represent log events (cf.…”
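The semantic-vector idea quoted above can be illustrated with a minimal sketch. The event templates, the bag-of-words embedding, and the vocabulary below are toy assumptions for illustration; the surveyed approaches use trained embeddings (e.g. word2vec- or FastText-style vectors), but the principle is the same: a new or changed event is mapped to a vector and matched against known events by cosine similarity.

```python
from collections import Counter
import math

def embed(template, vocab):
    """Toy bag-of-words vector for a log event template over a fixed vocabulary."""
    counts = Counter(template.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical known event templates and a previously unseen variant.
known = ["Receiving block from host", "Deleting block from disk"]
new_event = "Received block from host"  # changed wording, no exact template match

vocab = sorted({w for t in known + [new_event] for w in t.lower().split()})
sims = [(t, cosine(embed(new_event, vocab), embed(t, vocab))) for t in known]
best = max(sims, key=lambda p: p[1])
# The unseen event is matched to its closest known event despite the new wording.
```

With an exact-match template index the new event would be unrepresentable; with semantic vectors it is simply mapped near its closest known neighbor, which is what makes this representation robust to data instability.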
Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches leveraging deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance in comparison to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, there exist many different architectures for deep learning and it is nontrivial to encode raw and unstructured log data to be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches but instead aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.
“…DL techniques commonly employed for IDSs include convolutional neural networks for spatial pattern recognition in network traffic [8], recurrent neural networks such as LSTMs for analyzing sequential data such as system logs [9], and autoencoders for anomaly detection by learning compressed representations of normal behavior [10]. While these techniques offer solutions for detecting various cyberthreats and anomalies in diverse network environments, they often require fixed training datasets and may lack the ability to adapt dynamically to new threats.…”
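The autoencoder idea mentioned in the excerpt, learning a compressed representation of normal behavior and flagging inputs that reconstruct poorly, can be sketched in miniature. The 2-D feature vectors, the tied-weight linear autoencoder with a single latent unit, and the thresholding rule below are simplifying assumptions for illustration; real IDS autoencoders are deep nonlinear networks trained on encoded traffic or log features, but the detection principle is identical.

```python
import random

random.seed(0)
# "Normal" behavior: 2-D points near the line y = x (e.g. two correlated metrics).
normal = [(v, v + random.gauss(0, 0.05)) for v in (random.random() for _ in range(200))]

# Tied-weight linear autoencoder with one latent unit: z = w.x, reconstruction = w*z.
w = [0.5, 0.5]
lr = 0.05
for _ in range(300):
    for x in normal:
        z = w[0] * x[0] + w[1] * x[1]
        e = [w[0] * z - x[0], w[1] * z - x[1]]          # reconstruction error vector
        ew = e[0] * w[0] + e[1] * w[1]
        # Gradient of squared reconstruction error w.r.t. the tied weights.
        g = [e[0] * z + ew * x[0], e[1] * z + ew * x[1]]
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]

def recon_error(x):
    """Squared reconstruction error of a point under the trained autoencoder."""
    z = w[0] * x[0] + w[1] * x[1]
    return (w[0] * z - x[0]) ** 2 + (w[1] * z - x[1]) ** 2

# Anything reconstructing worse than every training point is flagged as anomalous.
threshold = max(recon_error(x) for x in normal)
```

A point far off the learned manifold, e.g. `(1.0, -1.0)`, reconstructs badly and exceeds the threshold, while points resembling the training data do not; this is the "compressed representation of normal behavior" detector in its simplest form.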
Our increasingly connected world continues to face an ever-growing number of network-based attacks. An Intrusion Detection System (IDS) is an essential security technology used for detecting these attacks. Although numerous Machine Learning-based IDSs have been proposed for the detection of malicious network traffic, the majority have difficulty properly detecting and classifying the more uncommon attack types. In this paper, we implement a novel hybrid technique using synthetic data produced by a Generative Adversarial Network (GAN) to use as input for training a Deep Reinforcement Learning (DRL) model. Our GAN model is trained on the NSL-KDD dataset, a publicly available collection of labeled network traffic data specifically designed to support the evaluation and benchmarking of IDSs. Ultimately, our findings demonstrate that training the DRL model on synthetic datasets generated by specific GAN models can result in better performance in correctly classifying minority classes over training on the true imbalanced dataset.
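The rebalancing step described in the abstract, augmenting rare attack classes with generated samples before training a classifier, can be sketched with a deliberately simplified stand-in: a Gaussian fitted to the minority class plays the role of the GAN generator. The dataset, feature, and class labels below are toy assumptions (the paper itself trains a GAN on NSL-KDD); the sketch only shows how synthetic minority samples restore class balance.

```python
import random
import statistics

random.seed(1)
# Toy imbalanced dataset of (feature, label) pairs; "u2r" is the rare attack class.
real = [(random.gauss(0.0, 1.0), "normal") for _ in range(95)] + \
       [(random.gauss(3.0, 0.5), "u2r") for _ in range(5)]

# Fit a simple model of the minority class (stand-in for a trained GAN generator).
minority = [x for x, y in real if y == "u2r"]
mu = statistics.fmean(minority)
sigma = statistics.stdev(minority)

# Generate synthetic minority samples until the classes are balanced.
need = sum(1 for _, y in real if y == "normal") - len(minority)
synthetic = [(random.gauss(mu, sigma), "u2r") for _ in range(need)]

balanced = real + synthetic
counts = {lbl: sum(1 for _, y in balanced if y == lbl) for lbl in ("normal", "u2r")}
```

Training a downstream model (the DRL agent in the paper's setup) on `balanced` rather than `real` is what lets it see enough minority-class examples to classify the uncommon attack types; a GAN replaces the Gaussian here because real traffic features are high-dimensional and far from normally distributed.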