Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data

Sabeti, Elyas; Høst-Madsen, A.

doi:10.3390/e21030219

Cited by 12 publications

(18 citation statements)

References 50 publications


“…A similar problem has been considered in [9] where the "best" k and the best basis over a library of orthonormal bases was found in order to suppress the noise. Here, we show that using the lossless compression as the constraint and adhering to strict decodability [10,11] can lead to an "optimal" k (and corresponding wavelet coefficients) whose variation in time series can be used as a discriminating feature in machine learning and anomaly detection setting. The process of finding such k leads to Rissanen's famous Minimum Description Length (MDL) approach [12] that provides a framework tailored for optimization and model selection in a lossless compression setting.…”

Section: Sparse Representation Using Orthonormal Bases (mentioning)

confidence: 95%
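The selection of k described in the statement above can be illustrated with a two-part code. The following is a minimal sketch, not the cited authors' method: it assumes a Haar basis, a Gaussian residual with known variance `sigma2`, and a cost of 1.5·log2(n) bits per retained coefficient. The function names and the cost model are hypothetical choices made for illustration.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two."""
    coeffs = [float(v) for v in x]
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [0.0] * n
    length = n
    while length > 1:
        half = length // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            out[i] = (a + b) / math.sqrt(2)         # approximation
            out[half + i] = (a - b) / math.sqrt(2)  # detail
        coeffs[:length] = out[:length]
        length = half
    return coeffs


def codelength_bits(energies, k, sigma2):
    """Two-part codelength when the k largest coefficients are kept.

    Assumption: the residual is coded as Gaussian noise with known
    variance sigma2; each kept coefficient costs log2(n) bits for its
    index plus 0.5*log2(n) bits for its value.
    """
    n = len(energies)
    residual = sum(energies[k:])  # Parseval: energy of dropped coefficients
    residual_bits = (residual / (2 * sigma2 * math.log(2))
                     + 0.5 * n * math.log2(2 * math.pi * sigma2))
    model_bits = k * 1.5 * math.log2(n)
    return residual_bits + model_bits


def select_k(x, sigma2):
    """Return the k that minimizes the total codelength."""
    energies = sorted((c * c for c in haar_transform(x)), reverse=True)
    return min(range(len(energies) + 1),
               key=lambda k: codelength_bits(energies, k, sigma2))
```

For a step signal with small alternating noise, such as `[(0.0 if i < 8 else 4.0) + 0.1 * (-1) ** i for i in range(16)]` with `sigma2=0.01`, only the two large coefficients that describe the mean and the step are worth their model cost. Tracking how the selected k changes over successive windows of a time series is one way such a quantity could serve as a feature.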

“…Atypicality is a data discovery and anomaly detection framework that is based on a central definition: "a sequence is atypical if it can be described (coded) with fewer bits in itself rather than using the (optimum) code for typical sequences" [10,11]. In the atypicality framework, the comparison of the descriptive codelength between a training-based typical encoder and a universal encoder (independent of the train data and any prior information) is the criterion for detecting anomalous segments of data, i.e.…”

Section: Anomaly Detection Using the Optimal Sparse Representation With Atypicality (mentioning)

confidence: 99%

“…in which log l+τ is also added as the penalty for not knowing the start and end points of the anomalous sequence in advance [10,11], and then τ can be used as a detection hyperparameter for which a value can be derived by cross-validation. As such, let L a = L a − τ .…”

Section: Anomaly Detection Using the Optimal Sparse Representation With Atypicality (mentioning)

confidence: 99%
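The codelength comparison quoted in the two statements above can be sketched for the simplest case. This is an illustrative sketch only: it assumes binary data, an i.i.d. Bernoulli typical model with a trained parameter `p_typical`, and a universal code that estimates its parameter from the segment itself. The cited work treats real-valued data with different coders, and all function names here are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_codelength(segment, p_typical):
    """Bits needed with the code trained on typical data (Bernoulli model)."""
    ones = sum(segment)
    zeros = len(segment) - ones
    return -ones * math.log2(p_typical) - zeros * math.log2(1 - p_typical)


def atypical_codelength(segment, tau):
    """Bits needed when the segment is described 'in itself'.

    Assumptions: the data are coded with the empirical entropy, the
    estimated parameter costs 0.5*log2(l) bits, and log2(l) + tau bits
    are added for the unknown start and end of the segment.
    """
    l = len(segment)
    p_hat = sum(segment) / l
    data_bits = l * binary_entropy(p_hat)
    parameter_bits = 0.5 * math.log2(l)
    return data_bits + parameter_bits + math.log2(l) + tau


def is_atypical(segment, p_typical, tau=0.0):
    """A segment is flagged when its own description is shorter."""
    return atypical_codelength(segment, tau) < typical_codelength(segment, p_typical)
```

With `p_typical = 0.5`, a run of 32 ones costs 32 bits under the typical code but only about 7.5 bits in itself, so it is flagged, while an alternating 0/1 segment of the same length is not. Raising `tau` makes the detector more conservative, which is why it can be tuned by cross-validation.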


Data Discovery Using Lossless Compression-Based Sparse Representation

Sabeti

Song

Hero

2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Sparse representation has been widely used in data compression, signal and image denoising, dimensionality reduction and computer vision. While overcomplete dictionaries are required for sparse representation of multidimensional data, orthogonal bases represent one-dimensional data well. In this paper, we propose a data-driven sparse representation using orthonormal bases under the lossless compression constraint. We show that imposing such constraint under the Minimum Description Length (MDL) principle leads to a unique and optimal sparse representation for one-dimensional data, which results in discriminative features useful for data discovery.


“…For example, Facebook lost hundreds of millions of dollars in data breaches due to hacking. In order to cope with increasingly complex network attacks, more and more security methods [2][3][4][5][6][7] analyze the characteristics of network traffic from different angles to establish a flexible and reliable intrusion detection system (IDS). The IDS collects and analyzes information about several key points in a computer network or computer system, and it analyzes whether the network system is being attacked.…”

Section: Introduction (mentioning)

confidence: 99%

Malicious Network Traffic Detection Based on Deep Neural Networks and Association Analysis

Gao

Liu

et al. 2020

Sensors


Anomaly detection systems can accurately identify malicious network traffic, providing network security. With the development of internet technology, network attacks come from more and more sources and are becoming more complicated, making it difficult for traditional anomaly detection systems to effectively analyze and identify abnormal traffic. At present, deep neural network (DNN) technology has achieved great results in anomaly detection, and it can perform detection automatically. However, there still exists misclassified traffic in the prediction results of deep neural networks, resulting in redundant alarm information. This paper designs a two-level anomaly detection system based on deep neural networks and association analysis. We made a comprehensive evaluation of experiments using DNNs and other neural networks based on publicly available datasets. Through the experiments, we chose DNN-4 as an important part of our system, which has high precision and accuracy in identifying malicious traffic. The Apriori algorithm can mine rules between various discretized features and normal labels, which can be used to filter the classified traffic and reduce the false positive rate. Finally, we designed an intrusion detection system based on DNN-4 and association rules. We conducted experiments on the public training set NSL-KDD, which is a modified version of the KDD Cup 1999 dataset. The results show that our detection system has great precision in malicious traffic detection, and it reduces the number of false alarms.
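The second stage described in this abstract, rules between discretized features and the normal label used to filter classifier alarms, can be sketched as follows. This is a hedged illustration, not the paper's implementation: brute-force enumeration of small itemsets stands in for Apriori's level-wise candidate pruning, and the function names, thresholds, and feature names are hypothetical.

```python
from itertools import combinations


def mine_normal_rules(records, labels, min_support=0.3,
                      min_confidence=0.9, max_size=2):
    """Find feature itemsets that imply the 'normal' label.

    records: list of dicts mapping a feature name to a discretized value.
    labels:  list of strings, one per record ('normal' or anything else).
    Assumption: itemsets of up to max_size items are enumerated directly
    instead of using Apriori's level-wise pruning.
    """
    n = len(records)
    counts = {}  # itemset -> [occurrences, occurrences labelled normal]
    for features, label in zip(records, labels):
        items = sorted(features.items())
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                entry = counts.setdefault(itemset, [0, 0])
                entry[0] += 1
                if label == "normal":
                    entry[1] += 1
    rules = []
    for itemset, (total, normal) in counts.items():
        support = normal / n
        confidence = normal / total
        if support >= min_support and confidence >= min_confidence:
            rules.append(frozenset(itemset))
    return rules


def filter_alarms(flagged, rules):
    """Drop flagged records that match any 'implies normal' rule."""
    kept = []
    for features in flagged:
        items = set(features.items())
        if not any(rule <= items for rule in rules):
            kept.append(features)
    return kept
```

In this sketch, records flagged by a first-stage classifier are passed to `filter_alarms`, and only those that match no mined rule remain as alarms, which is how a rule stage could reduce false positives.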


“…[15] focused on the description length and compressibility. This requires computing lengths for every subsequence of the suspected sequence in order to achieve a figure of merit ([37] extends the journal version [38] to real-valued data). In the scheme suggested herein, probabilities are directly assigned to short sequences, with one traverse through a tree.…”

Section: Introduction (mentioning)

confidence: 99%

Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools

Siboni

Cohen

2020

Entropy


Anomaly detection refers to the problem of identifying abnormal behaviour within a set of measurements. In many cases, one has some statistical model for normal data, and wishes to identify whether new data fit the model or not. However, in others, while there are normal data to learn from, there is no statistical model for this data, and there is no structured parameter set to estimate. Thus, one is forced to assume an individual sequences setup, where there is no given model or any guarantee that such a model exists. In this work, we propose a universal anomaly detection algorithm for one-dimensional time series that is able to learn the normal behaviour of systems and alert for abnormalities, without assuming anything about the normal data or about the anomalies. The suggested method utilizes new information measures that were derived from the Lempel–Ziv (LZ) compression algorithm in order to optimally and efficiently learn the normal behaviour (during learning), and then estimate the likelihood of new data (during operation) and classify it accordingly. We apply the algorithm to key problems in computer security, as well as a benchmark anomaly detection data set, all using simple, single-feature time-indexed data. The first is detecting botnet Command and Control (C&C) channels without deep inspection. We then apply it to the problems of malicious tools detection via system calls monitoring and data leakage identification. We conclude with the New York City (NYC) taxi data. Finally, while using information theoretic tools, we show that an attacker’s attempt to maliciously fool the detection system by trying to generate normal data is bound to fail, either due to a high probability of error or because of the need for huge amounts of resources.
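The learn-then-score idea in this abstract can be illustrated with a plain LZ78 dictionary. This is a rough sketch under stated assumptions, not the information measures derived in the paper: normal behaviour is learned as the set of LZ78 phrases of a training sequence, and a new window is scored by how many pieces are needed to cover it with those phrases. Both function names are hypothetical.

```python
def lz78_train(sequence):
    """Collect the LZ78 phrases (as tuples) of a training sequence."""
    phrases = set()
    current = ()
    for symbol in sequence:
        current += (symbol,)
        if current not in phrases:
            # A new phrase ends here; start the next one from scratch.
            phrases.add(current)
            current = ()
    return phrases


def parse_count(window, phrases):
    """Number of pieces in a greedy cover of window by learned phrases.

    Each piece is extended while the longer prefix is still a learned
    phrase; a symbol never seen in training forms a piece on its own.
    Fewer pieces means the window looks more like the training data.
    """
    count = 0
    i = 0
    while i < len(window):
        j = i + 1
        while j < len(window) and tuple(window[i:j + 1]) in phrases:
            j += 1
        count += 1
        i = j
    return count
```

A window can then be flagged when `parse_count(window, phrases) / len(window)` exceeds a threshold chosen on held-out normal data. Because the LZ78 phrase set is prefix-closed, the greedy extension corresponds to walking down the phrase tree one symbol at a time.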
