2019
DOI: 10.3390/e21030219

Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data

Abstract: The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This i…

Citations: cited by 12 publications (18 citation statements)
References: 50 publications
“…A similar problem has been considered in [9] where the "best" k and the best basis over a library of orthonormal bases was found in order to suppress the noise. Here, we show that using the lossless compression as the constraint and adhering to strict decodability [10,11] can lead to an "optimal" k (and corresponding wavelet coefficients) whose variation in time series can be used as a discriminating feature in machine learning and anomaly detection setting. The process of finding such k leads to Rissanen's famous Minimum Description Length (MDL) approach [12] that provides a framework tailored for optimization and model selection in a lossless compression setting.…”
Section: Sparse Representation Using Orthonormal Bases (mentioning)
confidence: 95%
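
The selection of an "optimal" k described in this statement can be illustrated with a generic two-part MDL score. This is a minimal sketch under stated assumptions, not the citing authors' method: the function name `mdl_best_k`, the parameter cost of (3/2) log2 n bits per kept coefficient, and the Gaussian-style residual term are all illustrative choices. It relies only on the fact that, in an orthonormal basis, the residual energy equals the summed squares of the dropped coefficients.

```python
import math

def mdl_best_k(coeffs):
    """Pick how many of the largest coefficients to keep (hypothetical helper).

    Assumed score: model cost of 1.5 * log2(n) bits per kept coefficient,
    plus a data cost of (n / 2) * log2(residual energy).
    """
    n = len(coeffs)
    # Squared magnitudes, largest first; orthonormality lets us read the
    # residual energy directly from the coefficients we drop.
    energies = sorted((c * c for c in coeffs), reverse=True)
    best_k, best_cost = 0, float("inf")
    for k in range(n):  # k = n is skipped: the residual would be zero
        residual = max(sum(energies[k:]), 1e-12)  # guard against log2(0)
        cost = 1.5 * k * math.log2(n) + 0.5 * n * math.log2(residual)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Two dominant coefficients plus small noise-like terms.
print(mdl_best_k([10, -8, 0.1, -0.1, 0.05, 0.1, -0.05, 0.1]))
```

Tracking how the selected k changes across windows of a time series is the kind of feature the statement refers to; the exact codelength terms would need to follow the cited work.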
“…Atypicality is a data discovery and anomaly detection framework that is based on a central definition: "a sequence is atypical if it can be described (coded) with fewer bits in itself rather than using the (optimum) code for typical sequences" [10,11]. In the atypicality framework, the comparison of the descriptive codelength between a training-based typical encoder and a universal encoder (independent of the train data and any prior information) is the criterion for detecting anomalous segments of data, i.e.…”
Section: Anomaly Detection Using the Optimal Sparse Representation With Atypicality (mentioning)
confidence: 99%
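
The codelength comparison in this definition can be sketched for the simplest discrete case. The example below is an illustration under stated assumptions, not the paper's real-valued method: the typical model is assumed to be an i.i.d. Bernoulli source with known parameter `p_one`, the universal coder is assumed to be the Krichevsky-Trofimov sequential estimator, and `overhead_bits` is a hypothetical stand-in for any extra bits needed to flag an atypical segment. All function names are made up for illustration.

```python
import math

def typical_codelength(bits, p_one=0.5):
    """Ideal codelength in bits under the assumed fixed Bernoulli model."""
    return sum(-math.log2(p_one if b else 1.0 - p_one) for b in bits)

def kt_codelength(bits):
    """Ideal codelength in bits under the Krichevsky-Trofimov estimator,
    which learns the symbol frequencies from the sequence itself."""
    n0 = n1 = 0
    total = 0.0
    for b in bits:
        # Predictive probability of the next symbol given the counts so far.
        p = ((n1 if b else n0) + 0.5) / (n0 + n1 + 1.0)
        total += -math.log2(p)
        if b:
            n1 += 1
        else:
            n0 += 1
    return total

def is_atypical(bits, p_one=0.5, overhead_bits=0.0):
    """True if coding the sequence 'in itself' beats the typical code."""
    return kt_codelength(bits) + overhead_bits < typical_codelength(bits, p_one)

print(is_atypical([1] * 32))     # a run of ones is cheap to describe in itself
print(is_atypical([0, 1] * 16))  # balanced data gains nothing over the typical code
```

A larger `overhead_bits` makes the test more conservative, since the universal description must then win by a wider margin.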
“…For example, Facebook lost hundreds of millions of dollars in data breaches due to hacking. In order to cope with increasingly complex network attacks, more and more security methods [2][3][4][5][6][7] analyze the characteristics of network traffic from different angles to establish a flexible and reliable intrusion detection system (IDS). The IDS collects and analyzes information about several key points in a computer network or computer system, and it analyzes whether the network system is being attacked.…”
Section: Introduction (mentioning)
confidence: 99%
“…[15] focused on the description length and compressibility. This requires computing lengths for every subsequence of the suspected sequence in order to achieve a figure of merit ([37] extends the journal version [38] to real-valued data). In the scheme suggested herein, probabilities are directly assigned to short sequences, with one traverse through a tree.…”
Section: Introduction (mentioning)
confidence: 99%