2017
DOI: 10.1155/2017/8501683

A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

Abstract: Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and each sample may perform like an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for h…
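The abstract's remark that pairwise distances become similar in high-dimensional space can be illustrated with a small experiment. The following is only a sketch under stated assumptions: the points are drawn uniformly from the unit hypercube, and the function name `relative_contrast`, the sample size, and the dimensions are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    A small value means the nearest and farthest neighbours are almost
    equally far away, which is the distance-concentration effect.
    """
    rng = random.Random(seed)
    # Assumption: data are i.i.d. uniform in [0, 1]^dim.
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


low_dim_contrast = relative_contrast(200, 2)
high_dim_contrast = relative_contrast(200, 500)
print(f"contrast in   2 dimensions: {low_dim_contrast:.3f}")
print(f"contrast in 500 dimensions: {high_dim_contrast:.3f}")
```

Under these assumptions the contrast shrinks sharply as the dimension grows, which is why a purely distance-based score struggles to separate outliers from nominal samples.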

Cited by 98 publications (51 citation statements)
References 33 publications (39 reference statements)
“…Based on region of competence definition, similar samples for unknown query instance X_query and the competence level of each base learner can be estimated. The common techniques for defining the region of competence include minimum difference minimization [16], k-nearest neighbors (KNN) [24], K-means [25], and the competence map method [26]. During the pruning stage, some learners are extracted to construct the expert with respect to the test set X_Tes.…”
Section: Methods (mentioning)
confidence: 99%
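The KNN-based region of competence mentioned in this statement can be sketched as follows. This is a minimal illustration, not the citing paper's procedure: it assumes a labelled validation set, uses local accuracy as the competence level, and the names `region_of_competence`, `local_accuracy`, `select_learner`, and the two toy learners are hypothetical.

```python
import math


def region_of_competence(query, val_X, k=3):
    """Indices of the k validation samples nearest to the query."""
    order = sorted(range(len(val_X)), key=lambda i: math.dist(query, val_X[i]))
    return order[:k]


def local_accuracy(learner, idx, val_X, val_y):
    """Fraction of the region's samples that the learner labels correctly."""
    hits = sum(1 for i in idx if learner(val_X[i]) == val_y[i])
    return hits / len(idx)


def select_learner(query, learners, val_X, val_y, k=3):
    """Pick the base learner with the highest competence near the query."""
    idx = region_of_competence(query, val_X, k)
    return max(learners, key=lambda h: local_accuracy(h, idx, val_X, val_y))


# Toy 1-D validation set: negative inputs have label 0, positive have label 1.
val_X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
val_y = [0, 0, 0, 1, 1, 1]


def always_zero(x):
    """Hypothetical base learner that is only right on the negative side."""
    return 0


def always_one(x):
    """Hypothetical base learner that is only right on the positive side."""
    return 1


learners = [always_zero, always_one]
left_choice = select_learner([-2.5], learners, val_X, val_y)
right_choice = select_learner([2.5], learners, val_X, val_y)
print(left_choice.__name__, right_choice.__name__)
```

Each query is routed to the learner that performs best on its nearest validation samples, so different regions of the input space can end up with different experts.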
“…
• supervised: where label information regarding the threat type is available [10]
• unsupervised: where no labelling data appear on the dataset [11], [12]
• semi-supervised: where partial knowledge regarding the anomaly type is available [13], [14], [15]
Towards the supervised direction, the key objective is to construct proper training datasets that include all the anomalous examples along with their corresponding labels. Since this procedure can be considered as a standard classification approach, the main advantage is its flexibility in identifying if a new pattern is suspicious or not, based on already existing attack patterns.…”
Section: B. Network Traffic Anomaly Detection (mentioning)
confidence: 99%
“…As supervised approaches imply that both normal and anomalous observations are classified in the training dataset, and this collection may be difficult to obtain, the authors of References [20, 21] propose hybrid semi-supervised anomaly detection models for high-dimensional datasets. In semi-supervised approaches, only normal samples are available in the training set; that is, the user cannot obtain information about anomalies.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Unknown samples are classified as outliers when their behavior is far from that of the known normal samples. In Reference [20], they propose an anomaly detection model that consists of two components: a deep auto-encoder (DAE) and an ensemble KNN graphs-based anomaly detector, whose consuming time is a quadratic function, . In Reference [21], their hybrid approach is based on k-means clustering and Sequential Minimal Optimization (SMO) classification, whose consuming time has a complexity of .…”
Section: Background and Related Work (mentioning)
confidence: 99%
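The semi-supervised setting described in the two statements above (train on normal samples only, flag unknown samples that lie far from them) can be sketched with a plain k-nearest-neighbour distance score. This is an assumption-laden illustration, not the DAE plus ensemble KNN-graph model of Reference [20]: the score, the leave-one-out threshold at the 95th percentile, and the names `knn_score`, `fit_threshold`, and `is_anomaly` are all hypothetical choices.

```python
import math
import random


def knn_score(x, normals, k=5):
    """Mean distance from x to its k nearest normal training samples."""
    dists = sorted(math.dist(x, p) for p in normals)
    return sum(dists[:k]) / k


def fit_threshold(normals, k=5, quantile=0.95):
    """Leave-one-out scores of the normal samples, cut at a chosen quantile."""
    scores = []
    for i, x in enumerate(normals):
        others = normals[:i] + normals[i + 1:]
        scores.append(knn_score(x, others, k))
    scores.sort()
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


# Assumption: normal behaviour is a 2-D standard Gaussian cloud.
rng = random.Random(0)
normals = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
threshold = fit_threshold(normals)


def is_anomaly(x):
    """Flag x when it is farther from the normal data than the threshold."""
    return knn_score(x, normals) > threshold


print(is_anomaly([0.0, 0.0]), is_anomaly([8.0, 8.0]))
```

No anomalous example is needed at training time; the threshold is derived from the normal samples alone. Scoring one query against n training points costs O(n) distance computations here, so scoring n queries is quadratic in n for this naive sketch.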