A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

Song, Hongchao; Jiang, Zhuqing; Men, Aidong; Yang, Bo

doi:10.1155/2017/8501683

Cited by 98 publications (51 citation statements)

References 33 publications (39 reference statements)

“…Based on the region of competence definition, similar samples for the unknown query instance X_query and the competence level of each base learner can be estimated. The common techniques for defining the region of competence include minimum difference minimization [16], k-nearest neighbors (KNN) [24], K-means [25], and the competence map method [26]. During the pruning stage, some learners are extracted to construct the expert with respect to the test set X_Tes.…”

Section: Methods (mentioning; confidence: 99%)
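
The KNN variant of the region of competence mentioned above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, competence measured as each base learner's accuracy on the k validation neighbours of the query (an "overall local accuracy" style rule), and a hypothetical competence threshold; the function names and parameters are illustrative, not taken from the cited work.

```python
import numpy as np

def region_of_competence(X_val, x_query, k=5):
    """Indices of the k validation samples closest (Euclidean) to x_query."""
    dists = np.linalg.norm(X_val - x_query, axis=1)
    return np.argsort(dists)[:k]

def local_competence(val_preds, y_val, roc_idx):
    """Competence of each base learner = its accuracy inside the region.

    val_preds has shape (n_learners, n_val): each learner's predicted
    labels on the validation set.
    """
    return (val_preds[:, roc_idx] == y_val[roc_idx]).mean(axis=1)

def select_experts(val_preds, y_val, X_val, x_query, k=5, threshold=0.8):
    """Keep learners whose local competence reaches a (hypothetical) threshold."""
    roc = region_of_competence(X_val, x_query, k)
    comp = local_competence(val_preds, y_val, roc)
    selected = np.flatnonzero(comp >= threshold)
    # Fall back to the single most competent learner if none qualifies.
    if selected.size == 0:
        selected = np.array([int(np.argmax(comp))])
    return selected, comp
```

The selected learners would then vote on x_query; swapping region_of_competence for a K-means cluster lookup would give the clustering-based alternative listed in the quote.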

Margin-Based Pareto Ensemble Pruning: An Ensemble Pruning Algorithm That Learns to Search Optimized Ensembles

Zhou, Liu, et al. 2019

Computational Intelligence and Neuroscience

The ensemble pruning system is an effective machine learning framework that combines several learners as experts to classify a test set. Generally, ensemble pruning systems define a region of competence based on the validation set in order to select the most competent ensembles from the ensemble pool with respect to the test set. However, the size of the ensemble pool is usually fixed, and the performance of an ensemble pool depends heavily on how the region of competence is defined. In this paper, a dynamic pruning framework called margin-based Pareto ensemble pruning is proposed for ensemble pruning systems. The framework explores the optimized ensemble pool size during the overproduction stage and fine-tunes the experts during the pruning stage. A Pareto optimization algorithm is used to search for the overproduction pool size that yields better performance. Taking into account the information entropy of the learners in the indecision region, margin criterion pruning computes a margin criterion for each learner in the ensemble pool and prunes the experts with respect to the test set. The effectiveness of the proposed method on classification tasks is assessed using several datasets. The results show that margin-based Pareto ensemble pruning can achieve smaller ensemble sizes and better classification performance than state-of-the-art models on most datasets.
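
Two building blocks named in this abstract can be sketched in isolation: a Pareto (non-dominated) filter over candidate pool sizes, treating ensemble size and validation error as two objectives to minimize, and the standard majority-voting margin of an ensemble on each sample. This is a generic sketch under those assumptions; it does not reproduce the paper's entropy-weighted margin criterion or its search procedure, and the function names are hypothetical.

```python
import numpy as np

def pareto_front(candidates):
    """Return the non-dominated (size, error) pairs; both are minimized.

    A candidate is dominated if another one is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for i, (s_i, e_i) in enumerate(candidates):
        dominated = any(
            (s_j <= s_i and e_j <= e_i) and (s_j < s_i or e_j < e_i)
            for j, (s_j, e_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((s_i, e_i))
    return sorted(front)

def voting_margin(preds, y, n_classes):
    """Per-sample majority-voting margin in [-1, 1].

    preds: (n_learners, n_samples) integer labels; y: true labels.
    margin = (votes for the true class - max votes for any other class) / n_learners
    """
    n_learners, n_samples = preds.shape
    rows = np.arange(n_samples)
    votes = np.zeros((n_samples, n_classes))
    for learner_preds in preds:
        votes[rows, learner_preds] += 1
    true_votes = votes[rows, y].copy()
    votes[rows, y] = -np.inf  # exclude the true class from the "other" maximum
    return (true_votes - votes.max(axis=1)) / n_learners
```

Samples with small or negative margins fall in the region where the ensemble is close to indecision, which is where a margin-based pruning rule would concentrate its attention.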

“…
• supervised: where label information regarding the threat type is available [10]
• unsupervised: where no labelling data appear on the dataset [11], [12]
• semi-supervised: where partial knowledge regarding the anomaly type is available [13], [14], [15]
Towards the supervised direction, the key objective is to construct proper training datasets that include all the anomalous examples along with their corresponding labels. Since this procedure can be considered as a standard classification approach, the main advantage is its flexibility in identifying if a new pattern is suspicious or not, based on already existing attack patterns.…”

Section: B. Network Traffic Anomaly Detection (mentioning; confidence: 99%)

Incidents Information Sharing Platform for Distributed Attack Detection

Fotiadou, Velivassaki, Voulkidis, et al. 2020

IEEE Open J. Commun. Soc.

Intrusion detection plays a critical role in the cyber-security domain, since malicious attacks cause irreparable damage to cyber-systems. In this work, we propose the I2SP prototype, a novel Information Sharing Platform able to gather, pre-process, model, and distribute network-traffic information. Within the I2SP prototype we build several challenging deep feature learning models for network-traffic intrusion detection. The learnt representations are used to classify each new network measurement into its corresponding threat level. We evaluate the prototype's performance through case studies using cyber-security data extracted from the Malware Information Sharing Platform (MISP)-API. To the best of our knowledge, we are the first to use the MISP-API to construct an information sharing mechanism that supports multiple novel deep feature learning architectures for intrusion detection. Experimental results indicate that the proposed deep feature learning techniques accurately predict MISP threat levels. Index terms: malware information sharing platform, network intrusion detection, anomaly detection, deep feature learning, convolutional neural networks, long short-term memory neural networks, stacked sparse autoencoders.

“…As supervised approaches imply that both normal and anomalous observations are classified in the training dataset, and this collection may be difficult to obtain, the authors of References [20, 21] propose hybrid semi-supervised anomaly detection models for high-dimensional datasets. In semi-supervised approaches, only normal samples are available in the training set; that is, the user cannot obtain information about anomalies.…”

Section: Background and Related Work (mentioning; confidence: 99%)
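
The semi-supervised setting described above (training on normal samples only) can be illustrated with a deliberately simple detector. This sketch assumes a hypothetical NormalOnlyDetector that scores points by standardized Euclidean distance from the mean of the normal training data and sets its threshold at a chosen quantile of the training scores; it stands in for the far more elaborate models of References [20, 21] and is not their method.

```python
import numpy as np

class NormalOnlyDetector:
    """Fits on normal data only; flags points far from the normal profile."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, X_normal):
        self.mean_ = X_normal.mean(axis=0)
        self.std_ = X_normal.std(axis=0) + 1e-12  # avoid division by zero
        # Threshold chosen so that roughly (1 - quantile) of the training
        # normals would be flagged; no anomalous labels are needed.
        self.threshold_ = np.quantile(self.score(X_normal), self.quantile)
        return self

    def score(self, X):
        """Standardized Euclidean distance from the normal mean."""
        z = (X - self.mean_) / self.std_
        return np.sqrt((z ** 2).sum(axis=1))

    def predict(self, X):
        return self.score(X) > self.threshold_  # True = anomaly
```

The quantile acts as the tolerated false-alarm rate on normal data, which is the only knob available when anomalies are never observed during training.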

“…Unknown samples are classified as outliers when their behavior is far from that of the known normal samples. In Reference [20], they propose an anomaly detection model that consists of two components: a deep auto-encoder (DAE) and an ensemble KNN graphs-based anomaly detector, whose consuming time is a quadratic function […]. In Reference [21], their hybrid approach is based on k-means clustering and Sequential Minimal Optimization (SMO) classification, whose consuming time has a complexity of […].…”

Section: Background and Related Work (mentioning; confidence: 99%)
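
To make the quadratic-cost remark concrete, the following is a generic ensemble-of-KNN detector, not the exact model of Reference [20]: each member scores a query by its mean distance to the k nearest points in a random subsample of the normal training data, and the ensemble averages the member scores. The brute-force distance matrix is what makes such detectors quadratic when every training point is scored against every other. Function names, the subsample ratio, and the ensemble size are assumptions for illustration.

```python
import numpy as np

def knn_score(X_ref, X, k):
    """Mean distance from each row of X to its k nearest rows in X_ref.

    Brute force: builds a (len(X), len(X_ref)) distance matrix, so scoring
    a dataset against itself costs O(n^2) distance computations.
    """
    d = np.linalg.norm(X[:, None, :] - X_ref[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)

def ensemble_knn_score(X_normal, X, k=5, n_members=10, subsample=0.5, seed=0):
    """Average KNN scores over members built on random subsets of normal data."""
    rng = np.random.default_rng(seed)
    m = max(k, int(subsample * len(X_normal)))
    scores = []
    for _ in range(n_members):
        idx = rng.choice(len(X_normal), size=m, replace=False)
        scores.append(knn_score(X_normal[idx], X, k))
    return np.mean(scores, axis=0)  # higher = more anomalous
```

Subsampling reduces each member's reference set from n to m points, trading some score variance (which the averaging counteracts) for a smaller distance matrix.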

Towards Outlier Sensor Detection in Ambient Intelligent Platforms—A Low-Complexity Statistical Approach

Martín, Fuentes-Lorenzo, Bordel, et al. 2020

Sensors

Sensor networks in real-world environments, such as smart cities or ambient intelligent platforms, provide applications with large and heterogeneous sets of data streams. Detecting outliers (observations that do not conform to an expected behavior) has therefore become a crucial task for establishing and maintaining secure and reliable databases on these platforms. However, the procedures used to obtain accurate models of erratic observations must operate with low storage and computational complexity to accommodate the limited processing and storage capabilities of the sensor nodes in these environments. In this work, we analyze three binary classifiers for outlier detection with low memory consumption and computational time, each based on one of three statistical prediction models: ARIMA (Auto-Regressive Integrated Moving Average), GAM (Generalized Additive Model), and LOESS (LOcal RegrESSion). As a result, we provide (1) the best classifier and settings to detect outliers, based on the ARIMA model, and (2) two real-world classified datasets as ground truths for future research.
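
The general idea of turning a prediction model into a binary outlier classifier (predict each observation, then flag it when the prediction error is too large) can be sketched with a lightweight model. Here a plain AR(p) model fitted by ordinary least squares stands in for ARIMA, and the threshold of c robust standard deviations (estimated via the median absolute deviation) is an assumption; neither reflects the settings selected in the paper.

```python
import numpy as np

def ar_outliers(y, p=2, c=3.0):
    """Flag points whose one-step AR(p) prediction residual is too large.

    AR(p) fitted by ordinary least squares stands in for ARIMA here.
    Returns a boolean mask aligned with y (the first p points are never flagged).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix: intercept plus lags 1..p for each target y[t], t >= p.
    X = np.column_stack([np.ones(n - p)] + [y[p - i - 1 : n - i - 1] for i in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    # Robust scale (MAD) so that the outliers themselves do not inflate it.
    mad = np.median(np.abs(resid - np.median(resid)))
    sigma = 1.4826 * mad + 1e-12
    mask = np.zeros(n, dtype=bool)
    mask[p:] = np.abs(resid) > c * sigma
    return mask
```

Storage stays at p + 1 coefficients plus a scale estimate, which matches the low-memory motivation; note that a single spike can also flag up to p following points, because it enters their lagged inputs.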
