Anomaly Detection for Data Streams Based on Isolation Forest Using Scikit-Multiflow

Togbe, Maurras Ulbricht; Barry, Mariam; Boly, Aliou; Chabchoub, Yousra; Chiky, Raja; Montiel, Jacob; Tran, Vinh-Thuy

doi:10.1007/978-3-030-58811-3_2

Cited by 28 publications

(13 citation statements)

References 24 publications

“…The major downside of online methods in general, especially unsupervised methods compared to their batch opponents, is the poorer performance when it comes to classifying abnormal and normal data instances. However, we strongly support the hypothesis in [11], when considering critical streaming applications as for detecting network-based malicious activity, a fast model, even with less accuracy, is preferred. However, applying FS shall at least improve the classification performance of OD methods [R-FS05].…”

Section: Requirements With Respect To Feature Selection For Outlier Detection On Streaming Data (supporting)

confidence: 68%

Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security

et al. 2021

Over the past couple of years, machine learning methods, especially outlier detection methods, have become anchored in the cybersecurity field to detect network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge to efficient detection schemes and demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements to combine both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which is able to perform unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields comparable results to a state-of-the-art offline method trimmed for outlier detection.

“…Furthermore, for the sake of reducing the impact of FP, the pruning algorithm might also filter out single-stage attacks. We deem this step as critical since we support and transfer the statement in [7] for OD-based alert correlation that, especially for critical streaming applications, it is more important not to miss critical TP anomalies forming a single-stage attack while accepting a certain rate of FP.…”

Section: Delimitation From SOAAPR (mentioning)

confidence: 91%

“…, which is often used as a metric on imbalanced data [7]. The effects of FPs and FNs on the clustering result are exemplarily discussed for the Bot attack scenario for which both SOAAPR and GAC achieved good results, and the number of alerts is more meaningful compared to Infiltration.…”

Section: SOAAPR Clustering (mentioning)

confidence: 99%

“…With online OD algorithms, alerts can be generated in a streaming manner and lead to a dynamic, huge, infinite and fast changing alert stream for which conventional offline alert correlation methods are not designed [6]. Thus, similar to the statement in [7], when it comes to some critical streaming applications, whereby a fast but less accurate OD model is preferred, we strongly support the claim by [6] that it is more significant to detect an on-going attack in a timely manner than analyzing it afterwards in an offline fashion. Detecting attacks at an early stage significantly reduces damage since, even when applying advanced detection systems, sophisticated attackers can nest undetected for up to 100 days [8].…”

Section: Introduction (mentioning)

confidence: 99%

Exploiting the Outcome of Outlier Detection for Novel Attack Pattern Recognition on Streaming Data

et al. 2021

Future-oriented networking infrastructures are characterized by highly dynamic Streaming Data (SD) whose volume, speed and number of dimensions increased significantly over the past couple of years, energized by trends such as Software-Defined Networking or Artificial Intelligence. As an essential core component of network security, Intrusion Detection Systems (IDS) help to uncover malicious activity. In particular, consecutively applied alert correlation methods can aid in mining attack patterns based on the alerts generated by IDS. However, most of the existing methods lack the functionality to deal with SD data affected by the phenomenon called concept drift and are mainly designed to operate on the output from signature-based IDS. Although unsupervised Outlier Detection (OD) methods have the ability to detect yet unknown attacks, most of the alert correlation methods cannot handle the outcome of such anomaly-based IDS. In this paper, we introduce a novel framework called Streaming Outlier Analysis and Attack Pattern Recognition, denoted as SOAAPR, which is able to process the output of various online unsupervised OD methods in a streaming fashion to extract information about novel attack patterns. Three different privacy-preserving, fingerprint-like signatures are computed from the clustered set of correlated alerts by SOAAPR, which characterizes and represents the potential attack scenarios with respect to their communication relations, their manifestation in the data's features and their temporal behavior. Beyond the recognition of known attacks, comparing derived signatures, they can be leveraged to find similarities between yet unknown and novel attack patterns. The evaluation, which is split into two parts, takes advantage of attack scenarios from the widely-used and popular CICIDS2017 and CSE‐CIC‐IDS2018 datasets. Firstly, the streaming alert correlation capability is evaluated on CICIDS2017 and compared to a state-of-the-art offline algorithm, called Graph-based Alert Correlation (GAC), which has the potential to deal with the outcome of anomaly-based IDS. Secondly, the three types of signatures are computed from attack scenarios in the datasets and compared to each other. The discussion of results, on the one hand, shows that SOAAPR can compete with GAC in terms of alert correlation capability leveraging four different metrics and outperforms it significantly in terms of processing time by an average factor of 70 in 11 attack scenarios. On the other hand, in most cases, all three types of signatures seem to reliably characterize attack scenarios such that similar ones are grouped together, with up to 99.05% similarity between the FTP and SSH Patator attack.

Keywords: intrusion detection; alert analysis; alert correlation; outlier detection; attack scenario; streaming data; network security

“…In our approach we proposed to use ML based algorithm that firstly learns how analyzed metrics behave in normal state and then is able to find anomaly behavior of time series. We tested 3 different algorithms with different parameters [4][5] [6]. The purpose of the algorithm is to load data from metrics in order to detect anomalies.…”

Section: Metrics Anomaly Detection (mentioning)

confidence: 99%

The Support System for Anomaly Detection with Application in Mainframe Management Process

Strzałka, Gerka, Kuraś et al. 2021

Frontiers in Artificial Intelligence and Applications

Managing and administering mainframe machines requires not only specialized expert knowledge built on many years of experience but also appropriate tools provided by a machine performance management system, e.g. the Resource Measurement Facility (RMF). The aim of this paper is to show some preliminary results from the construction of the Z-RAYS system, which is built on machine learning (ML) techniques. It allows automatic detection of anomalies and generation of early warnings about errors that can appear in the mainframe, in order to support the mainframe management process. The presented results are based on extensive simulations carried out on the IBM emulator. We focus on determining the degree of variability of the metrics, the degree of data repeatability in the metrics, some approaches to anomaly detection in metrics, and solutions for detecting event correlation in metrics.
