Raja Chiky scite author profile

Detecting anomalies in streaming data is an important issue for many application domains, such as cybersecurity, natural disasters, or bank frauds. Different approaches have been designed in order to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of the existing anomaly detection methods for data streams with a deep view on Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), which is an open source machine learning framework for data streams containing a single anomaly detection algorithm in data streams, called Streaming half-space trees. We performed experiments on different real and well known data sets in order to compare the performance of our implementation of IForestASD and half-space trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms that involve two main well known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN is a more recent method and it refers to the Kolmogorov–Smirnov Windowing method for concept drift detection. More precisely, we extended KSWIN to be able to deal with n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets. In particular, we evaluated the F1-score, the execution time, and the memory consumption. The experiments show that our extensions have lower resource consumption than the original version of IForestASD with a similar or better detection efficiency.

show abstract

Anomaly Detection for Data Streams Based on Isolation Forest Using Scikit-Multiflow

Togbe

Barry

Boly

et al. 2020

View full text Add to dashboard Cite

Detecting anomalies in streaming data is an important issue in a variety of real-word applications as it provides some critical information, e.g., Cyber security attacks, Fraud detection or others real-time applications. Different approaches have been designed in order to detect anomalies: statistics-based, isolation-based, clustering-based. In this paper, we present a quick survey of the existing anomaly detection methods for data streams. We focus on Isolation Forest (iForest), a state-of-theart method for anomaly detection. We provide the implementation of IForestASD, a variant of iForest for data streams. This implementation is built on top of scikit-multiflow, an open source machine learning framework for data streams. In fact, few anomalies detection methods are provided in the well-known data streams mining frameworks such as MOA or StreamDM. Hence, we extend scikitmultiflow providing an additional tool. We performed experiments on 3 real-world data sets to evaluate predictive performance and resource consumption (memory and time) of IForestASD and compare it with a well known and state-of-the-art anomaly detection algorithm for data streams called Half-Space Trees.

show abstract

How can sliding HyperLogLog and EWMA detect port scan attacks in IP traffic?

Chabchoub

Chiky

Dogan

2014

EURASIP J. on Info. Security

View full text Add to dashboard Cite

IP networks are constantly targeted by new techniques of denial of service attacks (SYN flooding, port scan, UDP flooding, etc), causing service disruption and considerable financial damage. The on-line detection of DoS attacks in the current high-bit rate IP traffic is a big challenge. We propose in this paper an on-line algorithm for port scan detection. It is composed of two complementary parts: First, a probabilistic counting part, where the number of distinct destination ports is estimated by adapting a method called 'sliding HyperLogLog' to the context of port scan in IP traffic. Second, a decisional mechanism is performed on the estimated number of destination ports in order to detect in real time any behavior that could be related to a malicious traffic. This latter part is mainly based on the exponentially weighted moving average algorithm (EWMA) that we adapted to the context of on-line analysis by adding a learning step (supposed without attacks) and improving its update mechanism. The obtained port scan detecting method is tested against real IP traffic containing some attacks. It detects all the port scan attacks within a very short time response (of about 30 s) and without any false positive. The algorithm uses a very small total memory of less than 22 kb and has a very good accuracy on the estimation of the number of destination ports (a relative error of about 3.25%), which is in agreement with the theoretical bounds provided by the sliding HyperLogLog algorithm.

show abstract

An In-Depth Study and Improvement of Isolation Forest

et al. 2022

View full text Add to dashboard Cite

Historically, anomalies detection was an important issue for industrial applications such as the detection of a manufacturing failure or defect. It is still a current topic that tries to meet the ever increasing demand in different fields such as intrusion detection, fraud detection, ecosystem change detection or event detection in sensor networks. That's why anomalies detection remains a research topic of great interest for various research communities. In this paper, we focused on Isolation Forest (IForest), a well known, efficient anomalies detection algorithm. We provided a deep and complete view on IForest. We evaluated the impact of its input parameters (number of trees, sample size and decision threshold) on the efficiency of the detection and on the execution time. We discussed the benefit of including some anomalies into the training phase. To address the limits of IForest, we performed different experiments on commonly used real datasets and also on synthetic datasets with non trivial distributions. We designed multidimensional datasets where anomalies are carried by several dimensions simultaneously. Moreover, we used a varying density and distance between anomalies and normal data, for a variable similarity between these two data classes. We compared the performance of IForest against its improved version called Extended IForest. Finally, we designed and validated a new extension of IForest, based on the different individual trees decisions instead of a global forest decision that we call Majority Voting IForest (MVIForest). The experiments show that MVIForest has a shorter execution time than IForest, with almost the same accuracy.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Raja Chiky

Personalized recommender system for e-Learning environment

Anomalies Detection Using Isolation in Concept-Drifting Data Streams

Anomaly Detection for Data Streams Based on Isolation Forest Using Scikit-Multiflow

How can sliding HyperLogLog and EWMA detect port scan attacks in IP traffic?

An In-Depth Study and Improvement of Isolation Forest

Contact Info

Product

Resources

About