On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms

Yamanishi, Kenji; Takeuchi, Jun’ichi; Williams, Graham J.; Milne, Peter

doi:10.1023/b:dami.0000023676.72185.7c

Cited by 410 publications

(254 citation statements)

References 21 publications

Citation statement types: Supporting; Mentioning (254); Contrasting


“…Data sets included are: a data generator Mulcross [23] which is designed to evaluate anomaly detectors, and three other anomaly detection data sets from UCI repository [4]: http, Annthyroid and Dermatology. Previous usage can be found in [28,23,15]. Http is the largest subset from KDD CUP 99 network intrusion data [28]; attack instances are treated as anomalies.…”

Section: Performance On Data Sets Containing Only Clustered Anomalies (mentioning)

confidence: 99%

“…A publicly available example of clustered anomalies can be found in KDDCUP 1999 data set, where bursts of attacks (clustered anomalies) can be observed in a subset known as http [28] as shown in Figure 1. Three bursts of attacks are clustered, first in the middle of the data stream; and two smaller ones appeared at the end of the stream.…”

Section: Introduction (mentioning)

confidence: 99%


On Detecting Clustered Anomalies Using SCiForest

Liu; Ting; Zhou (2010)

Machine Learning and Knowledge Discovery in Databases


Abstract. Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies can avoid detection because they defy these assumptions by being dense and, in many cases, close to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when clustered anomalies are very close to normal points. It retains the ability of existing methods to detect scattered anomalies, and it has better time and space complexity than existing distance-based and density-based methods.


“…This approach is used in [13], where the authors describe an algorithm called SmartSifter that uses this idea to calculate outlier scores for each instance. The outlier score of an instance is defined as the Hellinger distance between two probability distributions of available data: one built for the whole data set and the other one built for the whole data set without the observed instance.…”

Section: Related Work (mentioning)

confidence: 99%
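The leave-one-out score described in the quotation can be illustrated with a deliberately simplified sketch. It assumes a single 1-D Gaussian in place of SmartSifter's finite mixture and uses the closed-form squared Hellinger distance between two normal densities; the function names `gaussian_fit`, `hellinger_sq` and `leave_one_out_scores` are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import fmean, pvariance


def gaussian_fit(xs):
    """Maximum-likelihood mean and variance of a 1-D sample (variance floored)."""
    mu = fmean(xs)
    return mu, max(pvariance(xs, mu), 1e-9)


def hellinger_sq(mu1, var1, mu2, var2):
    """Squared Hellinger distance between N(mu1, var1) and N(mu2, var2).

    H^2 = 1 - sqrt(2*s1*s2 / (s1^2 + s2^2)) * exp(-(mu1 - mu2)^2 / (4*(s1^2 + s2^2)))
    """
    s = var1 + var2
    coef = math.sqrt(2.0 * math.sqrt(var1 * var2) / s)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - coef * math.exp(-((mu1 - mu2) ** 2) / (4.0 * s)))


def leave_one_out_scores(xs):
    """Score each point by the distance between models fitted with and without it.

    Simplifying assumption: one Gaussian instead of a finite mixture.
    Requires at least two points so every leave-one-out sample is non-empty.
    """
    mu_all, var_all = gaussian_fit(xs)
    scores = []
    for i in range(len(xs)):
        mu_i, var_i = gaussian_fit(xs[:i] + xs[i + 1:])
        scores.append(hellinger_sq(mu_all, var_all, mu_i, var_i))
    return scores
```

On a sample such as `[1.0, 1.2, 0.9, 1.1, 1.0, 8.0]`, removing `8.0` changes the fitted model far more than removing any other point, so it receives the largest score.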

An Interactive Approach to Outlier Detection

Konijn; Kowalczyk (2010)

Lecture Notes in Computer Science


Abstract. In this paper we describe an interactive approach for finding outliers in large sets of records, such as those collected by banks, insurance companies, and web shops. The key idea behind our approach is the use of an outlier score function that is easy to compute and easy to interpret. This function is used to identify a set of potential outliers. The outliers, organized in clusters, are then presented to a domain expert together with context information, such as characteristics of the clusters and the distribution of scores. They are then analyzed, labelled as non-explainable or explainable, and removed from the data. The whole process is iterated until no more interesting outliers can be found.
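The iterate-until-nothing-interesting process in this abstract can be sketched as a loop. In this hypothetical sketch the easy-to-compute score is assumed to be the largest absolute z-score across numeric columns, the domain expert is simulated by a callback `ask_expert` that returns a label or `None` (not interesting), and the clustering and context-display steps of the paper are omitted.

```python
from statistics import fmean, pstdev


def max_abs_zscore(rows):
    """Hypothetical easy-to-interpret score: largest |z-score| across columns."""
    stats = [(fmean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [max(abs(v - mu) / sd for v, (mu, sd) in zip(row, stats)) for row in rows]


def interactive_outlier_loop(rows, ask_expert, top_k=3, max_rounds=10):
    """Score records, show the top candidates to an expert, remove labelled ones, repeat."""
    rows = list(rows)
    findings = []
    for _ in range(max_rounds):
        if len(rows) < 2:
            break
        scores = max_abs_zscore(rows)
        top = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:top_k]
        # The expert returns one label per candidate, or None if it is not interesting.
        labels = ask_expert([rows[i] for i in top], [scores[i] for i in top])
        flagged = {i: lab for i, lab in zip(top, labels) if lab is not None}
        if not flagged:
            break  # nothing interesting left
        findings.extend((rows[i], lab) for i, lab in flagged.items())
        # Remove the labelled records and rescore the remaining data.
        rows = [r for j, r in enumerate(rows) if j not in flagged]
    return findings, rows
```

Rescoring after each removal matters: once a gross outlier is gone, the column statistics tighten, and subtler records can rise to the top in the next round.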


“…In [7], numerous discordancy tests are discussed for different scenarios. In [9] [10], authors propose SmartSifter (SS), which is an on-line real-time outlier detection algorithm. The basic principle of SS is to use a probabilistic model (a finite mixture model) to represent the underlying distribution of a given data set.…”

Section: Related Work (mentioning)

confidence: 99%
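The on-line finite-mixture idea can be sketched as an update in which every stored statistic is multiplied by (1 - r) before the new point is added, so older data are gradually forgotten. This is a hypothetical 1-D simplification loosely modeled on sequential discounting EM-style learning, not the published algorithm; the class name, the discount rate `r`, the responsibility smoothing `alpha`, and the variance floor are assumptions. Each point is scored by its negative log-likelihood under the model before it is learned.

```python
import math


class DiscountingGaussianMixture1D:
    """Hypothetical 1-D Gaussian mixture learned on-line with discounting.

    Each step multiplies the stored statistics by (1 - r) before adding the
    new point, so the influence of old data decays geometrically.
    """

    def __init__(self, means, r=0.05, alpha=0.01, var0=1.0, var_floor=1e-3):
        k = len(means)
        self.r, self.alpha, self.var_floor = r, alpha, var_floor
        self.w = [1.0 / k] * k                         # mixing weights
        self.s1 = [m / k for m in means]               # discounted sum of gamma * x
        self.s2 = [(var0 + m * m) / k for m in means]  # discounted sum of gamma * x^2

    def _components(self, x):
        """Weighted density of x under each component: w_i * N(x | mu_i, var_i)."""
        out = []
        for w, s1, s2 in zip(self.w, self.s1, self.s2):
            mu = s1 / w
            var = max(s2 / w - mu * mu, self.var_floor)
            out.append(w * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var))
        return out

    def score_and_update(self, x):
        """Return -log p(x) under the current model, then learn from x."""
        comps = self._components(x)
        p = sum(comps)
        score = -math.log(max(p, 1e-300))  # guard against log(0) for far-away points
        k = len(self.w)
        for i in range(k):
            post = comps[i] / p if p > 0 else 1.0 / k
            gamma = (1 - self.alpha) * post + self.alpha / k  # smoothed responsibility
            self.w[i] = (1 - self.r) * self.w[i] + self.r * gamma
            self.s1[i] = (1 - self.r) * self.s1[i] + self.r * gamma * x
            self.s2[i] = (1 - self.r) * self.s2[i] + self.r * gamma * x * x
        return score
```

Because the smoothed responsibilities sum to 1, the mixing weights keep summing to 1 after every update. A small `r` adapts slowly; a larger `r` tracks a drifting stream faster but forgets sooner.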

Synchronization Based Outlier Detection

Shao; Böhm; Yang; et al. (2010)

Machine Learning and Knowledge Discovery in Databases


Abstract. The study of extraordinary observations is of great interest in a wide variety of applications, such as criminal activity detection, athlete performance analysis, and the identification of rare events or exceptions. The question is: how can we naturally flag these outliers in a real, complex data set? In this paper, we study outlier detection based on a novel and powerful concept: synchronization. The basic idea is to regard each data object as a phase oscillator and simulate its dynamical behavior over time according to an extended Kuramoto model. During the process towards synchronization, regular objects and outliers exhibit different interaction patterns. Outlier objects are naturally detected by the local synchronization factor (LSF). An extensive experimental evaluation on synthetic and real-world data demonstrates the benefits of our method.
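The oscillator idea can be illustrated with a Kuramoto-style interaction in which each object moves toward its neighbours within a radius `eps` by the mean of sin(y - x) in each dimension. This is a hypothetical sketch: the radius, the step count, and the final per-object "local order" value (mean of exp(-distance) to the neighbours, 0 for isolated objects) are assumptions standing in for the paper's local synchronization factor, whose exact definition is not given here.

```python
import math


def _neighbours(pts, i, eps):
    """Indices of objects (other than i) within distance eps of object i."""
    return [j for j in range(len(pts)) if j != i and math.dist(pts[i], pts[j]) <= eps]


def sync_outlier_scores(points, eps=1.0, steps=20):
    """Hypothetical Kuramoto-style simulation followed by a per-object score in [0, 1]."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        new_pts = []
        for i, x in enumerate(pts):
            nb = _neighbours(pts, i, eps)
            if not nb:
                new_pts.append(x[:])  # an isolated object has nothing to couple with
                continue
            # Each coordinate moves by the mean of sin(y_d - x_d) over the neighbours.
            new_pts.append([
                xd + sum(math.sin(pts[j][d] - xd) for j in nb) / len(nb)
                for d, xd in enumerate(x)
            ])
        pts = new_pts  # synchronous update: all moves use the previous positions
    scores = []
    for i in range(len(pts)):
        nb = _neighbours(pts, i, eps)
        # Local order in (0, 1]: close to 1 when the neighbours have synchronised onto x.
        order = sum(math.exp(-math.dist(pts[i], pts[j])) for j in nb) / len(nb) if nb else 0.0
        scores.append(1.0 - order)  # high score = poorly synchronised = outlier candidate
    return scores
```

In this sketch, members of a dense group collapse onto a common point and score near 0, while an object with no neighbours never synchronises and scores 1.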
