Mining top-n local outliers in large databases

Jin, Wen; Tung, Anthony K. H.; Han, Jiawei

doi:10.1145/502512.502554

Cited by 278 publications

(135 citation statements)

References 5 publications

Citation types: Supporting, Mentioning (130), Contrasting, Unclassified

“…Note that LOF ranks points by only considering the neighborhood density of the points, thus it may miss the potential outliers whose densities are close to those of their neighbors. [12] improves the efficiency of algorithm in [7] by proposing an efficient micro-cluster-based local outlier mining algorithm, but it still use LOF to mine outliers in dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

2006

Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the full space are unable to effectively screen out informative outliers, because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection task for high-dimensional data, i.e., finding the subspaces in which given points are outliers, and propose a novel outlier detection algorithm called High-D Outlier Detection (HighDOD). The intuitive idea is to measure the outlying degree of a point using the sum of distances between the point and its k nearest neighbors. Two pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms the naive top-down, bottom-up, and random search methods.
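The outlying degree described in this abstract (the sum of distances from a point to its k nearest neighbors) can be illustrated with a minimal sketch. This is not the HighDOD algorithm itself: it omits the pruning strategies and the dynamic subspace search, and the function name `outlying_degree`, the `dims` parameter, and the toy data are assumptions made only for illustration.

```python
import math

def outlying_degree(data, i, k, dims=None):
    """Sum of Euclidean distances from data[i] to its k nearest neighbours.

    `dims` (hypothetical parameter) selects the subspace, as a sequence of
    coordinate indices; None means the full space.
    """
    dims = range(len(data[i])) if dims is None else dims
    project = lambda p: [p[d] for d in dims]
    target = project(data[i])
    # Distances to every other point, measured in the chosen subspace.
    dists = sorted(
        math.dist(target, project(q)) for j, q in enumerate(data) if j != i
    )
    return sum(dists[:k])

# Toy data (assumed): four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = [outlying_degree(points, i, k=2) for i in range(len(points))]
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

Passing `dims=[0]`, for example, would score the same points using only their first coordinate, which is the kind of per-subspace evaluation the abstract refers to.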

Section: Related Work (mentioning)

confidence: 99%

“…They can broadly be divided into distance-based methods [13,14,18] and local density-based methods [7,12]. However, many of these outlier detection algorithms are unable to deal with high-dimensional datasets efficiently as many of them only consider outliers in the entire space.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

2006

“…Because LOF ranks points only considering the neighborhood density of the points, thus it may miss the potential outliers whose densities are close to those of their neighbors. [JTH01] improved the efficiency of algorithm of [BKNS00] by proposing an efficient micro-cluster-based local outlier mining algorithm, but it still use LOF to mine outliers in dataset.…”

Section: Related Work (mentioning)

confidence: 99%

“…In this experiment, we plot the execution time of Grid-ODF against micro-cluster-based LOF [JTH01] and partition-based KNN-distance [RRK00] in Figure 8. Because we are only interested in studying the efficiency of Grid-ODF in detecting outliers, so the time spent in the iterative adaptation of cell partition is not included in this experiment.…”

Section: Efficiency Evaluation (mentioning)

confidence: 99%

“…Recently, there have been numerous research work in outlier detection and the new notions such as distance-based outliers [KN98, KN99, RRK00] and density-based local outliers [BKNS00,JTH01] have been proposed in this field. However, the existing outlier detection algorithms suffer the drawbacks that they are inefficient in dealing with large multi-dimensional datasets, and most of them are only able to capture certain kinds of outliers.…”

Section: Introduction (mentioning)

confidence: 99%

Grid-ODF: Detecting Outliers Effectively and Efficiently in Large Multi-dimensional Databases

Wang, Zhang, Wang

2005

Computational Intelligence and Security

Abstract: Outlier detection is an important task in data mining with a wide range of applications, such as detecting credit card fraud, criminal activity, and exceptional patterns in databases. In recent years, there has been much research on outlier detection, and new notions such as distance-based outliers and density-based local outliers have been proposed. However, existing outlier detection algorithms suffer from two drawbacks: they are inefficient on large multi-dimensional datasets, and most of them capture only certain kinds of outliers. In this paper, we propose a novel outlier mining algorithm, called Grid-ODF, that takes into account both the local and global perspectives of outliers for effective detection. The notion of Outlying Degree Factor (ODF), which reflects both density and distance, is introduced to rank outliers. A grid structure partitioning the data space is employed so that Grid-ODF can be implemented efficiently. Experimental results show that Grid-ODF outperforms existing outlier detection algorithms such as LOF and KNN-distance in terms of effectiveness and efficiency.

Large dataset summarization with automatic parameter optimization and parallel processing for local outlier detection

Shou

2018

Concurrency and Computation

Summary: As one of the most important research problems in data analytics and data mining, outlier detection from large datasets has drawn much research attention in recent years. In this paper, we investigate the problem of summarizing large datasets to support efficient local outlier detection. We propose efficient summarization algorithms that produce a highly compact summary of the original dataset, which can be applied to detect local outliers in future similar datasets. A novel automatic parameter optimization method is proposed to produce the optimal setup of the key parameters used in the summarization algorithm. Parallel processing methods are also proposed to significantly accelerate the summarization process. The experimental results demonstrate that the proposed algorithms are highly scalable for large datasets and effective in producing dataset summaries for local outlier detection.
