Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '03, 2003
DOI: 10.1145/956755.956758

Mining distance-based outliers in near linear time with randomization and a simple pruning rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near lin…
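The abstract's core idea (a randomized nested loop plus a pruning rule) can be sketched as follows. This is a minimal in-memory illustration, not the authors' ORCA code. It assumes Euclidean distance and takes an example's outlier score to be its distance to its k-th nearest neighbour (one common distance-based definition); the names `top_n_outliers`, `k`, `n` and `seed` are hypothetical. The outer loop visits examples in random order. The inner loop keeps the k closest neighbours found so far and abandons an example as soon as that running k-th-neighbour distance falls below the cutoff, which is the score of the weakest example in the current top n.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def top_n_outliers(data: List[Point], k: int, n: int,
                   seed: int = 0) -> List[Tuple[float, int]]:
    """Top-n distance-based outliers via a randomized nested loop with pruning.

    Assumed score: distance from an example to its k-th nearest neighbour.
    Returns (score, index) pairs, highest score first.
    """
    if not 0 < k < len(data):
        raise ValueError("need 0 < k < len(data)")
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # randomization: scan in random order

    top: List[Tuple[float, int]] = []   # min-heap holding the current top n
    cutoff = 0.0                        # starts at 0, rises as top n fills

    for i in order:
        neigh: List[float] = []         # negated k smallest distances so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(data[i], data[j])
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # -neigh[0] is the k-th smallest distance seen so far: an upper
            # bound on the true score that can only shrink as the scan goes on.
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True           # can no longer make the top n
                break
        if pruned:
            continue
        score = -neigh[0]               # full scan completed: exact score
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]          # weakest member of the current top n
    return sorted(top, reverse=True)
```

The worst case is still quadratic: an example that is never pruned is compared against every other example. Random order is what makes the typical case cheap. An ordinary example tends to meet k reasonably close neighbours early in the scan, so its upper bound drops below the cutoff after few comparisons once the cutoff has risen. Because pruned examples provably score below the cutoff, the result should match an exhaustive computation of the same score.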

Cited by 195 publications (102 citation statements)
“…This gives a summary classification of all existing detection techniques. - We demonstrate a huge improvement in execution time by using multiple pruning rules in two phases, compared with outstanding existing nested-loop distance-based methods, ORCA [11] and RBRP [12]. Since ORCA, RBRP and MIRO use the same notion of outlier (Section 2), outliers identified by the three techniques are exactly the same.…”
Section: Introduction (mentioning; confidence: 93%)
“…Here two pruning rules are utilized: a) first triangular inequality on the data point's outlier score is used, and then b) the outlier score is compared with the minimum score required to be an outlier. The second check is similar to that of ORCA [11]. However, while ORCA starts with a cutoff of 0, in MIRO the initial cutoff is obtained from the first phase, and hence converges faster.…”
Section: Introduction (mentioning; confidence: 93%)
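The statement about MIRO's starting cutoff can be illustrated with a small experiment. This is a sketch under stated assumptions, not MIRO's method. `pruned_outliers` is a randomized nested loop of the ORCA kind, extended with a hypothetical `initial_cutoff` parameter (0.0 reproduces ORCA's starting point). `seed_cutoff` is a hypothetical stand-in for a "first phase", not the citing paper's procedure, and the triangle-inequality rule is not modeled. Euclidean distance and a k-th-nearest-neighbour score are assumed. A starting cutoff is safe as long as it does not exceed the true n-th largest score, and the smallest exact score among any n examples always satisfies that.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kth_nn_dist(data: List[Point], i: int, k: int) -> float:
    """Exact distance from example i to its k-th nearest neighbour."""
    ds = sorted(math.dist(data[i], q) for j, q in enumerate(data) if j != i)
    return ds[k - 1]


def seed_cutoff(data: List[Point], k: int, n: int) -> float:
    """Hypothetical first phase: exactly score the n examples farthest from
    the centroid and return the smallest of those scores. The minimum score of
    any n examples never exceeds the true n-th largest score, so this start
    value cannot cause a genuine top-n outlier to be pruned."""
    dim = len(data[0])
    centroid = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    far = heapq.nlargest(n, range(len(data)),
                         key=lambda i: math.dist(data[i], centroid))
    return min(kth_nn_dist(data, i, k) for i in far)


def pruned_outliers(data: List[Point], k: int, n: int,
                    initial_cutoff: float = 0.0,
                    seed: int = 0) -> Tuple[List[Tuple[float, int]], int]:
    """Randomized nested loop with cutoff pruning. Returns the top-n
    (score, index) pairs and the number of distance evaluations made."""
    if not 0 < k < len(data):
        raise ValueError("need 0 < k < len(data)")
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    top: List[Tuple[float, int]] = []   # min-heap of the current top n
    cutoff = initial_cutoff             # 0.0 mimics ORCA's starting cutoff
    evals = 0
    for i in order:
        neigh: List[float] = []         # negated k smallest distances so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(data[i], data[j])
            evals += 1
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True           # upper bound already below cutoff
                break
        if pruned:
            continue
        score = -neigh[0]               # survived the full scan: exact score
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = max(cutoff, top[0][0])
    return sorted(top, reverse=True), evals
```

With the same random order, a seeded run's cutoff is at every step at least as high as the unseeded run's. As a result, it reports the same outliers while performing no more distance evaluations. The evaluations spent computing the seed itself are not counted here.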