2009
DOI: 10.1007/s10618-009-0148-z

A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes

Abstract: Outlier detection has attracted substantial attention in many applications and research areas; some of the most prominent applications are network intrusion detection or credit card fraud detection. Many of the existing approaches are based on calculating distances among the points in the dataset. These approaches cannot easily adapt to current datasets that usually contain a mix of categorical and continuous attributes, and may be distributed among different geographical locations. In addition, current datase…

Cited by 109 publications (45 citation statements). References 30 publications.
“…Most existing categorical data oriented methods are based on a general assumption that anomalies lie in regions of low frequency (Akoglu et al, 2012;Ghoting, Otey, & Parthasarathy, 2004;He et al, 2005;Koufakou, Ortiz, Georgiopoulos, Anagnostopoulos, & Reynolds, 2007;Koufakou & Georgiopoulos, 2010;Smets & Vreeken, 2011;He, Deng, Xu, & Huang, 2006). Typical examples are frequent patterns based methods FPOF (He et al, 2005) and infrequent patterns based methods LOADED (Ghoting et al, 2004).…”
Section: Methods For Categorical Data (mentioning, confidence: 99%)
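The low-frequency assumption quoted above can be illustrated with a minimal sketch. It is not FPOF, LOADED, or the method of this paper. It is a simple per-attribute frequency score, assuming purely categorical records of equal length: each record is scored by the mean count of its attribute values, so records built from rare values get low scores and become outlier candidates. The function names `frequency_scores` and `top_k_outliers` and the toy network-style records are hypothetical, chosen for illustration only.

```python
from collections import Counter
from typing import Sequence


def frequency_scores(rows: Sequence[Sequence[str]]) -> list[float]:
    """Score each row by the mean frequency of its attribute values.

    Hypothetical helper: lower scores mean the row's values are rarer,
    i.e. the row lies in a low-frequency region of the data.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])
    # One value counter per attribute (column).
    counters = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counters[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


def top_k_outliers(rows: Sequence[Sequence[str]], k: int) -> list[int]:
    """Return the indices of the k lowest-scoring (rarest) rows."""
    scores = frequency_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]


if __name__ == "__main__":
    # Toy records (protocol, service, flag); invented for illustration.
    data = [
        ("tcp", "http", "SF"),
        ("tcp", "http", "SF"),
        ("tcp", "http", "SF"),
        ("udp", "dns", "SF"),
        ("icmp", "ecr_i", "REJ"),
    ]
    print(top_k_outliers(data, 1))  # the all-rare-values record, index 4
```

Counting values needs only one pass per attribute, which is why frequency-based scores are attractive for large categorical data. The trade-off is that single-value counts ignore rare combinations of individually common values, which is what itemset-based methods such as those cited above try to capture.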
“…Some computationally expensive algorithms (Ghoting et al, 2004;Koufakou & Georgiopoulos, 2010) require to use parallelism in order to reduce their runtime. But parallelism does not reduce the time complexity of the base algorithm.…”
Section: Methods For Categorical Data (mentioning, confidence: 99%)
“…The lower side of the grey area is determined based on the 2-segment line (line 2). Line 2 is expressed as (11).…”
Section: Characteristics Of the Outliers From Tooth Profiles (mentioning, confidence: 99%)
“…Gear measuring center is a kind of efficient precision measuring instrument, but outliers easily cause sudden measurement environment changes in the measurement of tooth profiles, such as the effect of mechanical vibration on the measuring head and the impact of the sharp current on the data acquisition card [6][7][8]. The outliers lead to the abnormal data and decrease the measurement precision of tooth profiles, so the outliers must be removed as soon as they are detected [9][10][11].…”
Section: Introduction (mentioning, confidence: 99%)
“…For regression problems, it can be very difficult to spot noise or outliers in the data without careful investigation and even harder in multivariate data sets with both categorical and numerical features [13]. Overfitting avoidance, known as the node pruning in the context of regression trees, is a general way to allow robustness for unseen data by penalizing the tree for being too complex.…”
Section: Introduction (mentioning, confidence: 99%)