2006
DOI: 10.1007/s10618-005-0014-6

Fast Distributed Outlier Detection in Mixed-Attribute Data Sets

Abstract: Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must…

Citations: cited by 195 publications (128 citation statements)
References: 19 publications
“…In most applications, there is a combination of both continuous and categorical values. There are approaches that can combine the similarity of continuous attributes with similarity of categorical ones [Otey et al, 2006] [Tan et al, 2005]. Proximity-based methods can be classified into two groups: distance-based and density-based methods.…”
Section: Proximity-based Methods | mentioning
confidence: 99%
“…Other variants have been proposed for categorical attributes or a mixture of categorical and continuous attributes. Otey et al defined the anomaly score as the inverse of the sum of the link strength between the instance and the other instance in data sets [8]. The associated link strength is equal to the number of attribute-value pairs shared between two instances.…”
Section: A Definition Of Anomaly Score | mentioning
confidence: 99%
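The score described in the statement above can be sketched in a few lines. This is only an illustration of the citing paper's paraphrase (link strength as the count of shared attribute-value pairs, score as the inverse of the summed link strengths), restricted to categorical attributes; it is not the algorithm of Otey et al., which also handles continuous attributes and a distributed setting. The function names, the sample records, and the choice to return an infinite score when an instance shares nothing with the others are all assumptions made for this sketch.

```python
from typing import Hashable, Sequence

# A record is a fixed-length tuple of categorical attribute values (assumption).
Record = Sequence[Hashable]


def link_strength(a: Record, b: Record) -> int:
    """Count the attribute positions at which both records hold the same value."""
    return sum(1 for x, y in zip(a, b) if x == y)


def anomaly_scores(records: Sequence[Record]) -> list[float]:
    """Score each record as the inverse of its summed link strength to all others.

    A record that shares no attribute value with any other record gets an
    infinite score (a convention chosen for this sketch, not taken from the source).
    """
    scores = []
    for i, record in enumerate(records):
        total = sum(
            link_strength(record, other)
            for j, other in enumerate(records)
            if j != i
        )
        scores.append(float("inf") if total == 0 else 1.0 / total)
    return scores


# Hypothetical sample data: (protocol, service, status)
data = [
    ("tcp", "http", "ok"),
    ("tcp", "http", "ok"),
    ("tcp", "http", "fail"),
    ("udp", "dns", "timeout"),
]
scores = anomaly_scores(data)
```

In this toy data the last record shares no value with any other record, so it receives the highest score, while the two identical records receive the lowest. The pairwise loop is quadratic in the number of records, which is acceptable for a sketch but not for large data sets.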
“…Otey et al presented a tunable algorithm for distributed anomaly detection in mixed-attribute data sets [8]. They capture the link between the points in the mixed categorical and continuous attribute space.…”
Section: B Distance/similarity Measure | mentioning
confidence: 99%
“…Statistics-based approaches (see [2,3]) were first used for outlier detection based on an assumption that the distributions of datasets are known. A data point was defined as an outlier if it deviates from the existing distribution.…”
Section: Introduction | mentioning
confidence: 99%