Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description 2013
DOI: 10.1145/2500853.2500858

Systematic construction of anomaly detection benchmarks from real data

Abstract: Research in anomaly detection suffers from a lack of realistic and publicly-available problem sets. This paper discusses what properties such problem sets should possess. It then introduces a methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection. The methodology produces data sets that vary along three important dimensions: (a) point difficulty, (b) relative frequency of anomalies, and (c) clusteredness. We apply our generated datasets to b…

Cited by 111 publications (114 citation statements)
References 32 publications
“…So far, we have concentrated on classification and regression tasks. There are methods to derive clustering and outlier detection benchmarks from classification and regression datasets [4,5], so that extending the dataset collection for such unsupervised tasks is possible as well. Furthermore, as many datasets on the Semantic Web use extensive hierarchies in the form of ontologies, building benchmark datasets for tasks like hierarchical multi-label classification [15] would also be an interesting extension.…”
Section: Discussion (mentioning)
confidence: 99%
“…Notable examples include the Ontology Alignment Evaluation Initiative (OAEI) for ontology matching 4 , the Berlin SPARQL Benchmark 5 for triple store performance, the Lehigh University Benchmark (LUBM) 6 for reasoning, or the Question Answering over Linked Data (QALD) dataset 7 for natural language query systems. In this paper, we introduce a collection of datasets for benchmarking machine learning approaches for the Semantic Web.…”
Section: Introduction (mentioning)
confidence: 99%
“…Following the considerations presented in Section 3.1 for creating a good benchmark collection for evaluating unsupervised outlier detection algorithms, Emmott et al (2013) proposed a methodology for transforming existing classification and regression datasets into datasets for anomaly detection. To that end, the authors defined four requirements to be met.…”
Section: Proposals in the Literature (unclassified)
“…In this recently published work, Emmott et al (2013) preprocess several datasets collected from the UCI repository (Frank and Asuncion, 2010) to build a benchmark collection for outlier detection. However, when transforming the collected binary classification datasets to the anomaly detection setting, Emmott et al (2013) chose one "normal" class and one "anomalous" class.…”
Section: Proposals in the Literature (unclassified)
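
The general idea mentioned in the statement above, designating one class of a classification dataset as "normal" and the rest as "anomalous", can be illustrated with a minimal sketch. This is not the procedure of Emmott et al (2013): the function name `make_anomaly_benchmark` and its parameters are hypothetical, and the sketch only controls the relative frequency of anomalies. It does not model point difficulty or clusteredness.

```python
import random


def make_anomaly_benchmark(rows, labels, normal_label, anomaly_rate, seed=0):
    """Hypothetical helper: keep every point of the chosen "normal" class and
    subsample the remaining points so that anomalies make up roughly
    `anomaly_rate` of the resulting dataset."""
    if not 0.0 < anomaly_rate < 1.0:
        raise ValueError("anomaly_rate must be strictly between 0 and 1")
    rng = random.Random(seed)

    normals = [r for r, y in zip(rows, labels) if y == normal_label]
    candidates = [r for r, y in zip(rows, labels) if y != normal_label]

    # Choose k so that k / (k + len(normals)) is approximately anomaly_rate,
    # capped by the number of available candidate anomalies.
    k = round(anomaly_rate * len(normals) / (1.0 - anomaly_rate))
    k = min(k, len(candidates))
    anomalies = rng.sample(candidates, k)

    # Ground truth: 0 marks a normal point, 1 marks an anomaly.
    data = [(r, 0) for r in normals] + [(r, 1) for r in anomalies]
    rng.shuffle(data)
    points = [r for r, _ in data]
    is_anomaly = [flag for _, flag in data]
    return points, is_anomaly
```

Fixing the seed keeps the generated benchmark reproducible, which matters when several detectors are compared on the same derived dataset.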