Systematic construction of anomaly detection benchmarks from real data

Emmott, Andrew; Das, Shubhomoy; Dietterich, Thomas G.; Fern, Alan; Wong, Weng‐Keen

doi:10.1145/2500853.2500858

Cited by 111 publications

(114 citation statements)

References 32 publications

Supporting

Mentioning

101

Contrasting

Unclassified


“…So far, we have concentrated on classification and regression tasks. There are methods to derive clustering and outlier detection benchmarks from classification and regression datasets [4,5], so that extending the dataset collection for such unsupervised tasks is possible as well. Furthermore, as many datasets on the Semantic Web use extensive hierarchies in the form of ontologies, building benchmark datasets for tasks like hierarchical multi-label classification [15] would also be an interesting extension.…”

Section: Discussion (mentioning)

confidence: 99%

“…Notable examples include the Ontology Alignment Evaluation Initiative (OAEI) for ontology matching, the Berlin SPARQL Benchmark for triple store performance, the Lehigh University Benchmark (LUBM) for reasoning, or the Question Answering over Linked Data (QALD) dataset for natural language query systems. In this paper, we introduce a collection of datasets for benchmarking machine learning approaches for the Semantic Web.…”

Section: Introduction (mentioning)

confidence: 99%


A Collection of Benchmark Datasets for Systematic Evaluations of Machine Learning on the Semantic Web

Ristoski

Vries

Paulheim

2016

Lecture Notes in Computer Science


Abstract. Resource type: Datasets. Permanent URL: http://w3id.org/sw4ml-datasets

In recent years, several approaches for machine learning on the Semantic Web have been proposed. However, no extensive comparisons between those approaches have been undertaken, in particular due to a lack of publicly available, acknowledged benchmark datasets. In this paper, we present a collection of 22 benchmark datasets of different sizes. Such a collection of datasets can be used to conduct quantitative performance testing and systematic comparisons of approaches.


“…In line with the considerations presented in Section 3.1, to create a good benchmark collection for evaluating unsupervised outlier detection algorithms, Emmott et al. (2013) proposed a methodology for transforming existing classification and regression datasets into datasets for anomaly detection. To that end, the authors defined four requirements to be met.…”

Section: Proposals in the Literature (unclassified)

“…In this recently published work, Emmott et al. (2013) preprocess several datasets collected from the UCI repository (Frank and Asuncion, 2010) to build a benchmark collection for outlier detection. However, when transforming the collected binary classification datasets to the anomaly detection setting, Emmott et al. (2013) chose one "normal" class and one "anomalous" class.…”

Section: Proposals in the Literature (unclassified)

“…However, when transforming the collected binary classification datasets to the anomaly detection setting, Emmott et al. (2013) chose one "normal" class and one "anomalous" class. This arbitrary assignment runs counter to points made earlier in this same chapter: in many cases, making such an assignment loses the semantics of the concept of an outlier.…”

Section: Proposals in the Literature (unclassified)
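
As a rough illustration of the transformation these statements describe (one class of a binary classification dataset is treated as "normal" and the other as "anomalous"), the sketch below relabels a small dataset and subsamples the anomalous class to a target rate. The function name `to_anomaly_benchmark`, the `anomaly_rate` parameter, and the subsampling step are assumptions made for illustration; they are not details taken from Emmott et al. (2013).

```python
import random


def to_anomaly_benchmark(records, normal_label, anomaly_rate=0.05, seed=0):
    """Turn a binary classification dataset into an anomaly detection set.

    records: list of (features, label) pairs.
    normal_label: the class treated as "normal"; every other record is
        treated as a candidate anomaly (an arbitrary choice, as noted above).
    anomaly_rate: hypothetical target fraction of anomalies in the output.
    Returns a shuffled list of (features, is_anomaly) pairs, is_anomaly in {0, 1}.
    """
    if not 0 < anomaly_rate < 1:
        raise ValueError("anomaly_rate must be strictly between 0 and 1")

    rng = random.Random(seed)
    normals = [x for x, y in records if y == normal_label]
    anomalies = [x for x, y in records if y != normal_label]

    # Pick k so that k / (len(normals) + k) is approximately anomaly_rate,
    # capped by the number of anomalous records actually available.
    k = round(anomaly_rate * len(normals) / (1 - anomaly_rate))
    k = min(k, len(anomalies))
    sampled = rng.sample(anomalies, k)

    benchmark = [(x, 0) for x in normals] + [(x, 1) for x in sampled]
    rng.shuffle(benchmark)
    return benchmark
```

Swapping which class is passed as `normal_label` produces a different benchmark from the same data, which is exactly the arbitrariness the last statement above criticises.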


Estudo, avaliação e comparação de técnicas de detecção não supervisionada de outliers

Campos


GraphBAD: A general technique for anomaly detection in security information and event management

Parkinson

Vallati

Crampton

et al. 2018

Concurrency and Computation


The reliance on expert knowledge, required for analysing security logs and performing security audits, has created an unhealthy balance, where many computer users are not able to correctly audit their security configurations and react to potential security threats. The decreasing cost of IT and the increasing use of technology in domestic life are exacerbating this problem, where small companies and home IT users are not able to afford the price of experts for auditing their system configuration. In this paper, we present GraphBAD, a graph-based analysis tool that is able to analyse security configurations in order to identify anomalies that could lead to potential security risks. GraphBAD, which does not require any prior domain knowledge, generates graph-based models from security configuration data and, by analysing such models, is able to propose mitigation plans that can help computer users in increasing the security of their systems. A large experimental analysis, conducted on both publicly available (the well-known KDD dataset) and synthetically generated testing sets (file system permissions), demonstrates the ability of GraphBAD in correctly identifying security configuration anomalies and suggesting appropriate mitigation plans.

KEYWORDS: anomaly detection, graph structure, log files, security auditing, SIEM

INTRODUCTION

Auditing security configurations is the process of searching for anomalies that potentially expose a vulnerability of a considered system. The term Security Information and Event Management (SIEM) is often used to describe the process of monitoring audit logs and security configurations to identify vulnerabilities. Given the complexity of the task, there is a heavy reliance on expert knowledge, which is required for understanding the different security configurations. This reliance has two main drawbacks: first, expert knowledge can be incomplete or erroneous; second, the high cost of security experts makes it infeasible for many users to correctly configure the security of their Information Technology (IT) systems. Therefore, in many cases, unseen weaknesses, which can be exploited, will remain in the configuration. Clearly, auditing security configurations is a critical problem for organisations that significantly rely on their IT infrastructure for undertaking business. For this reason, many companies frequently employ a third-party security auditing company to examine their IT infrastructure to identify weaknesses and suggest suitable mitigation plans (known as penetration testing [1,2]). Although businesses are prepared to pay a premium for maintaining their security, the average home IT user or small company is left to maintain their own IT security. Furthermore, the decreasing cost of IT and the increasing use of technology in domestic life are exacerbating this problem.

These limitations are heightened by the following factors. (i) The complexity of computational infrastructure, the cyber-physical platform on which security is attained, is increasing, and the technological landscape i...
