2019
DOI: 10.1002/widm.1330
Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey

Abstract: Machine Learning (ML) and Data Mining (DM) build tools intended to help users solve data‐related problems that are infeasible for “unaugmented” humans. Tools need manuals, however, and in the case of ML/DM methods, this means guidance with respect to which technique to choose, how to parameterize it, and how to interpret derived results to arrive at knowledge about the phenomena underlying the data. While such information is available in the literature, it has not yet been collected in one place. We survey thr…

Cited by 14 publications (15 citation statements)
References 175 publications (213 reference statements)
“…There is some work on the so-called benchmarking of clustering methods (Van Mechelen et al., 2018; Zimmermann, 2020). This is different from our approach.…”
Section: Introduction
Mentioning confidence: 95%
“…However, when conducting cluster analysis, researchers are confronted with an overwhelming number of existing methods. They must preprocess the data, choose a clustering algorithm, and set parameters, such as the number of clusters (Van Mechelen et al., 2018; Zimmermann, 2020). It is often unclear a priori which choice should be made for the analysis, and even once a choice is made, it may remain unclear how good the quality of the resulting clustering is.…”
Section: Introduction
Mentioning confidence: 99%
“…The phrase “cluster validation” also appears in the literature about benchmarking of clustering methods (Boulesteix & Hatz, 2017; Van Mechelen et al., 2018; Zimmermann, 2020). A benchmarking study is a systematic comparison of different clustering methods on a class of data distributions or datasets.…”
Section: Introduction
Mentioning confidence: 99%
“…Regarding the appropriate design and analysis of benchmark studies, the available literature ranges from general guidelines (Weber et al., 2019; Boulesteix, 2015) and statistical frameworks (Demšar, 2006; Hothorn et al., 2005; Eugster et al., 2012; Boulesteix et al., 2015, all with focus on supervised learning), to recommendations for context-specific benchmarks (e.g. Mangul et al., 2019; Bokulich et al., 2020; Zimmermann, 2020; Kreutz, 2019). However, for many issues relevant in practice (e.g.…”
Section: Introduction and Related Work
Mentioning confidence: 99%