2013
DOI: 10.1007/s10994-013-5334-y

The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives

Abstract: In this position paper, we discuss how different branches of research on clustering and pattern mining, while rather different at first glance, in fact have a lot in common and can learn a lot from each other's solutions and approaches. We give brief introductions to the fundamental problems of different sub-fields of clustering, especially focusing on subspace clustering, ensemble clustering, alternative (as a variant of constraint) clustering, and multiview clustering (as a variant of alternative clustering)…

Cited by 36 publications (22 citation statements)
References 164 publications
“…Density is dimensionality-biased: when estimated using distance-based density estimators, the densities of a data cloud tend to be lower in higher-dimensional spaces. Hence these methods suffer from density variation across subspaces of different dimensionalities: low thresholds detect high-dimensional clusters but have difficulty filtering out noise in low-dimensional subspaces, while high thresholds screen out noise well in low-dimensional subspaces but tend to overlook high-dimensional clusters (Zimek and Vreeken 2015). LC can possibly be an effective remedy for this issue in subspace clustering, since LC is not dimensionality-biased.…”
Section: Discussion
confidence: 99%
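The dimensionality bias described in the citation statement above is easy to reproduce empirically. The sketch below uses the inverse mean k-nearest-neighbour distance as a distance-based density proxy; this estimator and its parameters are illustrative choices, not taken from the cited work. The mean density estimate of a uniform data cloud shrinks as the number of dimensions grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def knn_density(X, k=10):
    """Density proxy: 1 / (mean distance to the k nearest neighbours)."""
    # full pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    # column 0 is the self-distance (0), so take columns 1..k
    return 1.0 / d[:, 1:k + 1].mean(axis=1)

densities = {}
for dim in (2, 8, 32):
    X = rng.uniform(size=(n, dim))  # uniform cloud in the unit cube
    densities[dim] = knn_density(X).mean()

# The mean density estimate decreases as dimensionality grows,
# illustrating the bias: a single density threshold cannot serve
# low- and high-dimensional subspaces equally well.
assert densities[2] > densities[8] > densities[32]
```

This is the threshold dilemma from the statement in miniature: any fixed cutoff that accepts the (lower) 32-dimensional densities will also accept much of the noise in 2 dimensions.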
“…On the other hand, there is also a consensus that an abundance of redundant results should be avoided [35,55]. The pressing question is then how many solutions to provide and how representative these solutions are.…”
Section: Representative Clusterings
confidence: 99%
“…In this section, we show how good representatives with high confidence and low τ can be extracted automatically from a set of sampled worlds. Furthermore, when more than a single representative world is returned, a requirement is to minimize redundancy between the sets of worlds represented by each representative [12,29,55]. This requirement is important in order to avoid overly similar clustering representatives.…”
Section: Selection of Representative Worlds
confidence: 99%
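One way the redundancy-minimization requirement in the statement above can be met is greedy farthest-point selection over a clustering dissimilarity. The sketch below is a hypothetical illustration, not the selection method of the cited paper: the `worlds`, the plain (unadjusted) Rand index, and the greedy strategy are all invented for demonstration.

```python
import numpy as np

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both pairs together, or both pairs apart)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return (same_a == same_b)[iu].mean()

def greedy_representatives(clusterings, m):
    """Pick m mutually dissimilar clusterings: repeatedly add the
    candidate farthest (dissimilarity = 1 - Rand index) from its
    nearest already-chosen representative."""
    chosen = [0]  # arbitrary seed: start from the first candidate
    while len(chosen) < m:
        best, best_gap = None, -1.0
        for i in range(len(clusterings)):
            if i in chosen:
                continue
            gap = min(1 - rand_index(clusterings[i], clusterings[j])
                      for j in chosen)
            if gap > best_gap:
                best, best_gap = i, gap
        chosen.append(best)
    return chosen

# toy sampled "worlds": three near-duplicates of one clustering
# plus one genuinely different clustering
worlds = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],  # near-duplicate of worlds[0]
    [0, 0, 1, 1, 1, 1],  # near-duplicate of worlds[0]
    [0, 1, 0, 1, 0, 1],  # different structure
]
reps = greedy_representatives(worlds, 2)
# The second representative is the structurally different world,
# not one of the near-duplicates.
assert reps == [0, 3]
```

The greedy choice skips the near-duplicates precisely because they represent almost the same set of worlds as the seed, which is the redundancy the statement asks to avoid.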
“…In fact, the subspace outlier problem is a hard problem in its own right, and the typical conference paper cannot accommodate a broader discussion due to space restrictions. Furthermore, the subspace outlier problem can be seen as analogous to the multiview or alternative clustering problem [77], where the goal is not to find a consensus clustering; instead, different clustering solutions in different subspaces can each be interesting, valid solutions. Likewise, different outliers in different subspaces could each be meaningfully reported.…”
Section: Introduction
confidence: 99%