2013
DOI: 10.1007/s10994-013-5334-y

The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives

Abstract: In this position paper, we discuss how different branches of research on clustering and pattern mining, while rather different at first glance, in fact have a lot in common and can learn a lot from each other's solutions and approaches. We give brief introductions to the fundamental problems of different sub-fields of clustering, especially focusing on subspace clustering, ensemble clustering, alternative (as a variant of constraint) clustering, and multiview clustering (as a variant of alternative clustering)…

Cited by 36 publications (22 citation statements)
References 164 publications
“…Density is dimensionality-biased: when estimated using distance-based density estimators, the densities of a data cloud tend to be lower in higher-dimensional spaces. Hence these methods suffer from density variation across subspaces of different dimensionalities: low thresholds detect high-dimensional clusters but have difficulty filtering out noise in low-dimensional subspaces, while high thresholds screen out noise well in low-dimensional subspaces but tend to overlook high-dimensional clusters (Zimek and Vreeken 2015). LC can possibly be an effective remedy for this issue in subspace clustering, since LC is not dimensionality-biased.…”
Section: Discussion
confidence: 99%
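The dimensionality bias described in the citation statement above is easy to reproduce empirically. The sketch below uses the inverse mean k-nearest-neighbour distance as a distance-based density proxy; this estimator and its parameters are illustrative choices, not taken from the cited work. The mean density estimate of a uniform data cloud shrinks as the number of dimensions grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def knn_density(X, k=10):
    """Density proxy: 1 / (mean distance to the k nearest neighbours)."""
    # full pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    # column 0 is the self-distance (0), so take columns 1..k
    return 1.0 / d[:, 1:k + 1].mean(axis=1)

densities = {}
for dim in (2, 8, 32):
    X = rng.uniform(size=(n, dim))  # uniform cloud in the unit cube
    densities[dim] = knn_density(X).mean()

# The mean density estimate decreases as dimensionality grows,
# illustrating the bias: a single density threshold cannot serve
# low- and high-dimensional subspaces equally well.
assert densities[2] > densities[8] > densities[32]
```

This is the threshold dilemma from the statement in miniature: any fixed cutoff that accepts the (lower) 32-dimensional densities will also accept much of the noise in 2 dimensions.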
“…On the other hand, there is also a consensus that an abundance of redundant results should be avoided [35,55]. The pressing question is then how many solutions to provide and how representative these solutions are.…”
Section: Representative Clusterings
confidence: 99%
“…In this section, we show how good representatives with high confidence and low τ can be extracted automatically from a set of sampled worlds. Furthermore, when more than a single representative world is returned, a requirement is to minimize redundancy between the sets of worlds represented by each representative [12,29,55]. This requirement is important in order to avoid overly similar clustering representatives.…”
Section: Selection of Representative Worlds
confidence: 99%
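One way the redundancy-minimization requirement in the statement above can be met is greedy farthest-point selection over a clustering dissimilarity. The sketch below is a hypothetical illustration, not the selection method of the cited paper: the `worlds`, the plain (unadjusted) Rand index, and the greedy strategy are all invented for demonstration.

```python
import numpy as np

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both pairs together, or both pairs apart)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return (same_a == same_b)[iu].mean()

def greedy_representatives(clusterings, m):
    """Pick m mutually dissimilar clusterings: repeatedly add the
    candidate farthest (dissimilarity = 1 - Rand index) from its
    nearest already-chosen representative."""
    chosen = [0]  # arbitrary seed: start from the first candidate
    while len(chosen) < m:
        best, best_gap = None, -1.0
        for i in range(len(clusterings)):
            if i in chosen:
                continue
            gap = min(1 - rand_index(clusterings[i], clusterings[j])
                      for j in chosen)
            if gap > best_gap:
                best, best_gap = i, gap
        chosen.append(best)
    return chosen

# toy sampled "worlds": three near-duplicates of one clustering
# plus one genuinely different clustering
worlds = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],  # near-duplicate of worlds[0]
    [0, 0, 1, 1, 1, 1],  # near-duplicate of worlds[0]
    [0, 1, 0, 1, 0, 1],  # different structure
]
reps = greedy_representatives(worlds, 2)
# The second representative is the structurally different world,
# not one of the near-duplicates.
assert reps == [0, 3]
```

The greedy choice skips the near-duplicates precisely because they represent almost the same set of worlds as the seed, which is the redundancy the statement asks to avoid.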
“…In fact, the subspace outlier problem is a hard problem in its own right, and the typical conference paper cannot accommodate a broader discussion due to space restrictions. Furthermore, the subspace outlier problem can be seen as analogous to the multiview or alternative clustering problem [77], where the goal is not to find a consensus clustering; instead, different clustering solutions in different subspaces can each be interesting, valid solutions. Likewise, different outliers in different subspaces could each be meaningfully reported.…”
Section: Introduction
confidence: 99%