2004
DOI: 10.1162/089976604773717621

Stability-Based Validation of Clustering Solutions

Abstract: Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract "natural" group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question. We introduce a measure of cluster stability to assess the validity of a cluster model. This stability measure quantifies the reproducibility of clustering solutions …
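
The abstract is truncated here, so the exact definition of the stability measure is not available on this page. As a rough illustration of the general idea of measuring how reproducible a clustering is, the following is a minimal sketch under stated assumptions: it uses a plain k-means, repeated random half-splits, nearest-centroid label transfer, and a brute-force search over label permutations. All function names (`kmeans`, `disagreement`, `instability`) and parameters are hypothetical, and the sketch omits any normalization; it should not be read as the authors' procedure.

```python
import random
from itertools import permutations


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(group)
    return tuple(sum(c) / n for c in zip(*group))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (an assumed stand-in clusterer); returns centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster received no points.
        new = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def disagreement(labels_a, labels_b, k):
    """Smallest fraction of mismatches over all relabelings of labels_b.

    Brute force over k! permutations, so only practical for small k.
    """
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if a != perm[b])
        best = min(best, mismatches)
    return best / n


def instability(points, k, splits=10, seed=0):
    """Average disagreement between two clusterings of the same half of the data.

    One clustering is fitted on that half directly; the other is fitted on the
    remaining half and transferred by nearest-centroid assignment.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first, second = shuffled[:half], shuffled[half:]
        c1 = kmeans(first, k, rng)
        c2 = kmeans(second, k, rng)
        transferred = [nearest(p, c1) for p in second]
        own = [nearest(p, c2) for p in second]
        total += disagreement(transferred, own, k)
    return total / splits
```

In this sketch a lower value means the clustering is more reproducible across splits. Because k = 1 is trivially stable (the value is always 0), raw scores for different k are not directly comparable without some form of normalization, which is deliberately left out here.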

Cited by 418 publications (342 citation statements)
References 17 publications
“…Examples are Ben-Hur et al (2002), Bryan (2004), Dudoit and Fridlyand (2002), Grün and Leisch (2004), Lange et al (2004), Monti et al (2001) and Tibshirani and Walther (2005). Many of these papers use stability or prediction strength measurements as a tool to estimate the true number of clusters.…”
Section: Introduction
type: mentioning
confidence: 99%
“…Existing algorithms include stability-based methods [5,6], model-fitting-based algorithms [7], and methods based on Clustering Validity Indices (CVI) [1]. A CVI is a measure derived from the obtained clustering solution, which quantifies such properties of a clustering solution as compactness, separation between clusters, etc.…”
Section: Introduction
type: mentioning
confidence: 99%
“…We run the sampler with a number of clusters varying from 1 to 10 each for 10 different random initializations. We compare the transfer costs with the instability measure proposed in [15]. The results are summarized in Figure 5.…”
Section: Minimum Transfer Costs For Non-factorial Models
type: mentioning
confidence: 99%