2007
DOI: 10.1111/j.1541-0420.2007.00784.x

Determining the Number of Clusters Using the Weighted Gap Statistic

Abstract: Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society B 63, 411-423), we propose the weighted gap and the difference of difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data using the weighted within-clusters sum of errors: a measure of the within-clusters homogeneity. In addition, we propose a "multilayer" clustering…
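The abstract names the ingredients but not the formulas, so the following is only a minimal sketch of how a gap curve and a DD-style selection rule could be computed. Several things here are assumptions rather than details taken from the abstract: the weighted dispersion is taken to be D_r / (2 n_r (n_r - 1)) summed over clusters (against D_r / (2 n_r) for the unweighted gap statistic), the reference data are drawn uniformly over the bounding box of the observations, and the DD rule picks the k that maximizes DGap(k) - DGap(k+1) with DGap(k) = Gap(k) - Gap(k-1). The helper names (`within_dispersion`, `kmeans`, `gap_curve`, `dd_gap_estimate`) and the small k-means routine are hypothetical and exist only for illustration.

```python
import numpy as np


def within_dispersion(X, labels, weighted=False):
    """Pooled within-cluster dispersion W_k.

    Unweighted: sum_r D_r / (2 n_r), as in the gap statistic.
    Weighted:   sum_r D_r / (2 n_r (n_r - 1)), an ASSUMED form of the
    weighted variant; verify against the paper before relying on it.
    """
    total = 0.0
    for r in np.unique(labels):
        pts = X[labels == r]
        n_r = len(pts)
        if n_r < 2:
            continue  # a singleton cluster has zero dispersion
        # D_r: sum of squared distances over all ordered pairs in the
        # cluster, which equals 2 * n_r * (sum of squares about the mean).
        d_r = 2.0 * n_r * ((pts - pts.mean(axis=0)) ** 2).sum()
        denom = 2.0 * n_r * (n_r - 1) if weighted else 2.0 * n_r
        total += d_r / denom
    return total


def kmeans(X, k, rng, n_init=5, n_iter=50):
    """Tiny Lloyd-style k-means with restarts (illustrative only)."""
    best_labels, best_w = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        w = within_dispersion(X, labels)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels


def gap_curve(X, k_max, n_ref=10, weighted=False, seed=0):
    """Gap(k) = mean(log W_k on reference data) - log W_k on X, k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, kmeans(X, k, rng), weighted))
        ref = []
        for _ in range(n_ref):
            # Reference sample: uniform over the bounding box (an assumption).
            Z = rng.uniform(lo, hi, size=X.shape)
            ref.append(np.log(within_dispersion(Z, kmeans(Z, k, rng), weighted)))
        gaps.append(np.mean(ref) - log_w)
    return np.array(gaps)


def dd_gap_estimate(gaps):
    """Return the k in 2..k_max-1 that maximizes DGap(k) - DGap(k+1)."""
    d = np.diff(gaps)    # d[i] is DGap(i + 2)
    dd = d[:-1] - d[1:]  # dd[i] is DDGap(i + 2)
    return int(np.argmax(dd)) + 2
```

For an `(n, p)` array `X`, calling `dd_gap_estimate(gap_curve(X, k_max=8, weighted=True))` would return a candidate number of clusters under these assumptions. The rule needs `k_max` of at least 3 and cannot return k = 1.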

Cited by 133 publications (109 citation statements)
References 9 publications
“…The optimal number of clusters (i.e., seasons) was determined using the difference of difference-weighted (DD-weighted) gap method (Yan and Ye 2007), which is based on the gap statistic (Tibshirani et al 2001). The gap statistic is defined as:…”
Section: Data Collection (mentioning)
confidence: 99%
“…To identify subgroups of ALL cases defined by methylation signatures, we performed unsupervised hierarchical clustering of high variance (SD >1 across the cohort; n = 4,037 probe sets), followed by determination of association between cluster membership and cytogenetic subtype by calculation of the gap statistic (24,25) and Rand index (26). This procedure identified 8 robust clusters of leukemic samples using both indices.…”
Section: DNA Methylation Profiles in B-ALL Are Associated With Geneti… (mentioning)
confidence: 99%
“…Initial analysis was performed on probe sets with across-patient SD >1 (n = 4,037) in order to filter out uninformative probe sets. Gap statistic analysis (24,25) was used in order to identify the optimal cutoff of the tree, and the adjusted Rand index (26) was used to compare the extent of concurrence between the clustering results and the underlying cytogenetic subtypes. To further determine the stability of these clustering results, the hierarchical clustering was repeated at different SD cutoffs, and the gap statistic and Rand index was determined for each.…”
Section: Figure (mentioning)
confidence: 99%
“…Determining the number of clusters in a data set is a fundamental problem in cluster analysis [9], [38], [39], [40], [41], [42], [43], [44], [45]. Although this problem is still largely unresolved, numerous methods have been suggested for it.…”
Section: Some Related Methods For Cluster Number Detection (mentioning)
confidence: 99%