2007
DOI: 10.1007/s00357-007-0003-0

Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques

Cited by 246 publications (175 citation statements)
References 22 publications
“…The dendrogram, Duda and Hart index (49), and Calinski-Harabasz pseudo F-statistics (50) suggested five-cluster solutions for nouns and verbs and a four-cluster solution for abstract concepts. We then computed final clusters using the k-means algorithm and Ward's five-cluster solution as a basis (51)(52)(53). We interpreted all clusters based on mean EPA ratings and their most central words (see SI Appendix, Tables S3-S5 for details).…”
Section: Results (mentioning)
confidence: 99%
“…This result, first of all, indicates that especially the minimization aspect of algorithmic performance is troublesome in case of more problematic data characteristics, whereas under such circumstances recovery performance still remains rather satisfactory. It suggests the presence of many local optima in the additive biclustering optimization problem, which may remind one of the somewhat similar (and much simpler) discrete optimization problem in the K-means case, for which the problem of local optima has been well documented (Hand and Krzanowski 2005; Steinley and Brusco 2007). In the case of the additive biclustering model, with its considerably larger optimization space, obviously, the local optima problem is even much more challenging.…”
Section: Performance Of Full Clustering Als (mentioning)
confidence: 99%
“…Therefore, it seems reasonable to put it to empirical testing. A version of the method, with a pre-specified K and with no removal of singletons, has been tested by Steinley and Brusco (2007), leading to rather mediocre results in their experiments. Here we intend to test the original version of the iK-means as a device for identifying both the number K and initial centroids.…”
Section: 2. Choosing K With the Intelligent K-means (mentioning)
confidence: 99%
“…The data for experimental comparisons can be taken from real-world applications or generated Milligan and Cooper (1985), Steinley and Brusco (2007), and over both by Chae et al 2006, Dudoit and Fridland (2002), Feng and Hamerly (2005), Kuncheva and Vetrov (2005), Maulik and Bandyopadhyay (2000). In this paper, we consider generated data only, to allow us to control the parameters of the experiments.…”
Section: Choosing Parameters Of the Experiments In K-means Clustering (mentioning)
confidence: 99%