Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques

Steinley, Douglas; Brusco, Michael J.

doi:10.1007/s00357-007-0003-0

Cited by 246 publications

(175 citation statements)

References 22 publications

Supporting

Mentioning

173

Contrasting

Unclassified


“…The dendrogram, Duda and Hart index (49), and Calinski-Harabasz pseudo F-statistics (50) suggested five-cluster solutions for nouns and verbs and a four-cluster solution for abstract concepts. We then computed final clusters using the k-means algorithm and Ward's five-cluster solution as a basis (51)(52)(53). We interpreted all clusters based on mean EPA ratings and their most central words (see SI Appendix, Tables S3-S5 for details).…”

Section: Results (mentioning)

confidence: 99%

Consensus and stratification in the affective meaning of human sociality

Ambrasat, Scheve, Conrad, et al. 2014

Proc. Natl. Acad. Sci. U.S.A.


We investigate intrasocietal consensus and variation in affective meanings of concepts related to authority and community, two elementary forms of human sociality. Survey participants (n = 2,849) from different socioeconomic status (SES) groups in German society provided ratings of 909 social concepts along three basic dimensions of affective meaning. Results show widespread consensus on these meanings within society and demonstrate that a meaningful structure of socially shared knowledge emerges from organizing concepts according to their affective similarity. The consensus finding is further qualified by evidence for subtle systematic variation along SES differences. In relation to affectively neutral words, high-status individuals evaluate intimacy-related and socially desirable concepts as less positive and powerful than middle- or low-status individuals, while perceiving antisocial concepts as relatively more threatening. This systematic variation across SES groups suggests that the affective meaning of sociality is to some degree a function of social stratification.

Keywords: cultural consensus | affect control theory | large-scale survey | cluster analysis | mixed-effects models


“…This result, first of all, indicates that especially the minimization aspect of algorithmic performance is troublesome in case of more problematic data characteristics, whereas under such circumstances recovery performance still remains rather satisfactory. It suggests the presence of many local optima in the additive biclustering optimization problem, which may remind one of the somewhat similar (and much simpler) discrete optimization problem in the K-means case, for which the problem of local optima has been well documented (Hand and Krzanowski 2005; Steinley and Brusco 2007). In the case of the additive biclustering model, with its considerably larger optimization space, obviously, the local optima problem is even much more challenging.…”

Section: Performance of Full Clustering ALS (mentioning)

confidence: 99%

Additive Biclustering: A Comparison of One New and Two Existing ALS Algorithms

Wilderjans, Depril, Mechelen 2013

J Classif


Abstract: The additive biclustering model for two-way two-mode object by variable data implies overlapping clusterings of both the objects and the variables together with a weight for each bicluster (i.e., a pair of an object and a variable cluster). In the data analysis, an additive biclustering model is fitted to given data by means of minimizing a least squares loss function. To this end, two alternating least squares algorithms (ALS) may be used: (1) PENCLUS, and (2) Baier's ALS approach. However, both algorithms suffer from some inherent limitations, which may hamper their performance. As a way out, based on theoretical results regarding optimally designing ALS algorithms, in this paper a new ALS algorithm will be presented. In a simulation study this algorithm will be shown to outperform the existing ALS approaches.
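The local-optima problem in K-means noted in the citation statement above can be illustrated with a minimal sketch. It assumes plain batch (Lloyd-style) K-means started from randomly chosen data points and compares the final within-cluster sum of squares (SSE) across several restarts. The function names (`kmeans`, `multi_start`) and the toy data are illustrative assumptions; this is not the procedure evaluated by Steinley and Brusco (2007) or by the citing authors.

```python
import random


def sse(points, centroids, labels):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[k]))
        for p, k in zip(points, labels)
    )


def kmeans(points, k, rng, max_iter=100):
    """Batch K-means started from k randomly sampled data points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so a local optimum is reached
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels, sse(points, centroids, labels)


def multi_start(points, k, n_starts, seed=0):
    """Run K-means n_starts times; return the best run and all sorted SSEs."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda r: r[2]), sorted(r[2] for r in runs)


if __name__ == "__main__":
    # Hypothetical toy data: three Gaussian blobs in two dimensions.
    data_rng = random.Random(42)
    blobs = [(0.0, 0.0), (4.0, 4.0), (8.0, 0.0)]
    pts = [
        (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
        for cx, cy in blobs
        for _ in range(30)
    ]
    best, all_sse = multi_start(pts, k=3, n_starts=10)
    print("best SSE:", round(best[2], 3))
    print("distinct final SSE values:", sorted({round(s, 3) for s in all_sse}))
```

If the restarts end at more than one distinct SSE value, the runs have converged to different local optima, which is why the choice of initialization matters.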


“…Therefore, it seems reasonable to put it to empirical testing. A version of the method, with a pre-specified K and with no removal of singletons, has been tested by Steinley and Brusco (2007), leading to rather mediocre results in their experiments. Here we intend to test the original version of the iK-means as a device for identifying both the number K and initial centroids.…”

Section: 2 Choosing K with the Intelligent K-means (mentioning)

confidence: 99%

“…The data for experimental comparisons can be taken from real-world applications or generated Milligan and Cooper (1985), Steinley and Brusco (2007), and over both by Chae et al 2006, Dudoit and Fridland (2002), Feng and Hamerly (2005), Kuncheva and Vetrov (2005), Maulik and Bandyopadhyay (2000). In this paper, we consider generated data only, to allow us to control the parameters of the experiments.…”

Section: Choosing Parameters of the Experiments in K-means Clustering (mentioning)

confidence: 99%

“…First of all, the quantitative parameters of the generated data and cluster structure are specified: the number of entities N, the number of generated clusters K*, and the number of variables M. In most publications, these are kept relatively small: N ranges from about 50 to 200, M is in many cases 2 and, anyway, not greater than 10, and K* is of the order of 3, 4 or 5 (see, for example, Casillas et al 2003, Chae et al 2006, Hand and Krzhanowski 2005, Hardy 1996, Kuncheva and Petrov 2005, McLachlan and Khan 2004, Milligan and Cooper 1985). Larger sizes appear in Feng and Hamerly (2006) (N = 4000, M is up to 16 and K* = 20) and Steinley and Brusco (2007) (N is up to 5000, M = 25, 50 and 125, and K* = 5, 10, 20). Our choice of these parameters is based on the idea that the data should imitate the conditions of real-world data analysis, under the timing constraints of the computational capacity.…”

Section: Data and Cluster Structure Parameters (mentioning)

confidence: 99%


Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads

2010


The issue of determining "the right number of clusters" in K-Means has attracted considerable interest, especially in the recent years. Cluster overlap appears to be a factor most affecting the clustering results. This paper proposes an experimental setting for comparison of different approaches at data generated from Gaussian clusters with the controlled parameters of between- and within-cluster spread to model different cluster overlaps. The setting allows for evaluating the centroid recovery on par with conventional evaluation of the cluster recovery. The subjects of our interest are two versions of the "intelligent" K-Means method, iK-Means, that find the right number of clusters one-by-one extracting "anomalous patterns" from the data. We compare them with seven other methods, including Hartigan's rule, averaged Silhouette width and Gap statistic, under six different between- and within-cluster spread-shape conditions. There are several consistent patterns in the results of our experiments, such as that the right K is reproduced best by Hartigan's rule, but not clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experiment setting.
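The experimental setting described above (N entities, K* generated clusters, M variables, with controlled between- and within-cluster spread) can be sketched as follows. This is a rough illustration only: it assumes spherical Gaussian clusters of near-equal size whose centers are drawn uniformly from a box, which is not necessarily the generator used in the paper. The function name `generate_clusters` and the parameters `between` and `within` are hypothetical.

```python
import random


def generate_clusters(n, k_star, m, between=10.0, within=1.0, seed=0):
    """Generate n points in m dimensions from k_star spherical Gaussians.

    between: half-width of the box the cluster centers are drawn from
             (assumed stand-in for between-cluster spread).
    within:  standard deviation of each cluster around its center
             (assumed stand-in for within-cluster spread).
    """
    rng = random.Random(seed)
    centers = [
        [rng.uniform(-between, between) for _ in range(m)]
        for _ in range(k_star)
    ]
    data, labels = [], []
    for i in range(n):
        k = i % k_star  # cycle through clusters for near-equal sizes
        data.append([rng.gauss(c, within) for c in centers[k]])
        labels.append(k)
    return data, labels, centers


if __name__ == "__main__":
    # Small setting in the range the citation statements call typical.
    data, labels, centers = generate_clusters(n=200, k_star=4, m=2)
    print(len(data), "points,", len(centers), "clusters,", len(data[0]), "variables")
```

Shrinking `between` relative to `within` increases cluster overlap, the factor the abstract identifies as most affecting clustering results.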
