Mark Ming-Tso Chiang scite author profile

The issue of determining "the right number of clusters" in K-Means has attracted considerable interest, especially in recent years. Cluster overlap appears to be the factor that most affects the clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters, with controlled parameters of between- and within-cluster spread to model different cluster overlaps. The setting allows centroid recovery to be evaluated on a par with the conventional evaluation of cluster recovery. The subjects of our interest are two versions of the "intelligent" K-Means method, iK-Means, that find the right number of clusters by extracting "anomalous patterns" from the data one by one. We compare them with seven other methods, including Hartigan's rule, averaged Silhouette width and Gap statistic, under six different between- and within-cluster spread-shape conditions. There are several consistent patterns in the results of our experiments, such as that the right K is reproduced best by Hartigan's rule, but not the clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experiment setting.
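
To make the kind of experiment described in the abstract more concrete, the following is a minimal sketch, not the authors' code. It generates well-separated spherical Gaussian clusters and estimates K with Hartigan's rule in its commonly cited form, H(K) = (W_K / W_{K+1} - 1)(N - K - 1), stopping at the first K with H(K) <= 10. The formula, the threshold of 10, the cluster centers, the dimension, the spread, and the farthest-point seeding are all assumptions made for illustration; none of them is taken from this page, and the helper names (`make_clusters`, `maxmin_init`, `kmeans_wss`, `hartigan_k`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_clusters(centers, n_per_cluster, sigma, rng):
    """Draw n_per_cluster points from a spherical Gaussian around each center."""
    return [tuple(rng.gauss(c, sigma) for c in center)
            for center in centers for _ in range(n_per_cluster)]

def maxmin_init(points, k):
    """Deterministic seeding (an illustrative choice): start at points[0],
    then repeatedly add the point farthest from the seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def kmeans_wss(points, k, max_iter=100):
    """Run batch K-Means and return the within-cluster sum of squares W_k."""
    centroids = maxmin_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [tuple(sum(col) / len(members) for col in zip(*members))
                   if members else centroids[j]
                   for j, members in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def hartigan_k(points, k_max, threshold=10.0):
    """Return the first K with H(K) <= threshold (assumed rule of thumb)."""
    n = len(points)
    w_k = kmeans_wss(points, 1)
    for k in range(1, k_max):
        w_next = kmeans_wss(points, k + 1)
        if w_next == 0:          # perfect fit with k + 1 clusters
            return k + 1
        h = (w_k / w_next - 1.0) * (n - k - 1)
        if h <= threshold:
            return k
        w_k = w_next
    return k_max

# Illustrative data: three clusters in 20 dimensions, unit spread,
# centers 20 units apart along different axes (all values are assumptions).
rng = random.Random(0)
dim = 20
centers = [(0.0,) * dim,
           (20.0,) + (0.0,) * (dim - 1),
           (0.0, 20.0) + (0.0,) * (dim - 2)]
points = make_clusters(centers, n_per_cluster=30, sigma=1.0, rng=rng)
estimated_k = hartigan_k(points, k_max=6)
print(estimated_k)
```

Shrinking the distance between centers relative to `sigma` increases cluster overlap, which is the condition the abstract identifies as most affecting the results. This sketch only recovers K and does not evaluate cluster or centroid recovery.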

Experiments for the Number of Clusters in K-Means

Chiang, Mirkin

Mark Ming-Tso Chiang

Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads
