2021
DOI: 10.1007/s10994-021-06021-7
An empirical comparison between stochastic and deterministic centroid initialisation for K-means variations

Abstract: K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages: it can only find local minima, and the positions of the initial cluster centres (centroids) can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed, with different degrees of complexity. In this study we focus on common K-Me…
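The abstract's core claim — that initial centroid positions can change which local minimum K-Means converges to — can be illustrated with a short sketch. This uses scikit-learn as a stand-in implementation (an assumption; the paper's own code is not shown here): stochastic initialisation with different seeds can yield different solutions, while a deterministic initialisation (here, a naive explicit centroid array) is reproducible across runs.

```python
# Sketch (not from the paper): how initial centroids affect the K-Means
# solution, using scikit-learn as an assumed stand-in implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Stochastic initialisation: different seeds may reach different local minima,
# so the final within-cluster SSE (inertia) can vary across runs.
inertias = [
    KMeans(n_clusters=3, init="random", n_init=1, random_state=s).fit(X).inertia_
    for s in range(5)
]

# Deterministic initialisation: an explicit centroid array makes the run
# reproducible. Taking the first three points is a naive illustrative choice,
# not one of the initialisation methods studied in the paper.
init_centroids = X[:3]
det = KMeans(n_clusters=3, init=init_centroids, n_init=1).fit(X)

print(sorted(round(i, 1) for i in inertias))
print(round(det.inertia_, 1))
```

Refitting with the same explicit centroids yields the same inertia, which is the reproducibility property deterministic initialisation is meant to provide.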

Cited by 25 publications (16 citation statements). References 32 publications.
“…We applied K-Means clustering on these transformed data and the original dataset, varying the number of clusters from 1 to 10. As a clustering algorithm, we used Lloyd’s K-Means [37], initialised with the Density K-Means++ method [38], which worked well in our previous benchmark [39]. We used the R package clustree [40] to visualize a K-Means clustering tree (number of target clusters 1 to 10) for data transformations with different numbers of PCs (2 to 11, leading to different data dimensionality; see also Supplementary Material).…”
Section: Methods (mentioning)
confidence: 99%
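The procedure described in the statement above — running K-Means over a range of target cluster counts before building a clustering tree — can be sketched as follows. This is a minimal Python sketch assuming scikit-learn in place of the R tooling the citing paper actually used (Lloyd's K-Means with clustree), recording the within-cluster SSE for each k:

```python
# Sketch (assumption: scikit-learn instead of the R packages cited above):
# fit K-Means for each target cluster count k = 1..10 and record the
# within-cluster SSE, the quantity typically inspected before choosing k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, random_state=1)

sse_by_k = {}
for k in range(1, 11):  # number of target clusters varied from 1 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    sse_by_k[k] = km.inertia_  # SSE shrinks as k grows

print(sse_by_k)
```

A clustering tree then tracks how points move between the clusterings at consecutive values of k; the sketch stops at the per-k fits that such a tree is built from.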
“…SSE, Silhouette score, Purity and CPU time are used to measure the performance of our proposed method. While SSE, Purity and Silhouette score measure the quality of the clusters formed, CPU time measures efficiency (7, 13–15). These evaluation criteria are explained below:…”
Section: Evaluation Criteria (mentioning)
confidence: 99%
“…2. Purity: It is an external validity index that measures the degree of similarity between the clustering solution formed by a clustering method and that specified by the given class labels (7).…”
Section: Sum of Squared Error (SSE) (mentioning)
confidence: 99%