An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

Vouros, Avgoustinos; Langdell, Stephen; Croucher, Mike; Vasilaki, Eleni

doi:10.48550/arxiv.1908.09946

Cited by 2 publications

(7 citation statements)

References 0 publications


“…We have also showed that, similar to unsupervised methods [15], semi-supervised algorithms can be affected by initialisation procedures and by the type of used constraints. From our results we observed that using only MUST-LINK constraints has a negative effect on the semi-supervised algorithms (however PCSKM could cope with the MUST-LINK constraints much better than MPCKM and PCKM).…”

Section: Discussion (mentioning)

confidence: 83%

“…Our benchmark includes the real world data sets fisheriris and ionosphere from the UCI repository [18] which have unknown feature quality. We also added two synthetic data sets that were generated based on the generator of [17] which we used in our previous study [15]. These synthetic data sets are consisting of informative and uninformative features without any noise injection.…”

Section: Results (mentioning)

confidence: 99%

“…In our first experiment we wanted to assess the performance of PCSKM compared with other unsupervised algorithms (LKM and SKM) and semi-supervised algorithms (PCKM and MPCKM). We tested the performance of all these algorithms using the deterministic initialisation technique of DKM++ [23] which on average performed best based in our previous benchmark [15], but does not take into account constraints. We also tested the semi-supervised algorithms with different number and types of constraints including, only MUST-LINK, only CANNOT-LINK, and random selection from both MUST-LINK and CANNOT-LINK.…”

Section: Results (mentioning)

confidence: 99%

“…We name this algorithm Pairwise Constrained Sparse K-Means (PCSKM) and we testing its performance under different conditions such as different number and kind of constraints (CANNOT-LINK, MUST-LINK or both). In our previous study [15] we have shown that the deterministic initialisation method of Density K-Means++ (DKM++) on average performs better than stochastic methods thus we select this method for the initialisation of the algorithms along with the seeding method proposed in the study of [16]. In our benchmark we include synthetic data sets from the study of [17] with known feature quality and real world data sets from the UCI [18] and a real world data set from our previous behavioural neuroscience study [19] which contains ten known uninformative features.…”

Section: Introduction (mentioning)

confidence: 89%


A semi-supervised sparse K-Means algorithm

Vouros

Vasilaki

2020

Preprint

Self Cite


We consider the problem of data clustering with unidentified feature quality but with the existence of a small amount of labelled data. In the first case a sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering and in the second case a semi-supervised method can use the labelled data to create constraints and enhance the clustering solution. In this paper we propose a K-Means inspired algorithm that employs these techniques. We show that the algorithm maintains the high performance of other similar semi-supervised algorithms as well as keeping the ability to identify informative from uninformative features. We examine the performance of the algorithm on synthetic and real world data sets. We use a series of scenarios with different number and types of constraints as well as two different clustering initialisation methods.


“…We name this algorithm Pairwise Constrained Sparse K-Means (PCSKM) and we test its performance under different conditions such as different number and kind of constraints (CANNOT-LINK, MUST-LINK or both). In our previous study [14] we have shown that the deterministic initialisation method of Density K-Means++ (DKM++) [15] surpasses the average performance of stochastic methods thus we select this method for the initialisation of the algorithms along with the Seeding method proposed in the study of [16]. We have also included the initialisation methods of ROBIN [17] and Maximin [18] to strengthen our conclusions (results in appendix).…”

Section: Introduction (mentioning)

confidence: 95%

A semi-supervised sparse K-Means algorithm

Vouros

Vasilaki

2021

Pattern Recognition Letters
