2019
DOI: 10.3390/e21070683

An Entropy Regularization k-Means Algorithm with a New Measure of Between-Cluster Distance in Subspace Clustering

Abstract: Although within-cluster information is commonly used in most clustering approaches, other important information, such as between-cluster information, is rarely considered. Hence, in this study, we propose a novel measure of between-cluster distance in subspace, which maximizes the distance between the center of a cluster and the points that do not belong to this cluster. Based on this idea, we first design an optimization objective function integrating the between-cluster distance and en…
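The abstract only sketches the approach, so as an illustration here is a minimal sketch in Python of a k-means variant combining weighted within-cluster compactness, a between-cluster term (pushing each center away from points outside its cluster), and an entropy regularizer on per-cluster feature weights. This is not the authors' ERKM code: the update rules, the function name, and the parameter names gamma (entropy weight) and eta (between-cluster weight) are assumptions modeled on the standard entropy-weighted k-means formulation.

```python
import numpy as np

def erkm_style_sketch(X, k, gamma=1.0, eta=0.1, n_iter=50, seed=0):
    """Illustrative sketch (NOT the authors' ERKM implementation) of a
    k-means variant with an entropy regularizer on per-cluster feature
    weights plus a between-cluster term. `gamma` and `eta` are assumed
    hyperparameter names, not taken from the paper."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)  # per-cluster feature weights

    for _ in range(n_iter):
        # Assignment step: smallest weighted squared distance to a center.
        dist = np.stack(
            [((X - centers[j]) ** 2 * weights[j]).sum(axis=1) for j in range(k)],
            axis=1,
        )
        labels = dist.argmin(axis=1)

        for j in range(k):
            inside = X[labels == j]
            outside = X[labels != j]
            if len(inside) == 0 or len(outside) == 0:
                continue  # keep previous center/weights for degenerate clusters
            centers[j] = inside.mean(axis=0)
            # Per-feature within-cluster dispersion (to minimize) and
            # between-cluster dispersion (to maximize); the mean keeps the
            # between-cluster term on a comparable scale.
            D = ((inside - centers[j]) ** 2).sum(axis=0)
            B = ((outside - centers[j]) ** 2).mean(axis=0)
            # The entropy regularizer turns the weight update into a
            # softmax; subtracting the max stabilizes the exponentials.
            score = -(D - eta * B) / gamma
            w = np.exp(score - score.max())
            weights[j] = w / w.sum()

    return labels, centers, weights
```

With eta = 0 the weight update reduces to the classic entropy-weighted k-means softmax; a call such as `erkm_style_sketch(X, k=3, gamma=0.5, eta=0.05)` returns labels, centers, and the per-cluster feature weights that identify each cluster's subspace.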

Cited by 8 publications (8 citation statements)
References 32 publications
“…We have also compared our metrics to some related work. We have used as a comparison the following results from [36]: KMEA, WKME, EWKM, ESSC, AFKM, SC, SSC-MP, ERKM; and from [29]: Bayes Network Classifier, J48, Random Forest, OneR. In Table 2 we can see that the F1-Score for the two ABARC cases is better than all of the others, but the Kappa Score is better only after removing hybrids.…”
Section: Metrics (mentioning)
confidence: 99%
“…[Table on the Seeds dataset; columns: Precision, Recall, F1-Score, Kappa Score] KMEA [23] 81.2; WKME [13] 79.8; EWKM [15] 82.6; ESSC [6] 84.8; AFKM [1] 81.6; SC [31] 47.2; SSC-MP [32] 76.7; ERKM [36] 90.…”
Section: Algorithm (mentioning)
confidence: 99%
“…Xiaohui proposed a weighting K-Means clustering approach by integrating intra-cluster and inter-cluster distances (KICIC), which is able to integrate intra-cluster compactness and inter-cluster separation to solve the clustering problem of high-dimensional data [8]. Jiyong proposed a K-Means clustering algorithm based on distance threshold and weighted sample.…”
Section: Traditional K-means Algorithm (mentioning)
confidence: 99%
“…In each iteration of clustering, it computes the optimal weight of attributes according to the change of the centroid vector, which minimizes the sum of distances between each instance and the centroid [28]. Xiaohui proposed a novel K-Means type method (a weighting K-Means clustering approach by integrating intra-cluster and inter-cluster distances, KICIC), which is able to integrate intra-cluster compactness and inter-cluster separation to solve the clustering problem of high-dimensional data [8]. Jiyong proposed a K-Means clustering algorithm based on distance threshold and weighted sample.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another approach for feature weighting in centre‐based clustering is obtained by incorporating a Shannon entropic regularizer on feature weights (Jing, Ng, & Huang, 2007). Some relevant works on the endorsement of using an entropy regularizer include Chakraborty, Paul, Das, and Xu (2020), Pihur, Datta, and Datta (2007), Xiong, Wang, Huang, and Zeng (2019), and Zhou, Chen, Chen, Zhang, and Li (2016). A comprehensive overview of the available clustering techniques for high‐dimensional data in the form of R package ‘HDclassif’ can be found in Bergé, Bouveyron, and Girard (2012).…”
Section: Introduction (mentioning)
confidence: 99%
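For reference, the Shannon entropic regularizer on feature weights mentioned in the last citation above (Jing, Ng, & Huang, 2007) admits a closed-form weight update. The following is a sketch of that standard formulation, not text quoted from any of the citing papers:

```latex
J = \sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j=1}^{d} w_{kj}\,(x_{ij} - c_{kj})^{2}
  + \gamma \sum_{k=1}^{K} \sum_{j=1}^{d} w_{kj} \log w_{kj},
  \qquad \text{s.t. } \sum_{j=1}^{d} w_{kj} = 1,
```

whose minimizer over the weights is the softmax

```latex
w_{kj} = \frac{\exp(-D_{kj}/\gamma)}{\sum_{t=1}^{d} \exp(-D_{kt}/\gamma)},
\qquad D_{kj} = \sum_{i \in C_k} (x_{ij} - c_{kj})^{2}.
```

Larger γ drives the weights toward uniform, while smaller γ concentrates them on the low-dispersion features, which is the subspace-selection effect the citing papers refer to.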