2011
DOI: 10.5120/3573-4930

Initializing KMeans Clustering Algorithm using Statistical Information

Abstract: The k-means clustering algorithm is one of the best-known clustering algorithms; nevertheless, it has notable disadvantages, as it may converge to a local optimum depending on its random initialization of prototypes. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm produces valid clusters while decreasing both error and running time.
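The abstract does not spell out the statistical initialization scheme. As a minimal sketch of the general idea (deterministic, statistics-driven prototype placement rather than random selection), one hypothetical variant uses per-feature means and standard deviations to spread the initial prototypes; the offset scheme below is an assumption for illustration, not the paper's exact method:

```python
import numpy as np

def statistical_init(X, k):
    """Place k prototypes using feature-wise statistics.

    Hypothetical scheme: prototypes sit at evenly spaced offsets
    around the feature-wise mean, scaled by the feature-wise
    standard deviation. Illustrates statistics-based (non-random)
    initialization only; not the paper's exact method.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    offsets = np.linspace(-1.0, 1.0, k)  # spread within +/- 1 std
    return np.array([mean + t * std for t in offsets])

def kmeans(X, k, n_iter=100, tol=1e-6):
    """Standard Lloyd iterations starting from the statistical init."""
    centers = statistical_init(X, k)
    for _ in range(n_iter):
        # Assign each point to its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute prototypes; keep the old one if a cluster empties.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```

Because the initialization is deterministic, repeated runs on the same data give the same clustering, which is one way such schemes avoid the run-to-run variability of random prototype selection.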

Cited by 25 publications (10 citation statements). References 13 publications.
“…Further, the greater the similarity within a group and the greater the difference between groups, the better or more distinct is the clustering [19]. The standard k-means clustering algorithm (SKMC) is one of the best-known and most popular algorithms used in clustering, and it seeks an optimal partition of the data by using different criteria [20]- [21]. However, the results obtained from the SKMC highly depend on the initialization of the clustering parameters; in other words, different initializations may produce different results.…”
Section: An Improved Bisecting K-Means Clustering Algorithm
confidence: 99%
“…Some works have focused on finding the best value for the initial number of clusters k and the best way of choosing the initial centroids, as described in [8], [9], [10], [11], [12], [13], [14], [15]. Other research works focus on defining the best stopping criterion in order to avoid excessive iterations, considering that K-Means converges at a local minimum [16].…”
Section: Related Work
confidence: 99%
“…In [50] a method is proposed, which is based on a sample of the data set for which an average is calculated. Next, the objects whose distance is larger than the average are identified, and a distance-between-objects criterion is applied for selecting the objects that will constitute the initial objects.…”
Section: Introduction to Data Science and Machine Learning
confidence: 99%
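The excerpt above describes the initialization only in outline: sample the data set, compute an average, keep the objects whose distance exceeds that average, then apply a distance-between-objects criterion to pick the initial objects. A hedged sketch follows, with the unstated details filled in by assumption (the average is taken as the mean distance to the sample centroid, and the final selection greedily maximizes pairwise distance):

```python
import numpy as np

def sample_average_init(X, k, sample_size=100, seed=0):
    """Sketch of a sample-and-average initialization.

    Assumptions (not specified in the excerpt): "average" means
    the mean distance from sample points to the sample centroid,
    and the distance-between-objects criterion is greedy
    farthest-point selection among the remaining candidates.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    S = X[idx]
    centroid = S.mean(axis=0)
    dists = np.linalg.norm(S - centroid, axis=1)
    # Keep objects farther from the centroid than the average distance.
    candidates = S[dists > dists.mean()]
    # Greedily select k mutually distant objects as initial prototypes.
    chosen = [candidates[0]]
    while len(chosen) < k:
        d = np.min(
            [np.linalg.norm(candidates - c, axis=1) for c in chosen], axis=0
        )
        chosen.append(candidates[d.argmax()])
    return np.array(chosen)
```

Filtering to above-average distances biases the candidate pool toward the periphery of the sample, and the farthest-point step then keeps the chosen prototypes from landing in the same region.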