K-means clustering algorithm is a partitional clustering algorithm that has been used widely in many applications for traditional clustering due to its simplicity and low computational complexity. This clustering technique depends on the user specification of the number of clusters generated from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers results in its local minimal convergence. Automatic clustering is a recent approach to clustering where the specification of cluster number is not required. In automatic clustering, natural clusters existing in datasets are identified without any background information of the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges of the traditional clustering algorithm in handling automatic data clustering. Some nature-inspired metaheuristics algorithms have been hybridized with the traditional K-means algorithm to boost its performance and capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvements of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. More so, the analysis revealed that although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to solve clustering problems with unlabeled and nonlinearly separable datasets has been clearly observed in the study. The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. It is noteworthy to mention that the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community.
Kmeans clustering algorithm is an iterative unsupervised learning algorithm that tries to partition the given dataset into k pre-defined distinct non-overlapping clusters where each data point belongs to only one group. However, its performance is affected by its sensitivity to the initial cluster centroids with the possibility of convergence into local optimum and specification of cluster number as the input parameter. Recently, the hybridization of metaheuristics algorithms with the K-Means algorithm has been explored to address these problems and effectively improve the algorithm’s performance. Nonetheless, most metaheuristics algorithms require rigorous parameter tunning to achieve an optimum result. This paper proposes a hybrid clustering method that combines the well-known symbiotic organisms search algorithm with K-Means using the SOS as a global search metaheuristic for generating the optimum initial cluster centroids for the K-Means. The SOS algorithm is more of a parameter-free metaheuristic with excellent search quality that only requires initialising a single control parameter. The performance of the proposed algorithm is investigated by comparing it with the classical SOS, classical K-means and other existing hybrids clustering algorithms on eleven (11) UCI Machine Learning Repository datasets and one artificial dataset. The results from the extensive computational experimentation show improved performance of the hybrid SOSK-Means for solving automatic clustering compared to the standard K-Means, symbiotic organisms search clustering methods and other hybrid clustering approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.