Cluster analysis is a popular data mining technique, defined as the process of grouping similar data. K-Means is a clustering algorithm for numerical data. It is easy to implement and handles large amounts of data efficiently. The major problem with K-Means is the selection of initial centroids: because they are chosen at random, the algorithm can converge to a local optimum. Recently, nature-inspired optimization algorithms have been combined with clustering algorithms to obtain the global optimum solution. The Crow Search Algorithm (CSA) is a new population-based metaheuristic optimization algorithm based on the intelligent behaviour of crows. In this paper, CSA is combined with the K-Means clustering algorithm to obtain the global optimum solution. Experiments are conducted on benchmark datasets, and the results are compared with those of various clustering algorithms and optimization-based clustering algorithms. The results are also assessed with internal, external, and statistical evaluations to demonstrate the efficiency of the proposed algorithm.
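The sensitivity to initial centroids described above can be illustrated with a minimal sketch. It is plain K-Means only, not the CSA-based method of the paper; the synthetic data, the helper names (`sq_dist`, `mean`, `kmeans`, `make_blobs`), and the choice of ten seeds are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, seed, max_iter=100):
    """Plain K-Means with randomly selected initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: keep the old centroid if a cluster is empty.
        new_centroids = [mean(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sum of squared errors: the objective K-Means tries to minimize.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

def make_blobs(seed=0):
    """Hypothetical test data: four Gaussian blobs of 50 points each."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centers for _ in range(50)]

points = make_blobs()
# Same data, different random initializations.
sse_per_seed = [kmeans(points, k=4, seed=s)[1] for s in range(10)]
best_sse = min(sse_per_seed)
print(sorted(sse_per_seed))
```

Any spread in the printed values comes only from the choice of starting centroids, since the data are identical across runs. Keeping the best of several restarts is a common workaround; a metaheuristic such as CSA instead searches the space of centroid positions directly.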
Due to the explosion in the number of autonomous data sources, there is an emerging need for effective approaches to distributed clustering. Intuitionistic fuzzy sets are a suitable tool for coping with imperfectly defined facts and data, as well as with imprecise knowledge. This paper introduces a novel intuitionistic fuzzy-based distributed clustering algorithm that clusters distributed datasets without necessarily downloading all the data to a single site. The process is carried out at two levels: a local level and a global level. At the local level, numerical datasets are converted into intuitionistic fuzzy data and clustered independently of each other using a modified fuzzy C-Means algorithm. At the global level, the global centroid is computed by clustering all local cluster centroids. The global centroid is then transmitted back to the local sites to update the local cluster model. The new algorithm is compared against two existing ensemble-based distributed clustering algorithms and against centralized clustering, in which all the data are merged into a single data source and clustered. The simulated experiments described in this paper confirm the good performance of the proposed algorithm.
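The two-level flow described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: plain K-Means stands in for the modified fuzzy C-Means, the conversion to intuitionistic fuzzy data is omitted, and the "update" of the local model is assumed to mean relabeling local points against the global centroids. The function names and the synthetic sites are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-Means (a stand-in for the paper's modified fuzzy C-Means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        new_centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[j]
            for j, pts in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def distributed_cluster(sites, k, seed=0):
    """Two-level clustering: only centroids, never raw data, leave a site."""
    # Local level: every site clusters its own data independently.
    local_centroids = []
    for i, site in enumerate(sites):
        local_centroids.extend(kmeans(site, k, seed=seed + i))
    # Global level: cluster the pooled local centroids.
    global_centroids = kmeans(local_centroids, k, seed=seed)
    # Feedback (assumed form): each site relabels its points
    # against the global centroids it receives.
    labels = [
        [min(range(k), key=lambda j: sq_dist(p, global_centroids[j]))
         for p in site]
        for site in sites
    ]
    return global_centroids, labels

# Hypothetical setup: three sites, each holding points from two blobs.
rng = random.Random(42)
sites = [
    [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
     for cx, cy in [(0, 0), (8, 8)] for _ in range(30)]
    for _ in range(3)
]
global_centroids, labels = distributed_cluster(sites, k=2)
print(global_centroids)
```

In this sketch each site transmits only `k` centroids rather than its full dataset, which is the communication saving the distributed approach aims for.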