Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Çelebi, Mehmet; Kingravi, Hassan A.

doi:10.1007/978-3-319-09259-1_3

Cited by 42 publications

(48 citation statements)

References 96 publications

“…Standard clustering evaluation approaches that were tailored for partitional clustering methods [18] are not universal and not well suited for other types of tasks like the evaluation of nested clustering structures, for which they need to be extended and adapted [24].…”

Section: Clustering Quality Indexes: Existing Surveys (mentioning)

confidence: 99%

Clustering Evaluation in High-Dimensional Data

Tomašev

Radovanović

2016

Unsupervised Learning Algorithms

Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms as well as determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case. In this chapter, we review the existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways and that some are to be preferred when determining clustering quality in many dimensions.
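The distance-concentration effect mentioned in this abstract can be illustrated with a short simulation. The Python sketch below is not taken from the chapter. The function name relative_contrast, the uniform unit-hypercube data, the sample size, and the seed are all illustrative assumptions. It measures the relative contrast (d_max - d_min) / d_min between one random query point and a set of random points. As the dimension grows, this ratio shrinks.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points points drawn uniformly from the unit hypercube.

    Illustrative assumption: uniform data in [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast drops sharply as the number of dimensions increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

When the nearest and farthest points are almost equally far away, indexes built on distance ratios have less signal to work with. That is one intuition for why some quality indexes degrade in many dimensions.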

“…Calculates probability value of the fitness value by dividing total fitness value by fitness value. b) Calculate the cumulative c) Specify value randomly. Determine randomly a range [1-20] and select the parent that will be the candidate. d) Compare between random values and cumulative values. The selection process is finished to get a new chromosome value and next for crossover process.…”

Section: A Proposed Methods (mentioning)

confidence: 99%
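The quoted steps describe fitness-proportionate (roulette-wheel) selection for a genetic algorithm. In the usual formulation, each individual's selection probability is its fitness divided by the total fitness. The Python sketch below is a generic illustration, not the cited authors' code. The function name roulette_select is hypothetical, and the sketch assumes that fitness values are non-negative and sum to a positive number.

```python
import bisect
import itertools
import random


def roulette_select(fitness: list[float], n_parents: int, rng: random.Random) -> list[int]:
    """Fitness-proportionate (roulette-wheel) selection, with replacement.

    Assumption: every fitness value is >= 0 and the total is > 0.
    Returns the indices of the selected parents.
    """
    total = sum(fitness)
    # a) Selection probability of each individual: its fitness / total fitness.
    probs = [f / total for f in fitness]
    # b) Cumulative probabilities, e.g. [0.1, 0.3, 0.6, 1.0].
    cumulative = list(itertools.accumulate(probs))
    parents = []
    for _ in range(n_parents):
        # c) Draw a random number in [0, 1).
        r = rng.random()
        # d) Pick the first individual whose cumulative value exceeds r,
        #    so individual i owns the interval [cumulative[i-1], cumulative[i]).
        idx = bisect.bisect_right(cumulative, r)
        parents.append(min(idx, len(fitness) - 1))  # guard against float round-off
    return parents
```

Using bisect_right means an individual with zero fitness owns an empty interval and is never chosen. Fitter individuals are picked proportionally more often.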

“…It is also supported in the study of Celebi et al. [8], where a random initialization of the centroids will cause K-means to be trapped in a local minimum. The local minimum is a situation where the right centroid is found only when the initial partition approaches the final solution.…”

Section: Introduction (mentioning)

confidence: 99%
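The local-minimum behavior described in this excerpt can be reproduced with plain Lloyd's k-means. In the Python sketch below, the function lloyd_kmeans and the toy one-dimensional data are hypothetical. The same algorithm runs from two hand-picked sets of initial centers, which stand in for a favorable and an unfavorable random draw. Quality is measured by the sum of squared errors (SSE), the total squared distance from each point to its assigned center. The sketch does not implement any of the deterministic initialization methods studied in the chapter.

```python
import math

Point = tuple[float, ...]


def lloyd_kmeans(points: list[Point], centers: list[Point], max_iter: int = 100):
    """Plain Lloyd's k-means from the given initial centers.

    Returns (final_centers, labels, sse).
    """
    centers = [tuple(c) for c in centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep an empty cluster's center where it is
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    print(lloyd_kmeans(data, [(0.0,), (10.0,), (20.0,)])[2])  # 1.5
    print(lloyd_kmeans(data, [(0.0,), (1.0,), (10.0,)])[2])   # 101.0
```

The first initialization recovers the three natural pairs. The second puts two centers on the left pair, and the third center absorbs the remaining four points. Lloyd's iterations cannot escape this configuration, so the algorithm converges to a much worse local minimum.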

Enhancement of K-Parameter Using Hybrid Statifies Sampling and Genetic Algorithm

Ramadhani¹,

Priyanto²,

Sidiq³

2018

INFOTEL

Clustering is a technique used to group data into clusters based on their similarity. K-means is a clustering algorithm that assigns each object to the cluster whose center is closest, so that the members of each group are as similar as possible. K-means is also one of the most widely used clustering algorithms because it is easy to implement. However, K-means still selects its initial centroids at random, and as a result it is often trapped in local minima. This research uses a genetic algorithm as a metaheuristic to help K-means approach the global optimum. Stratified sampling is also used, dividing the population into homogeneous strata according to stratification variables. On the Iris dataset, the validation value of the proposed method is 0.417, compared with 0.662 for standard K-means.
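The abstract mentions stratified sampling, which divides the population into homogeneous strata and samples within each stratum. The Python sketch below shows proportional stratified sampling in general. The abstract does not say how the cited paper defines its strata or combines sampling with the genetic algorithm. The function stratified_sample, its parameters, and the proportional-allocation rule are therefore assumptions. The sketch also assumes 0 < fraction <= 1 and sortable stratum keys.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Proportional stratified sampling (illustrative sketch).

    Splits records into strata using stratum_of(record), then draws
    round(fraction * stratum size) records without replacement from each
    stratum, taking at least one record from every non-empty stratum.
    Assumes 0 < fraction <= 1 and that stratum keys are sortable.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(strata):  # fixed stratum order keeps results reproducible
        members = strata[key]
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample
```

For example, with 50 records in one stratum and 30 in another, a fraction of 0.2 yields 10 and 6 records. Each stratum is represented in proportion to its size.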

“…The accuracy of the Davies-Bouldin Index is used to calculate the accuracy [17]. The Davies-Bouldin method is a function of the total ratio of intra-cluster dispersion to the distance between clusters.…”

Section: -3 Calculation Competency Function (mentioning)

confidence: 99%
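The Davies-Bouldin index mentioned in this excerpt is usually defined as follows. Consider k clusters with centroids c_i, and let s_i be the average distance from the points of cluster i to c_i. Then DB = (1/k) * sum over i of max over j != i of (s_i + s_j) / d(c_i, c_j). Lower values indicate more compact, better-separated clusters. The Python sketch below computes this with Euclidean distance. The function name davies_bouldin is hypothetical, and scikit-learn provides a library version as sklearn.metrics.davies_bouldin_score.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better).

    s_i = mean distance of cluster i's points to its centroid c_i
    DB  = (1/k) * sum_i max_{j != i} (s_i + s_j) / dist(c_i, c_j)
    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    centroids, scatter = {}, {}
    for k in keys:
        members = clusters[k]
        # Centroid: coordinate-wise mean of the cluster's points.
        centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
        # Intra-cluster dispersion: mean distance to the centroid.
        scatter[k] = sum(math.dist(p, centroids[k]) for p in members) / len(members)
    total = 0.0
    for i in keys:
        # Worst-case (largest) similarity ratio against any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in keys if j != i)
    return total / len(keys)
```

Two tight clusters of spread 1 whose centroids are 10 apart give a ratio of (1 + 1) / 10 = 0.2. Interleaved, overlapping clusters give a much larger value.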

Presenting a Model for Identifying the Best Location of Melli Bank ATMS by Combining Clustering Algorithms and Particle Optimization

Shakibayinia¹,

Forootan²

2018

IJCATR

Abstract: The Interbank Information Exchange Network (Shetab, or Acceleration) was launched in 2002 to integrate and connect the card systems of all banks in the country. Currently, the Shetab (Acceleration) Center acts as the country's national card switch, and every bank in the country is a member. Its operations cover a wide range of transactions, such as cash withdrawals, electronic purchases, fund transfers, bill payments, and residual payments. The Shetab center processes more than two and a half million transactions per day. At present, the fee for each network transaction ranges from 500 to 22,000 rials. This fee is a cost for the cardholder's bank and revenue for the bank that provides the service, and it imposes no cost on the customer, so banks seek to earn revenue from this service. In this study, the ATMs for which Melli Bank pays service fees were first listed. A clustering algorithm was then used to choose ATM locations so that Melli Bank pays lower fees. A combination of three algorithms, K-means, particle swarm optimization, and a genetic algorithm, was used, and the Davies-Bouldin index was used to assess the clustering. The proposed clustering was then compared with another clustering algorithm and shown to perform better. The proposed clustering yields eight ATM locations.
