2011
DOI: 10.5120/3573-4930

Initializing KMeans Clustering Algorithm using Statistical Information

Abstract: The k-means clustering algorithm is one of the best-known clustering algorithms; nevertheless, it has notable disadvantages, as it may converge to a local optimum depending on its random initialization of prototypes. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm produces valid clusters while decreasing both error and running time.
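The abstract does not spell out the statistical initialization scheme. As a minimal sketch of the general idea (deterministic, statistics-driven prototype placement rather than random selection), one hypothetical variant uses per-feature means and standard deviations to spread the initial prototypes; the offset scheme below is an assumption for illustration, not the paper's exact method:

```python
import numpy as np

def statistical_init(X, k):
    """Place k prototypes using feature-wise statistics.

    Hypothetical scheme: prototypes sit at evenly spaced offsets
    around the feature-wise mean, scaled by the feature-wise
    standard deviation. Illustrates statistics-based (non-random)
    initialization only; not the paper's exact method.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    offsets = np.linspace(-1.0, 1.0, k)  # spread within +/- 1 std
    return np.array([mean + t * std for t in offsets])

def kmeans(X, k, n_iter=100, tol=1e-6):
    """Standard Lloyd iterations starting from the statistical init."""
    centers = statistical_init(X, k)
    for _ in range(n_iter):
        # Assign each point to its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute prototypes; keep the old one if a cluster empties.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```

Because the initialization is deterministic, repeated runs on the same data give the same clustering, which is one way such schemes avoid the run-to-run variability of random prototype selection.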

Cited by 25 publications (10 citation statements). References 13 publications.
“…Further, the greater the similarity within a group and the greater the difference between groups, the better or more distinct is the clustering [19]. The standard k-means clustering algorithm (SKMC) is one of the best-known and most popular algorithms used in clustering, and it seeks an optimal partition of the data by using different criteria [20]- [21]. However, the results obtained from the SKMC highly depend on the initialization of the clustering parameters; in other words, different initializations may produce different results.…”
Section: An Improved Bisecting K-Means Clustering Algorithm
confidence: 99%
“…Some works have focused on finding the best value for the initial number of clusters k and the best way of choosing the initial centroids, as described in [8], [9], [10], [11], [12], [13], [14], [15]. Other research works focus on defining the best stopping criterion in order to avoid excessive iterations, considering that K-Means converges at a local minimum [16].…”
Section: Related Work
confidence: 99%
“…In [50] a method is proposed, which is based on a sample of the data set for which an average is calculated. Next, the objects whose distance is larger than the average are identified, and a distance-between-objects criterion is applied for selecting the objects that will constitute the initial objects.…”
Section: Introduction to Data Science and Machine Learning
confidence: 99%
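The excerpt above describes the initialization only in outline: sample the data set, compute an average, keep the objects whose distance exceeds that average, then apply a distance-between-objects criterion to pick the initial objects. A hedged sketch follows, with the unstated details filled in by assumption (the average is taken as the mean distance to the sample centroid, and the final selection greedily maximizes pairwise distance):

```python
import numpy as np

def sample_average_init(X, k, sample_size=100, seed=0):
    """Sketch of a sample-and-average initialization.

    Assumptions (not specified in the excerpt): "average" means
    the mean distance from sample points to the sample centroid,
    and the distance-between-objects criterion is greedy
    farthest-point selection among the remaining candidates.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    S = X[idx]
    centroid = S.mean(axis=0)
    dists = np.linalg.norm(S - centroid, axis=1)
    # Keep objects farther from the centroid than the average distance.
    candidates = S[dists > dists.mean()]
    # Greedily select k mutually distant objects as initial prototypes.
    chosen = [candidates[0]]
    while len(chosen) < k:
        d = np.min(
            [np.linalg.norm(candidates - c, axis=1) for c in chosen], axis=0
        )
        chosen.append(candidates[d.argmax()])
    return np.array(chosen)
```

Filtering to above-average distances biases the candidate pool toward the periphery of the sample, and the farthest-point step then keeps the chosen prototypes from landing in the same region.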