Efficient and Fast Initialization Algorithm for K-means Clustering

Agha, Mohammed El; Ashour, Wesam M.

doi:10.5815/ijisa.2012.01.03

Cited by 34 publications

(15 citation statements)

References 13 publications

“…To be able to classify the document pixels in three different classes, the idea was to apply the K-means clustering algorithm with K = 2 to the feature vectors of the different pixels for the first time to separate between the text and the graphic, then a second time for the separation between the pixels belonging to the images and the pixels belonging to other types of graphic. In both steps, the initialization of the cluster centers for K-means is performed using the ElAgha initialization algorithm [23].…”

Section: Filtering Results Classification (mentioning)

confidence: 99%

Image Extracting from Ancient Arab Documents with Complex Structures

Charrada

2014

IJCDS

“…On their part, El Agha & Ashour [23] claim that the following initialization strategy yields improved results. For $s = 1, \dots, d$ and $i = 1, \dots, n$, let $x_i(s)$ be the $s$-th coordinate of point $x_i$, and $\nu_s = \min_i x_i(s)$.…”

Section: Related Work (mentioning)

confidence: 99%

Balancing effort and benefit of K-means clustering algorithms in Big Data realms

2018

In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms.
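
The stopping rule described in this abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function name `kmeans_threshold`, the `ratio` parameter, the random initialization, and the Euclidean distance are all assumptions made for the example.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans_threshold(points, k, ratio=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once fewer than ratio * n points change cluster.

    `ratio`, `max_iter` and `seed` are hypothetical parameters for this sketch.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization (an assumption)
    labels = [-1] * len(points)
    threshold = ratio * len(points)

    for iteration in range(1, max_iter + 1):
        # Classification step: assign each point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        changed = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

        # Convergence step: stop when few enough points changed cluster.
        if changed < threshold:
            break

    return labels, centroids, iteration
```

With `ratio=0.03` the loop ends as soon as fewer than 3% of the points switch cluster in one iteration. A larger `ratio` can only stop at the same iteration or earlier, which is the trade of solution quality for computing time that the abstract describes.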

“…For determining the centroids, several methods have been proposed such as [4], [5], [6]; however, random selection is the most commonly used. • Classification: The distance of each object to each of the cluster centroids is calculated, and the object is assigned to the cluster whose object-to-centroid distance is the smallest.…”
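
The classification step quoted above can be illustrated with a minimal sketch. The function name `assign_to_nearest` and the choice of Euclidean distance are assumptions for the example; the quoted text does not specify a metric.

```python
import math


def assign_to_nearest(objects, centroids):
    """Assign each object to the centroid with the smallest object-to-centroid distance."""
    labels = []
    for obj in objects:
        # Distance from this object to every cluster centroid.
        distances = [math.dist(obj, c) for c in centroids]
        # Index of the smallest distance; ties go to the lowest index.
        labels.append(distances.index(min(distances)))
    return labels
```

For example, with centroids `(0, 0)` and `(10, 10)`, the object `(9, 9)` is assigned to the second cluster and `(1, 0)` to the first.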

Section: Introduction (mentioning)

confidence: 99%

An improvement to the K-means algorithm oriented to big data

Pérez-Ortega, Pazos, Hidalgo et al. 2015

AIP Conference Proceedings

The K-means clustering algorithm is widely used in several domains because of its simplicity of implementation and interpretation. However, one of its limitations is its high computational complexity. This work addresses the problem of reducing the complexity of the K-means algorithm, in order to make it possible to solve large-scale data sets such as those from Big Data without significantly degrading solution quality. To this end, a new metaheuristic is proposed which, by an early assignment of objects to clusters, significantly reduces the number of object-to-centroid distance calculations. The approach was experimentally evaluated on real and synthetic datasets, yielding encouraging results. Time reductions of up to 91% were obtained with respect to the standard K-means, at the expense of reducing quality by 3.2%.
