2014
DOI: 10.1109/tpds.2014.2306193
Efficient k-Means++ Approximation with MapReduce

Abstract: k-means is undoubtedly one of the most popular clustering algorithms owing to its simplicity and efficiency. However, this algorithm is highly sensitive to the chosen initial centers, and a proper initialization is crucial for obtaining an ideal solution. To overcome this problem, k-means++ sequentially chooses the centers so as to achieve a solution that is provably close to the optimal one. However, due to its weak scalability, k-means++ becomes inefficient as the size of the data increases. T…

Cited by 47 publications (4 citation statements). References 22 publications.
“…However, a MapReduce-based implementation of K-means, for example, needs multiple MapReduce jobs for the initialization. The MapReduce K-means++ method [7] tries to address this issue, as it uses one MapReduce job to select the K initial prototypes, which speeds up the initialization compared to K-means. Suggestions for parallelizing the second, search phase of K-means have been given in several papers (see, e.g., [8,9]).…”
Section: Introduction
confidence: 99%
“…The k-means and mini batch k-means methods share the same weakness, namely sensitivity to the chosen initial cluster centers [8]. The k-means++ method is used to address this weakness in k-means and mini batch k-means by choosing the first cluster center at random and then selecting subsequent centers based on the computed distance between each data point and the nearest already-chosen cluster center [9].…”
Section: Introduction
“…Nevertheless, the performance of k-means was improved by combining it with the k-means++ initialization algorithm, a subprocess that seeds the centroids [60]. To this end, k-means++ outperforms plain k-means both by completing the clustering task more quickly and by converging faster to a minimal intra-cluster variance [60,61]. The k-means++ algorithm operates as follows:…”
Section: The K-means/k-means++ Clustering Algorithm
confidence: 99%
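The D²-weighted seeding that the snippets above describe (pick the first center at random, then pick each subsequent center with probability proportional to its squared distance from the nearest chosen center) can be sketched as follows. This is an illustrative pure-Python sketch, not the paper's MapReduce implementation; the function name and signature are assumptions.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding sketch: the first center is chosen uniformly at
    random; each later center is sampled with probability proportional to
    D(x)^2, the squared distance from x to its nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        # D(x)^2 for every point: squared distance to the closest center so far
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # sample the next center with probability D(x)^2 / sum(D(x)^2)
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Because an already-chosen center has D(x)² = 0, it is effectively never re-sampled, which is what spreads the seeds apart and yields the fast convergence the snippet refers to.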