2004
DOI: 10.1007/s00453-004-1127-9

How Fast Is the k-Means Method?

Abstract: We present polynomial upper and lower bounds on the number of iterations performed by the k-means method (a.k.a. Lloyd's method) for k-means clustering. Our upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two centers. Surprisingly, the spread of the point set in this construction is polynomial. This is t…
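To make the quantity being bounded concrete, here is a minimal sketch of Lloyd's method on the real line that counts iterations until the centers stop moving. The function name `lloyd_1d`, the `max_iters` cap, and the sample points are illustrative assumptions; this is not the paper's lower-bound construction.

```python
def lloyd_1d(points, centers, max_iters=10_000):
    """Run Lloyd's method on 1-D points; return (centers, iterations).

    Hypothetical helper for illustration only. The iteration count
    includes the final pass that confirms the centers did not move.
    """
    centers = list(centers)
    for iteration in range(1, max_iters + 1):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iters


# Two centers on the real line, deliberately started close together.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
final_centers, iterations = lloyd_1d(points, [0.0, 1.0])
print(final_centers, iterations)
```

On this small input the centers settle after a few passes; the paper's results concern how the iteration count can grow with n in the worst case.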

Cited by 107 publications (70 citation statements)
References: 11 publications
“…All the nodes of the system are continuously organized into clusters computed through the k-means algorithm exclusively run by the management node, which is a clear impediment to the scalability of their approach. Other works aim at minimizing the processing cost for continuous monitoring [13], [9], [14] in the light of the theoretical results of [5], however similarly to [15], all these approaches suffer from a centralized handling of the clustering process. Recently, Choffnes et al [2] have proposed to leverage structured peer-to-peer architectures (i.e., Distributed Hashing Tables) to guarantee efficient and scalable monitoring management.…”
Section: Related Work (mentioning)
confidence: 99%
“…Har-Peled and Sadri [11] and Arthur and Vassilvitskii [5,4] examine the question of how quickly this algorithm and its variants converge to a local optimum. Lloyd's algorithm also does not provide any significant guarantee about how well the solution that it computes approximates the optimal solution.…”
Section: Introduction (mentioning)
confidence: 99%
“…Other bounds are known for the special case d = 1. Namely, for the one-dimensional case, Har-Peled and Sadri [9] provided a worst-case lower bound of Ω(n), and showed an upper bound of O(nΔ²), where Δ is the spread of the point set (i.e., the ratio between the largest and the smallest pairwise distance). They also conjectured that k-means might run in time polynomial in n and Δ for any d.…”
Section: Introduction (mentioning)
confidence: 99%
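
The spread Δ defined in the statement above (largest pairwise distance divided by smallest pairwise distance) can be computed directly for a one-dimensional point set. The sketch below is illustrative only; the function name `spread` and the sample points are assumptions, and the quadratic loop over all pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def spread(points):
    """Return the ratio of the largest to the smallest pairwise distance.

    Hypothetical helper for illustration; requires at least two points
    and rejects duplicates, for which the ratio is undefined.
    """
    distances = [abs(a - b) for a, b in combinations(points, 2)]
    if not distances:
        raise ValueError("need at least two points")
    smallest = min(distances)
    if smallest == 0:
        raise ValueError("spread is undefined when two points coincide")
    return max(distances) / smallest


delta = spread([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
print(delta)
```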
“…Arthur and Vassilvitskii [2] showed that k-means can run for super-polynomially many iterations, improving the best known lower bound from Ω(n) [10]. Also they show that their construction can be modified to have low spread, disproving the aforementioned conjecture in [9] for d = Ω(√n). A more recent line of work that aims to close the gap between practical and theoretical performance makes use of the smoothed analysis introduced by Spielman and Teng [15].…”
Section: Introduction (mentioning)
confidence: 99%