How Fast Is the k-Means Method?

Har-Peled, Sariel; Sadri, Bardia

doi:10.1007/s00453-004-1127-9

Cited by 107 publications

(70 citation statements)

References 11 publications

“…All the nodes of the system are continuously organized into clusters computed through the k-means algorithm exclusively run by the management node, which is a clear impediment to the scalability of their approach. Other works aim at minimizing the processing cost for continuous monitoring [13], [9], [14] in the light of the theoretical results of [5]; however, similarly to [15], all these approaches suffer from a centralized handling of the clustering process. Recently, Choffnes et al. [2] have proposed to leverage structured peer-to-peer architectures (i.e., Distributed Hashing Tables) to guarantee efficient and scalable monitoring management.…”

Section: Related Work (mentioning)

confidence: 99%

Anomaly Characterization in Large Scale Networks

Anceaume, Busnel, Merrer, et al. 2014

2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

Abstract: The context of this work is the online characterization of anomalies in large-scale systems. In particular, we address the following question: given two successive configurations of the system, can we distinguish massive anomalies from isolated ones, the former impacting a large number of nodes while the latter affect only a small number of them, or even a single one? The rationale for this question is twofold. First, from a theoretical point of view, we characterize anomalies with respect to their neighborhood, and we show that there are anomaly scenarios in which isolated and massive anomalies are indistinguishable from a global observer's point of view. We then relax the definition of this problem by introducing unresolved configurations, and exhibit necessary and sufficient conditions that allow any node to determine the type of anomaly it has been impacted by. These conditions depend only on the close neighborhood of each node and are thus locally computable. We present an algorithm that implements these conditions and show its performance through extensive simulations. Second, from a practical point of view, distinguishing isolated anomalies from massive ones is of utmost importance for network providers. For instance, Internet service providers that operate millions of home gateways would benefit from procedures that allow gateways to determine by themselves whether their dysfunction is caused by network-level anomalies or by their own hardware or software, and to notify the service provider only in the latter case.

“…Har-Peled and Sadri [11] and Arthur and Vassilvitskii [5,4] examine the question of how quickly this algorithm and its variants converge to a local optimum. Lloyd's algorithm also does not provide any significant guarantee about how well the solution that it computes approximates the optimal solution.…”

Section: Introduction (mentioning)

confidence: 99%

The Planar k-Means Problem is NP-Hard

Mahajan, Nimbhorkar, Varadarajan 2009

Lecture Notes in Computer Science
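
The citation statement above refers to Lloyd's algorithm converging to a local optimum with no guarantee on solution quality. The following is a minimal sketch of that iteration in plain Python, not the implementation analyzed in any of the cited papers. The function names `lloyd` and `cost`, the `max_iter` cap, and the choice to leave an empty cluster's center unchanged are all assumptions made for illustration.

```python
import math


def lloyd(points, centers, max_iter=1000):
    """Run Lloyd's iteration until the assignment stops changing.

    Returns (centers, assignment, iterations), where iterations counts
    the passes in which the assignment actually changed.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centers, assignment, it - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers, assignment, max_iter


def cost(points, centers, assignment):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))


# Four corners of a 10 x 1 rectangle, clustered with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Starting near the two short sides groups left with left, right with right.
good = lloyd(points, [(0.0, 0.0), (10.0, 1.0)])

# Starting in the middle gets stuck pairing bottom with bottom, top with top.
bad = lloyd(points, [(5.0, 0.0), (5.0, 1.0)])

print(cost(points, good[0], good[1]), cost(points, bad[0], bad[1]))
```

On this toy input both runs stop after a single changing pass, yet the second start settles on a clustering whose cost is 100 times that of the first, which is the kind of gap the statement alludes to.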

“…Other bounds are known for the special case d = 1. Namely, for the one-dimensional case, Har-Peled and Sadri [9] provided a worst-case lower bound of Ω(n), and showed an upper bound of O(nΔ²), where Δ is the spread of the point set (i.e., the ratio between the largest and the smallest pairwise distance). They also conjectured that k-means might run in time polynomial in n and Δ for any d.…”

Section: Introduction (mentioning)

confidence: 99%
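
The spread Δ defined in the statement above (largest pairwise distance divided by smallest pairwise distance) can be computed directly for a small one-dimensional point set. This is an illustrative sketch only: the function name `spread` is hypothetical, and the quantity n·Δ² printed at the end is just the expression inside the O(·) bound, with the hidden constant ignored.

```python
from itertools import combinations


def spread(points):
    """Ratio of the largest to the smallest pairwise distance of 1-D points."""
    dists = [abs(a - b) for a, b in combinations(points, 2)]
    smallest = min(dists)
    if smallest == 0:
        # Assumption: coincident points make the ratio undefined, so reject them.
        raise ValueError("spread is undefined when two points coincide")
    return max(dists) / smallest


pts = [0, 1, 3, 10]
delta = spread(pts)

# Expression inside the O(n * Delta^2) bound, up to an unspecified constant.
print(delta, len(pts) * delta ** 2)
```

The brute-force pairwise scan takes quadratic time, which is fine for illustration. For sorted one-dimensional input the largest distance is between the endpoints and the smallest is between two neighbors, so a linear scan would suffice.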

“…Arthur and Vassilvitskii [2] showed that k-means can run for super-polynomially many iterations, improving the best known lower bound from Ω(n) [10]. They also show that their construction can be modified to have low spread, disproving the aforementioned conjecture in [9] for d = Ω(√n). A more recent line of work that aims to close the gap between practical and theoretical performance makes use of the smoothed analysis introduced by Spielman and Teng [15].…”

Section: Introduction (mentioning)

confidence: 99%