2004
DOI: 10.1023/b:mach.0000033113.59016.96

Clustering Large Graphs via the Singular Value Decomposition

Abstract: We consider the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed) so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace …
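The discrete objective in the abstract can be made concrete with a short sketch. The abstract is truncated before it states the relaxation in full, so the second function below is an assumption: it takes the relaxation to be the best-fit k-dimensional subspace through the origin, whose cost is the sum of the squared singular values beyond the k-th. The names `kmeans_cost` and `subspace_cost` and the synthetic data are illustrative and not from the paper.

```python
import numpy as np

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for j in range(k):
        members = points[labels == j]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        center = members.mean(axis=0)
        cost += float(((members - center) ** 2).sum())
    return cost

def subspace_cost(points, k):
    """Sum of squared distances from the points to the best-fit
    k-dimensional subspace through the origin (assumed relaxation)."""
    singular_values = np.linalg.svd(points, compute_uv=False)
    return float((singular_values[k:] ** 2).sum())

# Hypothetical data: two well-separated groups of 20 points in 5 dimensions.
rng = np.random.default_rng(0)
A = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 5)) + 2.0,
    rng.normal(0.0, 0.3, size=(20, 5)) - 2.0,
])
labels = np.array([0] * 20 + [1] * 20)

discrete = kmeans_cost(A, labels, k=2)
relaxed = subspace_cost(A, k=2)
print(f"k-means cost of this partition: {discrete:.3f}")
print(f"best-fit 2-d subspace cost:     {relaxed:.3f}")
```

Under this assumption the subspace cost never exceeds the cost of any k-clustering, because the k cluster centers themselves span a subspace of dimension at most k and each point is at least as far from its center as from that subspace.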

Cited by 378 publications (271 citation statements)
References 30 publications
“…This approach is a greedy algorithm that tries to solve the problem of maximizing σ_k for each k. But this problem is known to be NP-hard: even for a given k, maximizing σ_k is the NP-hard "K-Median clustering problem" [10,8] for K = (n − k) clusters. The existing approximation algorithms [10,8] are exponential with the number of clusters to find and unsuitable for our purpose.…”
Section: The Algorithm (mentioning)
confidence: 99%
“…The existing approximation algorithms [10,8] are exponential with the number of clusters to find and unsuitable for our purpose. So for each pair of adjacent communities {C_1, C_2}, we compute the variation ∆σ(C_1, C_2) of σ if we would merge C_1 and C_2 into a new community…”
Section: The Algorithm (mentioning)
confidence: 99%
“…In the case of "power-law" networks it was shown in [32] that the spectral counting of triangles can be efficient due to their special spectral properties and [33] extended this idea using the randomized algorithm by [12] by proposing a simple biased node sampling. This algorithm can be viewed as a special case of a streaming algorithm, since there exist algorithms, e.g., [29], that perform a constant number of passes over the non-zero elements of the matrix to produce a good low rank matrix approximation.…”
Section: Existing Work (mentioning)
confidence: 99%
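
The spectral counting of triangles mentioned in the statement above can be illustrated with a standard identity: for an undirected simple graph with adjacency matrix A, the number of triangles is trace(A^3) / 6, which equals the sum of the cubed eigenvalues of A divided by 6. The sketch below keeps only the largest-magnitude eigenvalues as an approximation. Whether this matches the exact procedure of [32] or [33] is an assumption, and the function names are illustrative.

```python
import numpy as np

def triangles_exact(adj):
    """Exact triangle count from trace(A^3) / 6."""
    return int(round(np.trace(adj @ adj @ adj) / 6))

def triangles_spectral(adj, top):
    """Approximate triangle count from the `top` eigenvalues of largest
    magnitude; using all eigenvalues reproduces the exact count."""
    eigenvalues = np.linalg.eigvalsh(adj)
    order = np.argsort(-np.abs(eigenvalues))
    largest = eigenvalues[order[:top]]
    return float((largest ** 3).sum() / 6)

# Complete graph on 4 vertices: every 3-subset of vertices is a triangle.
K4 = np.ones((4, 4)) - np.eye(4)
print(triangles_exact(K4))
print(triangles_spectral(K4, top=1))
print(triangles_spectral(K4, top=4))
```

When a few eigenvalues dominate in magnitude, as the quoted statement suggests for "power-law" networks, a small `top` already captures most of the sum.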
“…It should be noted, however, that there are of course other options, and that alternative measures can indeed be found in literature. Anyway, as a reasonable feature of (11), note that it is a normalized measure between 0 and 1, where the latter value is assumed for perfectly identical structures. This property is often violated for fuzzifications of standard (relative) evaluation measures such as, e.g., those based on the comparison of coincidence matrices.…”
Section: Similarity Between Cluster Models (mentioning)
confidence: 99%