2021
DOI: 10.1109/tbdata.2019.2907985

Fast Communication-Efficient Spectral Clustering over Distributed Data

Abstract: The last decades have seen a surge of interest in distributed computing, thanks to advances in cluster computing and big data technology. Existing distributed algorithms typically assume all the data are already in one place, and divide the data and conquer on multiple machines. However, it is increasingly common that the data are located at a number of distributed sites, and one wishes to compute over all the data with low communication overhead. For spectral clustering, we propose a novel framework that ena…

Cited by 6 publications (2 citation statements)
References 61 publications
“…Another line of closely related work is that under the term "learning over inherently distributed data" [45,46]. Instead of dividing the data, these works deal with situations where the data are already distributed, i.e., stored at a number of distributed machines as a result of business operations or diverse data collection channels.…”
Section: Related Work
confidence: 99%
“…We will use recursive random projections [13,44] to produce a compressed signature for each partition. The idea of recursive random projections has been successfully applied in fast approximate spectral clustering [43], computing over distributed data [45,46], and other procedures. Since our approach is a divide-and-conquer method with representation compression, we refer to it as divide-compress-and-conquer, or DC² for short.…”
Section: Introduction
confidence: 99%
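The second citation statement only names the idea of compressing each partition with recursive random projections. As a rough illustration, the minimal Python sketch below shows one plausible reading of such a signature: an RP-tree-style recursion that splits a local partition along random directions and keeps one centroid per leaf. The function name rp_signature, the leaf_size parameter, and the centroid-per-leaf summary are illustrative assumptions, not the exact construction used in [13,44,45,46].

```python
import numpy as np


def rp_signature(X, leaf_size=64, rng=None):
    """Compress a local data partition into a small set of representative
    points by recursively splitting it with random projections.

    RP-tree-style sketch: project onto a random direction, split at the
    median, recurse, and summarize each leaf by its centroid. This is an
    illustrative assumption, not the signature of the cited papers.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    if n <= leaf_size:
        # Small enough: summarize the leaf by a single centroid.
        return X.mean(axis=0, keepdims=True)
    # Random direction, then a median split of the projected values.
    w = rng.standard_normal(d)
    proj = X @ w
    mask = proj <= np.median(proj)
    if mask.all() or not mask.any():
        # Degenerate split (e.g., all projections tied): stop recursing.
        return X.mean(axis=0, keepdims=True)
    return np.vstack([
        rp_signature(X[mask], leaf_size, rng),
        rp_signature(X[~mask], leaf_size, rng),
    ])


if __name__ == "__main__":
    # Each distributed site would compress its own partition and ship
    # only the signature (roughly n / leaf_size points) to a coordinator.
    X_local = np.random.default_rng(0).standard_normal((10_000, 20))
    sig = rp_signature(X_local, leaf_size=128)
    print(sig.shape)  # about (n / leaf_size, 20)
```

Under these assumptions, the communication cost per site drops from n points to roughly n / leaf_size representatives, which is the kind of compressed summary the divide-compress-and-conquer framing refers to.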