2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2016
DOI: 10.1109/ipdpsw.2016.79
A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms

Abstract: Spectral clustering is one of the most popular graph clustering algorithms and achieves the best performance for many scientific and engineering applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging Big Data applications. In this paper, we present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational powe…
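The pipeline the abstract refers to can be illustrated in its simplest two-way form: build an affinity matrix, form the normalized graph Laplacian, and split by the sign of the second-smallest eigenvector (the general k-way version takes k eigenvectors and runs k-means on the rows). A minimal NumPy sketch of this standard pipeline, not the authors' CPU-GPU implementation:

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way spectral partition of a weighted graph with affinity matrix W.

    Sketch of the core of the standard pipeline; the paper's k-way,
    GPU-accelerated version replaces the dense eigensolve and the
    sign split with a sparse eigensolve plus k-means.
    """
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - dinv[:, None] * W * dinv[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    # The second-smallest eigenvector (Fiedler vector) splits the graph.
    return vecs[:, 1] >= 0
```

The sign of the Fiedler vector is the classic relaxation of the normalized-cut objective; its two sign classes are the two clusters.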

Cited by 17 publications (14 citation statements)
References 27 publications
“…matching) problems are well-known examples. Although these problems are NP-hard, existing relaxation methods provide good approximate solutions that can be scaled to large graphs [3], [4], especially with the aid of high-performance computing hardware platforms such as massively parallel CPUs and GPUs. For example, the 10th DIMACS Implementation Challenge [5] resulted in substantial participation in the graph partition problem, mostly with solutions based on modularity maximization.…”
Section: Introduction (mentioning)
confidence: 99%
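Modularity maximization, the dominant approach in the DIMACS challenge cited above, scores a partition by comparing within-community edge weight against a random-graph expectation, Q = (1/2m) Σᵢⱼ (Aᵢⱼ − kᵢkⱼ/2m) δ(cᵢ, cⱼ). A small self-contained sketch of the score those solvers maximize; the example graph is purely illustrative:

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a partition, given adjacency matrix A.

    Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j],
    where k_i is the degree of node i and 2m the total edge weight.
    """
    k = A.sum(axis=1)            # node degrees
    two_m = k.sum()              # 2 * number of edges (for unweighted A)
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Illustrative graph: two triangles (nodes 0-2 and 3-5) joined by one bridge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
good = np.array([0, 0, 0, 1, 1, 1])   # split at the bridge: Q = 5/14
bad = np.array([0, 1, 0, 1, 0, 1])    # arbitrary split: negative Q
```

Maximizing Q over partitions is itself NP-hard, which is why the challenge entries relied on greedy and relaxation heuristics.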
“…We then build a k-nearest neighbor graph based on the resulting correlation matrix. This is followed by computing the Laplacian eigenmaps on the sparse correlation matrix, which involves a sparse eigenvector decomposition that has fast implementations and can easily be accelerated on GPU platforms, as shown in Jin and Jaja (2016).…”
Section: Discussion (mentioning)
confidence: 99%
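The sparse eigendecomposition this citation leans on can be sketched with SciPy's ARPACK wrapper; the k-NN construction and all parameter choices here are illustrative assumptions, not those of the cited pipeline:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def laplacian_eigenmaps(X, n_neighbors=3, n_components=2):
    """Embed rows of X via the smallest eigenvectors of a sparse
    k-NN graph Laplacian (the step GPU-accelerated in the cited work)."""
    n = X.shape[0]
    # Dense neighbor search for illustration only; a real pipeline
    # would use an approximate or GPU-accelerated k-NN search.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    W = csr_matrix((np.ones(n * n_neighbors), (rows, idx.ravel())),
                   shape=(n, n))
    W = W.maximum(W.T)                    # symmetrize the adjacency
    L = laplacian(W, normed=True)
    # Shift-invert slightly below zero keeps the factorization
    # nonsingular while targeting the smallest eigenvalues of the
    # positive semidefinite Laplacian.
    vals, vecs = eigsh(L, k=n_components + 1, sigma=-1e-3, which='LM')
    order = np.argsort(vals)
    # Drop the trivial eigenvector (D^{1/2}·1 for the normalized Laplacian).
    return vecs[:, order[1:]]
```

Because L is sparse, each Lanczos iteration costs O(nnz) rather than O(n²), which is what makes the decomposition fast and GPU-friendly at scale.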
“…This is determined by the size of the task in terms of k and d, beyond which no further performance gains are possible by adding more nodes. The number of nodes varies from a single node with one processing unit [26], [29] to 128 nodes in [35]. We report results against a heterogeneous-node-based approach running a custom implementation of parallel k-means on ten heterogeneous nodes, each node consisting of an NVIDIA Tesla K20M GPU and two Intel Xeon E5-2620 CPUs [35].…”
Section: Comparison With Other Architectures (mentioning)
confidence: 99%
“…We report results against a heterogeneous-node-based approach running a custom implementation of parallel k-means on ten heterogeneous nodes, each node consisting of an NVIDIA Tesla K20M GPU and two Intel Xeon E5-2620 CPUs [35]. Further, we compare against two GPU-based implementations running on an NVIDIA Tesla K20M GPU and an NVIDIA Tesla K20C GPU respectively [4], [26], an FPGA-based approach running a custom parallel k-means implementation on a Xilinx ZC706 FPGA [29], and a multi-core processor-based approach running a custom implementation of parallel k-means on an 8-core Intel i7-3770k processor [15]. The proposed approach running on the Sunway TaihuLight supercomputer achieves more than 100x speedup over the high-performance heterogeneous-node-based approach, between 50x and 70x speedup over the single-GPU-based approaches, and 31x speedup over the multi-core CPU-based approach on their largest solvable workload sizes.…”
Section: Comparison With Other Architectures (mentioning)
confidence: 99%
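The kernel benchmarked across all of these platforms is Lloyd's k-means iteration, whose assignment step is the O(n·k·d) hot loop each architecture parallelizes. A serial NumPy sketch of that kernel with a deterministic farthest-first seeding (the seeding is an assumption for reproducibility here, not the initialization any of the cited implementations use):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Serial Lloyd's k-means: the kernel that the cited GPU, FPGA,
    multi-core, and Sunway implementations each parallelize."""
    # Farthest-first seeding (illustrative; keeps this sketch deterministic).
    centers = X[:1].astype(float)
    for _ in range(1, k):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, X[d2.argmax()]])
    for _ in range(n_iter):
        # Assignment step: nearest center per point -- the O(n*k*d)
        # loop that dominates runtime and is the main parallel target.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update step: recompute each center as its cluster mean.
        new = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

The speedup comparisons above largely reflect how efficiently each platform evaluates the n×k distance matrix and reduces the per-cluster sums.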