2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2016
DOI: 10.1109/ipdpsw.2016.79
A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms

Abstract: Spectral clustering is one of the most popular graph clustering algorithms and achieves the best performance for many scientific and engineering applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging Big Data applications. In this paper, we present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational powe…
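The pipeline the abstract refers to can be illustrated in its simplest two-way form: build an affinity matrix, form the normalized graph Laplacian, and split by the sign of the second-smallest eigenvector (the general k-way version takes k eigenvectors and runs k-means on the rows). A minimal NumPy sketch of this standard pipeline, not the authors' CPU-GPU implementation:

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way spectral partition of a weighted graph with affinity matrix W.

    Sketch of the core of the standard pipeline; the paper's k-way,
    GPU-accelerated version replaces the dense eigensolve and the
    sign split with a sparse eigensolve plus k-means.
    """
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - dinv[:, None] * W * dinv[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    # The second-smallest eigenvector (Fiedler vector) splits the graph.
    return vecs[:, 1] >= 0
```

The sign of the Fiedler vector is the classic relaxation of the normalized-cut objective; its two sign classes are the two clusters.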

Cited by 17 publications (14 citation statements)
References 27 publications
“…matching) problems are well-known examples. Although these problems are NP-hard, existing relaxation methods provide good approximate solutions that can be scaled to large graphs [3], [4], especially with the aid of high-performance computing hardware platforms such as massively parallel CPUs and GPUs. For example, the 10th DIMACS Implementation Challenge [5] resulted in substantial participation in the graph partition problem, mostly with solutions based on modularity maximization.…”
Section: Introduction (mentioning)
confidence: 99%
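Modularity maximization, the dominant approach in the DIMACS challenge cited above, scores a partition by comparing within-community edge weight against a random-graph expectation, Q = (1/2m) Σᵢⱼ (Aᵢⱼ − kᵢkⱼ/2m) δ(cᵢ, cⱼ). A small self-contained sketch of the score those solvers maximize; the example graph is purely illustrative:

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a partition, given adjacency matrix A.

    Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j],
    where k_i is the degree of node i and 2m the total edge weight.
    """
    k = A.sum(axis=1)            # node degrees
    two_m = k.sum()              # 2 * number of edges (for unweighted A)
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Illustrative graph: two triangles (nodes 0-2 and 3-5) joined by one bridge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
good = np.array([0, 0, 0, 1, 1, 1])   # split at the bridge: Q = 5/14
bad = np.array([0, 1, 0, 1, 0, 1])    # arbitrary split: negative Q
```

Maximizing Q over partitions is itself NP-hard, which is why the challenge entries relied on greedy and relaxation heuristics.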
“…We then build a k-nearest neighbor graph based on the resulting correlation matrix. This is followed by computing the Laplacian eigenmaps on the sparse correlation matrix, which involves a sparse eigenvector decomposition that has fast implementations and can easily be accelerated on GPU platforms, as shown in Jin and Jaja (2016).…”
Section: Discussion (mentioning)
confidence: 99%
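The sparse eigendecomposition this citation leans on can be sketched with SciPy's ARPACK wrapper; the k-NN construction and all parameter choices here are illustrative assumptions, not those of the cited pipeline:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def laplacian_eigenmaps(X, n_neighbors=3, n_components=2):
    """Embed rows of X via the smallest eigenvectors of a sparse
    k-NN graph Laplacian (the step GPU-accelerated in the cited work)."""
    n = X.shape[0]
    # Dense neighbor search for illustration only; a real pipeline
    # would use an approximate or GPU-accelerated k-NN search.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    W = csr_matrix((np.ones(n * n_neighbors), (rows, idx.ravel())),
                   shape=(n, n))
    W = W.maximum(W.T)                    # symmetrize the adjacency
    L = laplacian(W, normed=True)
    # Shift-invert slightly below zero keeps the factorization
    # nonsingular while targeting the smallest eigenvalues of the
    # positive semidefinite Laplacian.
    vals, vecs = eigsh(L, k=n_components + 1, sigma=-1e-3, which='LM')
    order = np.argsort(vals)
    # Drop the trivial eigenvector (D^{1/2}·1 for the normalized Laplacian).
    return vecs[:, order[1:]]
```

Because L is sparse, each Lanczos iteration costs O(nnz) rather than O(n²), which is what makes the decomposition fast and GPU-friendly at scale.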
“…This is determined by the size of the task in terms of k and d, beyond which no further performance gains are possible by adding more nodes. The number of nodes varies from a single node with one processing unit [26], [29] to 128 nodes in [35]. We report results against a heterogeneous-node-based approach running a custom implementation of parallel k-means on ten heterogeneous nodes, each node consisting of an NVIDIA Tesla K20M GPU and two Intel Xeon E5-2620 CPUs [35].…”
Section: Comparison With Other Architectures (mentioning)
confidence: 99%
“…We report results against a heterogeneous-node-based approach running a custom implementation of parallel k-means on ten heterogeneous nodes, each node consisting of an NVIDIA Tesla K20M GPU and two Intel Xeon E5-2620 CPUs [35]. Further, we compare against two GPU-based implementations running on an NVIDIA Tesla K20M GPU and an NVIDIA Tesla K20C GPU respectively [4], [26], an FPGA-based approach running a custom parallel k-means implementation on a Xilinx ZC706 FPGA [29], and a multi-core processor-based approach running a custom implementation of parallel k-means on an 8-core Intel i7-3770k processor [15]. The proposed approach running on the Sunway TaihuLight supercomputer achieves more than 100x speedup over the high-performance heterogeneous-node-based approach, between 50x and 70x speedup over the single-GPU-based approaches, and 31x speedup over the multi-core CPU-based approach on their largest solvable workload sizes.…”
Section: Comparison With Other Architectures (mentioning)
confidence: 99%
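The kernel benchmarked across all of these platforms is Lloyd's k-means iteration, whose assignment step is the O(n·k·d) hot loop each architecture parallelizes. A serial NumPy sketch of that kernel with a deterministic farthest-first seeding (the seeding is an assumption for reproducibility here, not the initialization any of the cited implementations use):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Serial Lloyd's k-means: the kernel that the cited GPU, FPGA,
    multi-core, and Sunway implementations each parallelize."""
    # Farthest-first seeding (illustrative; keeps this sketch deterministic).
    centers = X[:1].astype(float)
    for _ in range(1, k):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, X[d2.argmax()]])
    for _ in range(n_iter):
        # Assignment step: nearest center per point -- the O(n*k*d)
        # loop that dominates runtime and is the main parallel target.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update step: recompute each center as its cluster mean.
        new = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

The speedup comparisons above largely reflect how efficiently each platform evaluates the n×k distance matrix and reduces the per-cluster sums.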