The I/O Complexity of Sparse Matrix Dense Matrix Multiplication

Greiner, Gero; Jacob, Riko

doi:10.1007/978-3-642-12200-2_14

Cited by 13 publications

(11 citation statements)

References 5 publications


“…Greiner and Jacob have proven theoretically [26] that as the number of nonzeroes per row exceeds some hardware threshold, namely m M where m is the number [20]. cuBLAS sgemm is a dense-dense matrix multiplication function from a vendor-shipped library.…”

Section: Discussion (mentioning)

confidence: 99%

Design Principles for Sparse Matrix Multiplication on the GPU

Yang

Buluç

Owens

2018

Euro-Par 2018: Parallel Processing


We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1× peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
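As a rough illustration of the operation this abstract describes, the sketch below multiplies a CSR sparse matrix by a dense row-major matrix in plain Python. It is not the authors' GPU algorithm: there is no merge-based load balancing or coalescing here, and the function name `csr_spmm` and the argument names `indptr`, `indices`, and `data` are assumptions following the common CSR convention. It only shows that each nonzero of the sparse matrix reads one contiguous row of the dense input and updates one contiguous row of the output, which is the access pattern the row-major layout is meant to exploit.

```python
def csr_spmm(indptr, indices, data, B):
    """Compute C = A @ B, where A is given in CSR form and B is a dense,
    row-major list of rows. Names are illustrative, not from the paper."""
    n_rows = len(indptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = C[i]
        # Nonzeros of row i live in positions indptr[i] .. indptr[i+1]-1.
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]
            b_row = B[indices[p]]  # one contiguous row of B per nonzero
            for j in range(n_cols):
                row_out[j] += a * b_row[j]
    return C


# A = [[1, 0, 2],
#      [0, 0, 3]]  stored in CSR form
indptr = [0, 2, 3]
indices = [0, 2, 2]
data = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = csr_spmm(indptr, indices, data, B)
```

Because the input is consumed directly in CSR form, no format conversion is needed before the multiplication, which matches the motivation stated in the abstract.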


“…The algorithm also takes sparse matrices as input, and never explicitly computes a multiplication of two n × n matrices. Therefore, for input feature dimension and hidden dimension d ≪ n, time and space complexity of DIMPA (and implicitly of DIGRAC) is O(|E|dh + 2ndK) and O(2|E| + 4nd + nK), respectively [20,18]. For large-scale networks, DIMPA is amenable to a minibatch version using neighborhood sampling, similar to the minibatch forward propagation algorithm in [19,31].…”

Section: B8 Complexity Analysis (mentioning)

confidence: 99%

DIGRAC: Digraph Clustering Based on Flow Imbalance

Reinert

Cucuringu

2021

Preprint


Node clustering is a powerful tool in the analysis of networks. Here, we introduce a graph neural network framework with a novel scalable Directed Mixed Path Aggregation (DIMPA) scheme to obtain node embeddings for directed networks in a self-supervised manner, including a novel probabilistic imbalance loss. The method is end-to-end in combining embedding generation and clustering without an intermediate step. In contrast to standard approaches in the literature, in this paper, directionality is not treated as a nuisance, but rather contains the main signal. In particular, we leverage the recently introduced cut flow imbalance measure, which is tightly related to directionality; cut flow imbalance is optimized without resorting to spectral methods or cluster labels. Experimental results on synthetic data, in the form of directed stochastic block models and real-world data at different scales, demonstrate that our method attains state-of-the-art results on directed graph clustering, for a wide range of noise and sparsity levels and graph structures.


“…Figure 1 gives an overview. [19]. For large networks, SIMPA is amenable to a more scalable version following [18].…”

Section: Supervised Loss (mentioning)

confidence: 99%

“…The algorithm also takes sparse matrices as input, and sparsity is maintained throughout. Therefore, for input feature dimension d_in and hidden dimension d, if d′ = max(d_in, d) ≪ n, time and space complexity of SIMPA, and implicitly SSSNET, is O(|E|d′h² + 4nd′K) and O(4|E| + 10nd′ + nK), respectively [23,19]. When the network is large, SIMPA is amenable to a minibatch version using neighborhood sampling, similar to the minibatch forward propagation algorithm in [21,36].…”

Section: Implementation Details (mentioning)

confidence: 99%

SSSNET: Semi-Supervised Signed Network Clustering

He

Reinert

Wang

et al. 2021

Preprint


Node embeddings are a powerful tool in the analysis of networks; yet, their full potential for the important task of node clustering has not been fully exploited. In particular, most state-of-the-art methods generating node embeddings of signed networks focus on link sign prediction, and those that pertain to node clustering are usually not graph neural network (GNN) methods. Here, we introduce a novel probabilistic balanced normalized cut loss for training nodes in a GNN framework for semi-supervised signed network clustering, called SSSNET. The method is end-to-end in combining embedding generation and clustering without an intermediate step; it has node clustering as main focus, with an emphasis on polarization effects arising in networks. The main novelty of our approach is a new take on the role of social balance theory for signed network embeddings. The standard heuristic for justifying the criteria for the embeddings hinges on the assumption that an "enemy's enemy is a friend". Here, instead, a neutral stance is assumed on whether or not the enemy of an enemy is a friend. Experimental results on various data sets, including a synthetic signed stochastic block model, a polarized version of it, and real-world data at different scales, demonstrate that SSSNET can achieve comparable or better results than state-of-the-art spectral clustering methods, for a wide range of noise and sparsity levels. SSSNET complements existing methods through the possibility of including exogenous information, in the form of node-level features or labels.
