Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225122
Implementing Push-Pull Efficiently in GraphBLAS

Abstract: We factor Beamer's push-pull, also known as direction-optimized breadth-first search (DOBFS), into three separable optimizations, and analyze them for generalizability, asymptotic speedup, and contribution to overall speedup. We demonstrate that masking is critical for high performance and can be generalized to all graph algorithms where the sparsity pattern of the output is known a priori. We show that these graph algorithm optimizations, which together constitute DOBFS, can be neatly and separably described using…
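To make the abstract's push-pull and masking ideas concrete, here is a minimal sketch of direction-optimized BFS with an explicit mask of unvisited vertices. This is my own illustration, not the paper's GraphBLAS implementation; the scipy.sparse representation, the `dobfs` helper, and the `alpha` switching threshold are assumptions for the example.

```python
# Minimal sketch (not the paper's GraphBLAS code) of direction-optimized BFS.
# Push step: expand only the current frontier's out-edges (sparse frontier).
# Pull step: each *unvisited* vertex scans its in-edges for a frontier parent;
# the "unvisited" mask restricts work to vertices whose output is still unknown.
import numpy as np
import scipy.sparse as sp

def dobfs(A_csr, source, alpha=0.25):
    """Return BFS levels for an unweighted graph given as a CSR adjacency matrix."""
    n = A_csr.shape[0]
    A_csc = A_csr.tocsc()                 # column access for the pull direction
    level = np.full(n, -1, dtype=np.int64)
    level[source] = 0
    frontier = np.array([source], dtype=np.int64)
    depth = 0
    while frontier.size > 0:
        depth += 1
        use_pull = frontier.size > alpha * n        # heuristic direction switch (assumption)
        if use_pull:
            in_frontier = np.zeros(n, dtype=bool)
            in_frontier[frontier] = True
            next_frontier = []
            for v in np.flatnonzero(level == -1):   # masked: unvisited vertices only
                nbrs = A_csc.indices[A_csc.indptr[v]:A_csc.indptr[v + 1]]
                if in_frontier[nbrs].any():         # v has a parent in the frontier
                    level[v] = depth
                    next_frontier.append(v)
            frontier = np.array(next_frontier, dtype=np.int64)
        else:
            next_mask = np.zeros(n, dtype=bool)
            for u in frontier:                      # push: expand frontier out-edges
                nbrs = A_csr.indices[A_csr.indptr[u]:A_csr.indptr[u + 1]]
                next_mask[nbrs] = True
            next_mask &= (level == -1)              # mask out already-visited vertices
            frontier = np.flatnonzero(next_mask)
            level[frontier] = depth
    return level

if __name__ == "__main__":
    # Tiny undirected 4-cycle; expected levels from source 0 are [0, 1, 2, 1].
    A = sp.csr_matrix(np.array([[0, 1, 0, 1],
                                [1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [1, 0, 1, 0]]))
    print(dobfs(A, source=0))
```

The mask here is the `level == -1` test: because the output sparsity pattern (vertices not yet assigned a level) is known before each step, both directions can skip work on already-visited vertices, which is the generalization the abstract highlights.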

Cited by 34 publications (32 citation statements). References 23 publications (29 reference statements).

Citation statements (ordered by relevance):
“…The sequential and parallel versions of this algorithm are deterministic and asymptotically optimal for any ordering of matrix and vector indices. The current state-of-the-art SpMSpV-BFS approaches are only optimal if the vector indices are unordered [1,25]. It also appears that other recent SpMSpV methods take O(mn) time overall for BFS because their masking method requires an elementwise multiplication with a dense vector or explicitly testing every vertex in each step [6,25,26].…”
Section: Theorem 1: BFS Can Be Computed By X…
Citation type: mentioning
Confidence: 99%
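As an illustration of the distinction this excerpt draws (my own sketch, not code from the cited works), the two masking strategies differ in what they touch per BFS step: an elementwise product with a dense length-n complement vector inspects every vertex, while filtering only the nonzeros the sparse product actually emitted does work proportional to that output. The helper names below are hypothetical.

```python
# Hypothetical helpers contrasting per-step masking costs in a BFS.
import numpy as np

def mask_with_dense_vector(candidate_dense, visited):
    """Elementwise multiply with the dense complement mask: Theta(n) work each step,
    regardless of how small the frontier's output is."""
    return candidate_dense * (~visited)

def mask_on_output_nonzeros(candidate_ids, visited):
    """Filter only the indices the sparse product produced: work proportional to the
    number of output nonzeros, which sums to O(m) over the whole BFS."""
    return candidate_ids[~visited[candidate_ids]]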
“…But there is an analysis gap on the asymptotic cost of preventing previous frontier vertices in the BFS from reappearing in the sparse vector. Masking out these frontier nonzeros was analyzed in [25], and it appears to require an elementwise multiplication with a dense masking vector, which must be of size O(n) to accommodate all vertices. This suggests these SpMSpV methods with masking take O(mn) time for BFS.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
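A hedged sketch of the accounting behind that O(mn) figure (my own reading of the excerpt, not a derivation from the citing paper): over d BFS levels, a mask stored as a dense length-n vector costs Θ(n) per level just to apply, and if it forces each masked product to do work on the order of the whole matrix the total reaches O(mn), whereas frontier-proportional masking telescopes to the optimal bound.

```latex
% Assumptions (mine): d = number of BFS levels, d = O(n); F_k = frontier at level k;
% n = vertices, m = edges.
T_{\text{dense mask}} \;\ge\; \sum_{k=1}^{d} \Theta(n) \;=\; \Theta(dn),
\qquad\text{and up to}\qquad \sum_{k=1}^{d} O(m) \;=\; O(mn);
\\[4pt]
T_{\text{frontier-proportional}} \;=\; \sum_{k=1}^{d}
    \Theta\!\Big(|F_k| + \sum_{u \in F_k}\deg(u)\Big) \;=\; \Theta(m+n).
```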