Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1145/2063384.2063471

Parallel breadth-first search on distributed memory systems

Abstract: Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional…
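The level-synchronous strategy the abstract describes processes the graph one frontier at a time: all vertices at distance d are expanded before any vertex at distance d+1. A minimal single-threaded sketch of that idea (the names `bfs_levels` and `adj` are illustrative, not from the paper):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: expand the whole frontier for one level
    before starting the next. `adj` maps each vertex to its neighbor list."""
    level = {source: 0}       # discovered vertices and their BFS depth
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:            # expand every frontier vertex
            for v in adj[u]:
                if v not in level:    # first discovery fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier      # barrier: advance to the next level
    return level
```

In the distributed setting of the paper, the per-level barrier becomes a communication phase in which processes exchange newly discovered vertices with their owners.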

Cited by 149 publications (136 citation statements). References 35 publications.
“…CombBLAS curve is mostly flat (only 9% deviation) due to its in-core computational bottlenecks, while SEJITS+KDT and CombBLAS show higher deviations (54% and 62%, respectively) from a perfect flat line. However, these deviations are expected in a large-scale BFS run and are experienced on similar architectures [14].…”
Section: Parallel Scaling
confidence: 81%
“…Both 1D and 2D algorithms can be enhanced by in-node multithreading, resulting in one MPI process per chip instead of one MPI process per core, which reduces the number of communicating parties. Large-scale experiments comparing 1D and 2D show that the 2D approach's communication costs are lower than those of the respective 1D approach, with or without in-node multithreading [6]. The study also shows that in-node multithreading gives a further performance boost by decreasing network contention.…”
Section: Parallel Top-Down BFS
confidence: 86%
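The 1D scheme that the quote contrasts with 2D assigns each process a contiguous block of vertices; each BFS level, every process expands the frontier vertices it owns and ships newly discovered vertices to their owners in an all-to-all exchange. A hedged single-process simulation of that communication pattern (function and variable names are my own; real implementations use MPI buffers, not in-memory buckets):

```python
def owner(v, nverts, nprocs):
    """Block 1D partition: vertex v belongs to process v // ceil(n/p)."""
    block = (nverts + nprocs - 1) // nprocs
    return v // block

def bfs_1d(adj, source, nverts, nprocs):
    """Simulated 1D-partitioned BFS: per level, each 'process' expands its
    local slice of the frontier, then discovered (vertex, parent) pairs are
    delivered to their owning process, which deduplicates them."""
    parent = {source: source}
    frontier = [source]
    while frontier:
        # expansion phase: bucket discoveries by destination process
        outgoing = [[] for _ in range(nprocs)]
        for u in frontier:
            for v in adj.get(u, []):
                outgoing[owner(v, nverts, nprocs)].append((v, u))
        # "all-to-all" phase: owners keep only first-time discoveries
        next_frontier = []
        for bucket in outgoing:
            for v, u in bucket:
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent
```

The number of non-empty buckets per level is what in-node multithreading shrinks: with one MPI process per chip rather than per core, fewer parties take part in each exchange.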
“…To yield a fast direction-optimizing BFS implementation, our bottom-up implementation is combined with an existing performant top-down implementation [6]. We provide a parallel complexity analysis of the new algorithm in terms of the bandwidth and synchronization (latency) costs in Section V. Section VI gives details about our direction-optimizing approach that combines top-down and bottom-up steps.…”
Section: Introduction
confidence: 99%
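The direction-optimizing idea referenced above runs top-down while the frontier is small and switches to bottom-up (unvisited vertices search for a parent in the frontier) once the frontier grows large. A shared-memory sketch of the switching logic, assuming an undirected graph; `alpha` is a hypothetical tuning threshold, not a value from the paper:

```python
def bfs_direction_optimizing(adj, source, alpha=4):
    """Hybrid BFS sketch: top-down when the frontier is small relative to
    the unvisited set, bottom-up otherwise. Assumes `adj` is symmetric
    (undirected), so adj[v] doubles as v's in-neighbors."""
    parent = {source: source}
    frontier = {source}
    while frontier:
        unvisited = [v for v in adj if v not in parent]
        nxt = set()
        if len(frontier) * alpha < len(unvisited):
            # top-down step: scan edges leaving the frontier
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        nxt.add(v)
        else:
            # bottom-up step: each unvisited vertex probes its neighbors
            # and stops at the first one found in the frontier
            for v in unvisited:
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        nxt.add(v)
                        break
        frontier = nxt
    return parent
```

The bottom-up step's early `break` is the source of the savings: on low-diameter graphs with a huge mid-search frontier, most unvisited vertices find a parent after inspecting only a few edges instead of the frontier scanning all of its outgoing edges.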
“…Buluç et al. [5] conducted extensive performance studies of partitioning schemes for BFS on large-scale machines at LBNL, Hopper (6,392 nodes) and Franklin (9,660 nodes), comparing 1-D and 2-D partitioning strategies. Satish et al. [10] proposed an efficient BFS algorithm on commodity supercomputing clusters consisting of Intel CPUs and an InfiniBand network.…”
Section: Related Work
confidence: 99%