2012 IEEE International Conference on Cluster Computing
DOI: 10.1109/cluster.2012.29

Evaluation and Optimization of Breadth-First Search on NUMA Cluster

Abstract: Graphs are widely used in many areas. Breadth-First Search (BFS), a key subroutine for many graph analysis algorithms, has become the primary benchmark for the Graph500 ranking. Because of the high communication cost of BFS, multi-socket nodes with large memory capacity (NUMA) are expected to reduce network pressure. However, the longer latency to remote memory may cause problems if it is not handled carefully. In this work, we first demonstrate that simply spawning and binding one MPI process for each socket can achieve the best pe…
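To make the abstract's terms concrete, here is a minimal single-process sketch of a level-synchronous (top-down) BFS that builds a parent tree, the style of traversal Graph500-type kernels are organized around. It is an illustration under stated assumptions, not the paper's NUMA/MPI implementation: the graph is a plain Python adjacency dictionary, and the name `bfs_levels` is hypothetical. In a typical distributed version, each frontier expansion is the point where processes exchange newly discovered vertices, which is where the communication cost mentioned above arises.

```python
def bfs_levels(adj, root):
    """Level-synchronous top-down BFS (illustrative sketch).

    adj: dict mapping vertex -> iterable of neighbours; an undirected graph
    is assumed, so each edge appears in both endpoints' lists.
    Returns a parent map; the root is recorded as its own parent.
    """
    parent = {root: root}
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in parent:      # first visit claims v for this level
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier         # one BFS level is complete
    return parent
```

For a 4-cycle `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}` rooted at 0, vertices 1 and 2 are discovered in the first level and vertex 3 in the second, with vertex 1 as its parent because it is expanded first.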

Cited by 6 publications (3 citation statements). References: 43 publications.
“…As shown in Figure 4, the Matrix-2000+ CPUs adopt a regional autonomous parallel architecture composed of several regions. Each region can be viewed as a functionally-independent SN, which has SVE (Scalable Vector Extension) configured in hardware that can be used to accelerate BFS [12][13][14][15][16][17]. Rather than using a fixed vector length, SVE allows Matrix-2000+ to choose the most appropriate vector length for applications, ranging from 128 bits up to 1024 bits per vector register file.…”
Section: BFS with SVE (mentioning)
“…Note that our framework is a generic graph processing framework that enables users to develop multiple applications, including BFS and SSSP, and applies optimizations in an application-agnostic way. While, the codes we compete against in Graph500 are developed for these specific applications (as published in the corresponding publications [13,38,40,41]).…”
Section: Graph500 Submissions (mentioning)
“…Several studies focused on the performance on shared-memory nodes, for example minimizing the memory footprint of frequently accessed data (e.g. using bitmaps) (Agarwal et al, 2010;Checconi et al, 2012), or reducing intrasocket communication (Cui et al, 2012). Other work has been done on distributed BFS managed partitioning as a way to control load balance and communication (Chow et al, 2005;Yoo et al, 2005) or adopting sparse linear algebra representations to reduce the storage requirements (Gilbert et al, 2007).…”
Section: Related Work (mentioning)
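The related-work excerpt above mentions reducing the memory footprint of frequently accessed BFS data with bitmaps. A minimal sketch of that idea follows, assuming vertices are numbered 0..n-1 and the graph is a list of neighbour lists; both the layout and the names `VisitedBitmap` and `bfs_bitmap` are hypothetical and not taken from the cited works. The visited set costs one bit per vertex, so for one million vertices it occupies about 125 KB, which makes the structure tested on every edge much more cache-friendly than a per-vertex byte or word array.

```python
class VisitedBitmap:
    """Visited set stored as one bit per vertex (hypothetical helper)."""

    def __init__(self, n):
        self.bits = bytearray((n + 7) // 8)  # n bits rounded up to whole bytes

    def test_and_set(self, v):
        """Mark v as visited; return True if it was already marked."""
        byte, mask = v >> 3, 1 << (v & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


def bfs_bitmap(adj, n, root):
    """Top-down BFS over vertices 0..n-1 using the bitmap as the visited set.

    Returns the BFS level of each vertex, or -1 if it is unreachable.
    """
    visited = VisitedBitmap(n)
    visited.test_and_set(root)
    level = [-1] * n
    level[root] = 0
    frontier = [root]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if not visited.test_and_set(v):  # only the first visit succeeds
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

For the path graph `[[1], [0, 2], [1], []]` rooted at 0, vertices 1 and 2 get levels 1 and 2, and the isolated vertex 3 stays at -1.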