Breadth first search vectorization on the Intel Xeon Phi

Paredes, Mireya; Riley, Graham; Luján, Mikel

doi:10.1145/2903150.2903180

Cited by 9 publications

(9 citation statements)

References 26 publications

“…
Year | Reference | Approach | Optimisation | Vectorisation
2012 | Saule and Çatalyürek [17] | top-down | no optimisation | automatic
2013 | Gao et al [19] | top-down | bitmaps | intrinsics
2013 | Stanic et al […

The key contribution of this work builds on the studies carried out by Gao et al in [19] and [5]. In the first study, they present the vectorisation of the top-down BFS algorithm using vector intrinsic functions, which was outperformed by [15], which clarified the impact of prefetching, thread affinity and vector unit usage rate. The second study is related with the vectorisation of the hybrid BFS algorithm.…”

Section: Year · mentioning · confidence: 99%
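The table excerpt lists bitmaps as the optimisation used in Gao et al's top-down BFS. As a minimal scalar illustration of that general idea (not the intrinsics-based code of Gao et al or of [15]), the sketch below runs a top-down BFS that records visited vertices in a bitmap of 64-bit words instead of a byte or int array. The CSR layout and the names `csr_graph`, `bit_test`, `bit_set` and `bfs_top_down` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical CSR graph: neighbours of v are adj[row[v]] .. adj[row[v + 1] - 1]. */
typedef struct {
    int n;            /* number of vertices */
    const int *row;   /* n + 1 offsets into adj */
    const int *adj;   /* concatenated adjacency lists */
} csr_graph;

/* Test / set bit v in a bitmap stored as 64-bit words. */
static inline int bit_test(const uint64_t *bm, int v) {
    return (int)((bm[v >> 6] >> (v & 63)) & 1u);
}
static inline void bit_set(uint64_t *bm, int v) {
    bm[v >> 6] |= (uint64_t)1 << (v & 63);
}

/* Top-down BFS from root. A FIFO queue finishes each level before the next,
   and a compact bitmap answers "already visited?". Fills parent[v]
   (-1 if unreachable); returns the number of vertices reached, or -1 on
   allocation failure. */
int bfs_top_down(const csr_graph *g, int root, int *parent) {
    assert(g != NULL && parent != NULL && root >= 0 && root < g->n);
    int words = (g->n + 63) / 64;
    uint64_t *visited = calloc((size_t)words, sizeof *visited);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    if (visited == NULL || queue == NULL) { free(visited); free(queue); return -1; }

    for (int v = 0; v < g->n; v++) parent[v] = -1;
    int head = 0, tail = 0;
    queue[tail++] = root;
    parent[root] = root;
    bit_set(visited, root);

    while (head < tail) {
        int u = queue[head++];
        for (int e = g->row[u]; e < g->row[u + 1]; e++) {
            int w = g->adj[e];
            if (!bit_test(visited, w)) {   /* one bit per vertex keeps this check cache-friendly */
                bit_set(visited, w);
                parent[w] = u;
                queue[tail++] = w;
            }
        }
    }
    free(visited);
    free(queue);
    return tail;
}
```

The bitmap uses one bit per vertex, so for large graphs far more of the visited set fits in cache than with a byte-per-vertex array; a vectorised version would additionally process several neighbours per instruction.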

“…Particularly, the vectorisation of the hybrid involves the vectorised version of both algorithms. The vectorisation of the top-down algorithm is described and analysed in [15], whereas the vectorisation of the bottom-up is described further in Section 5. Table 2 shows an example of the switching points to swap between the top-down and the bottom-up approaches of the hybrid BFS algorithm for a graph created by the Graph 500 graph generator introduced in Section 6.…”

Section: The Hybrid Bfs Algorithm · mentioning · confidence: 99%
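The switching between top-down and bottom-up steps described in this excerpt can be sketched as follows. This is a minimal scalar C sketch, not the vectorised implementation of [15] or of the citing paper: it assumes an undirected graph in a hypothetical CSR layout (`csr_graph`), keeps the `in`/`out` frontiers as byte arrays, and chooses the direction with a simplified form of the heuristic of Beamer et al. (go bottom-up when the frontier's edge count times `alpha` exceeds the edges left to explore; go back top-down when the frontier size times `beta` drops below the vertex count). The defaults `alpha = 14`, `beta = 24` come from Beamer et al. and are assumptions here; the switching points in the excerpt's Table 2 are not reproduced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CSR layout of an undirected graph (each edge stored both ways). */
typedef struct { int n; const int *row; const int *adj; } csr_graph;

static long degree(const csr_graph *g, int v) { return g->row[v + 1] - g->row[v]; }

/* Top-down step: every frontier vertex claims its unvisited neighbours. */
static int step_top_down(const csr_graph *g, const char *in, char *out, int *parent) {
    int found = 0;
    for (int u = 0; u < g->n; u++) {
        if (!in[u]) continue;
        for (int e = g->row[u]; e < g->row[u + 1]; e++) {
            int w = g->adj[e];
            if (parent[w] < 0) { parent[w] = u; out[w] = 1; found++; }
        }
    }
    return found;
}

/* Bottom-up step: every unvisited vertex looks for a parent in the frontier
   and stops at the first hit, skipping the rest of its edges. */
static int step_bottom_up(const csr_graph *g, const char *in, char *out, int *parent) {
    int found = 0;
    for (int v = 0; v < g->n; v++) {
        if (parent[v] >= 0) continue;
        for (int e = g->row[v]; e < g->row[v + 1]; e++) {
            if (in[g->adj[e]]) { parent[v] = g->adj[e]; out[v] = 1; found++; break; }
        }
    }
    return found;
}

/* Hybrid BFS from root (simplified direction rule, after Beamer et al.):
     top-down  -> bottom-up when m_f * alpha > m_u
     bottom-up -> top-down  when n_f * beta  < n
   alpha = 0 forces a pure top-down search. Fills parent[] (-1 = unreached),
   optionally reports how many levels ran bottom-up, returns vertices reached. */
int bfs_hybrid(const csr_graph *g, int root, int *parent,
               long alpha, long beta, int *bottom_up_levels) {
    assert(root >= 0 && root < g->n);
    char *in = calloc((size_t)g->n, 1), *out = calloc((size_t)g->n, 1);
    if (in == NULL || out == NULL) { free(in); free(out); return -1; }
    for (int v = 0; v < g->n; v++) parent[v] = -1;
    parent[root] = root;
    in[root] = 1;

    long m_u = g->row[g->n] - degree(g, root);  /* edge slots of unvisited vertices */
    int n_f = 1, reached = 1, bottom_up = 0, bu_levels = 0;
    while (n_f > 0) {
        long m_f = 0;                           /* edges to check from the frontier */
        for (int v = 0; v < g->n; v++)
            if (in[v]) m_f += degree(g, v);
        if (!bottom_up && m_f * alpha > m_u) bottom_up = 1;
        else if (bottom_up && (long)n_f * beta < g->n) bottom_up = 0;

        memset(out, 0, (size_t)g->n);
        n_f = bottom_up ? step_bottom_up(g, in, out, parent)
                        : step_top_down(g, in, out, parent);
        bu_levels += bottom_up;
        reached += n_f;
        for (int v = 0; v < g->n; v++)
            if (out[v]) m_u -= degree(g, v);
        char *tmp = in; in = out; out = tmp;    /* swap the in/out frontiers */
    }
    free(in);
    free(out);
    if (bottom_up_levels != NULL) *bottom_up_levels = bu_levels;
    return reached;
}
```

For example, `bfs_hybrid(&g, root, parent, 14, 24, &levels)` uses the Beamer-style thresholds, while `alpha = 0` gives a plain top-down run to compare against. On tiny graphs the bottom-up rule fires at once; the per-level choice only pays off on large, low-diameter graphs such as those produced by the Graph 500 generator.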

“…This paper investigates the optimisation and vectorisation of the BFS on the Xeon Phi, which is a parallel architecture containing advanced vector capabilities within the experimental framework of the Graph 500 benchmark. As a result, a novel optimised parallel version of the hybrid BFS is presented using vectorisation, building on top of the vectorised top-down BFS introduced by Paredes et al [15]. Note that…”

Section: Introduction · mentioning · confidence: 99%

Vectorization of Hybrid Breadth First Search on the Intel Xeon Phi

Paredes, Riley, Luján (2017). Proceedings of the Computing Frontiers Conference (CF'17), Siena, Italy. arXiv:1704.02259v2 [cs.DC]. Self-citation.

The Breadth-First Search (BFS) algorithm is an important building block for graph analysis of large datasets. Parallelising BFS has been shown to be challenging because of its inherent characteristics, including irregular memory access patterns, data dependencies and workload imbalance, which limit its scalability. We investigate the optimisation and vectorisation of the hybrid BFS (a combination of top-down and bottom-up approaches for BFS) on the Xeon Phi, which has advanced vector processing capabilities. The results show that our new implementation improves performance by 33% for a graph with one million vertices, compared with the state of the art.

“…(3) Low-level single node: These are single-node native platforms which are performance-oriented and include architecture-dependent optimization [47]. As a consequence, such implementations are able to exploit better the hardware capability, leveraging good performance, but the programming and tuning efforts are high and, more importantly, portability is difficult to achieve.…”

Section: Pad: Graph-processing Platforms · mentioning · confidence: 99%

Exploring HPC and Big Data Convergence: A Graph Processing Study on Intel Knights Landing

Uta, Vărbănescu, Musaafir, et al. (2018). 2018 IEEE International Conference on Cluster Computing (CLUSTER).

The question "Can big data and HPC infrastructure converge?" has important implications for many operators and clients of modern computing. However, answering it is challenging. The hardware is currently different, and fast evolving: big data uses machines with modest numbers of fat cores per socket, large caches, and much memory, whereas HPC uses machines with larger numbers of (thinner) cores, non-trivial NUMA architectures, and fast interconnects. In this work, we investigate the convergence of big data and HPC infrastructure for one of the most challenging application domains, the highly irregular graph processing. We contrast through a systematic, experimental study of over 300,000 core-hours the performance of a modern multicore, Intel Knights Landing (KNL) and of traditional big data hardware, in processing representative graph workloads using state-of-the-art graph analytics platforms. The experimental results indicate KNL is convergence-ready, performance-wise, but only after extensive and expert-level tuning of software and hardware parameters.

“…There exist some BFS schemes for GPUs [31], [28], [11], [39], [14], [18]. However, they usually underutilize the available SIMD and vectorization parallelism as they focus on work-optimal traditional BFS or BFS based on SpMSpV that use fine-grained irregular memory accesses.…”

Section: Introduction · mentioning · confidence: 99%

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Besta, Marending, Solomonik, et al. (2017). 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

Vectorization and GPUs will profoundly change graph processing. Traditional graph algorithms tuned for 32- or 64-bit based memory accesses will be inefficient on architectures with 512-bit wide (or larger) instruction units that are already present in the Intel Knights Landing (KNL) manycore CPU. Anticipating this shift, we propose SlimSell: a vectorizable graph representation to accelerate Breadth-First Search (BFS) based on sparse-matrix dense-vector (SpMV) products. SlimSell extends and combines the state-of-the-art SIMD-friendly Sell-C-σ matrix storage format with tropical, real, boolean, and sel-max semiring operations. The resulting design reduces the necessary storage (by up to 50%) and thus pressure on the memory subsystem. We augment SlimSell with the SlimWork and SlimChunk schemes that reduce the amount of work and improve load balance, further accelerating BFS. We evaluate all the schemes on Intel Haswell multicore CPUs, the state-of-the-art Intel Xeon Phi KNL manycore CPUs, and NVIDIA Tesla GPUs. Our experiments indicate which semiring offers the highest speedups for BFS and illustrate that SlimSell accelerates a tuned Graph500 BFS code by up to 33%. This work shows that vectorization can secure high performance in BFS based on SpMV products; the proposed principles and designs can be extended to other graph algorithms.
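To make "BFS based on sparse-matrix dense-vector (SpMV) products" concrete, here is a minimal C sketch of one BFS written as repeated SpMV over the boolean semiring (OR as addition, AND as multiplication), masked so that only unvisited vertices are computed. It deliberately uses a plain CSR matrix rather than Sell-C-σ or SlimSell, contains no SIMD code, and assumes a symmetric (undirected) adjacency matrix; the type `csr_graph` and the functions `spmv_bool_masked` and `bfs_spmv` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR storage of a symmetric adjacency matrix A:
   row v holds the column indices adj[row[v]] .. adj[row[v + 1] - 1]. */
typedef struct { int n; const int *row; const int *adj; } csr_graph;

/* y = A x over the boolean semiring, masked: rows of already-visited
   vertices (level >= 0) are skipped and produce 0. */
static void spmv_bool_masked(const csr_graph *a, const unsigned char *x,
                             const int *level, unsigned char *y) {
    for (int v = 0; v < a->n; v++) {
        unsigned char acc = 0;                   /* semiring zero: false */
        if (level[v] < 0)
            for (int e = a->row[v]; e < a->row[v + 1]; e++)
                acc |= x[a->adj[e]];             /* acc = acc OR (A[v][u] AND x[u]) */
        y[v] = acc;
    }
}

/* BFS as repeated SpMV: level[v] = hop distance from root, -1 if unreachable.
   Returns the number of SpMV products performed (the last one finds nothing
   new), or -1 on allocation failure. */
int bfs_spmv(const csr_graph *a, int root, int *level) {
    assert(root >= 0 && root < a->n);
    unsigned char *x = calloc((size_t)a->n, 1), *y = calloc((size_t)a->n, 1);
    if (x == NULL || y == NULL) { free(x); free(y); return -1; }
    for (int v = 0; v < a->n; v++) level[v] = -1;
    level[root] = 0;
    x[root] = 1;                                 /* dense frontier vector */

    int depth = 0, any = 1;
    while (any) {
        spmv_bool_masked(a, x, level, y);
        depth++;
        any = 0;
        for (int v = 0; v < a->n; v++)
            if (y[v]) { level[v] = depth; any = 1; }
        unsigned char *t = x; x = y; y = t;      /* new frontier becomes the input */
    }
    free(x);
    free(y);
    return depth;
}
```

Each product scans every unvisited row regardless of how small the frontier is, so it can do more work than a queue-based BFS; in exchange, the inner loop is a regular reduction over a dense vector, which is the kind of access pattern that SIMD-friendly formats such as Sell-C-σ are designed to exploit.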
