2006 International Conference on Parallel Processing (ICPP'06)
DOI: 10.1109/icpp.2006.34

Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

Cited by 149 publications (115 citation statements)
References 35 publications
“…The graphs also shows an important property of the new Nehalem processors: we can hide the memory latency by keeping a number of read requests in flight, as traditionally done by multi-threaded architectures [16], [15]. Surprisingly, with a simple software pipelining strategy we can increase by a factor of eight the number of transactions per second: for example, with a working set of 8MB, the memory subsystem can satisfy up to 160 millions reads per second, and with 2 GB we can achieve 40 millions of random reads per second.…”
Section: System Architecture and Experimental Platforms
Citation type: mentioning
confidence: 99%
“…A good amount of literature deals with the design of BFS solutions, either based on commodity processors [11], [12] or special purpose hardware [13], [14], [15], [16]. Some recent publications describe successful parallelization strategies of list ranking [17] and phylogenetic trees on the Cell BE [18].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…On massively multithreaded systems, Bader and Madduri [23] introduce a fine-grained implementation on the Cray MTA-2 system using the level synchronous approach, achieving good scaling on the 40 processor MTA-2. Mizell and Maschhoff [24] improve and port this algorithm to the Cray XMT, the successor to the MTA-2.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…GPU implementation of FW for smaller graphs is given in [8] and for larger graphs shared memory and cache efficient GPU implementations for APSP using FW are given in [16] [9]. To further enhance the performance some optimization techniques like tiling, loop unrolling and SIMD vectorization can be used.…”
Section: Problem Time Complexity
Citation type: mentioning
confidence: 99%