Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
DOI: 10.1145/1504176.1504212

Petascale computing with accelerators

Abstract: A trend is developing in high performance computing in which commodity processors are coupled to various types of computational accelerators. Such systems are commonly called hybrid systems. In this paper, we describe our experience developing an implementation of the Linpack benchmark for a petascale hybrid system, the LANL Roadrunner cluster built by IBM for Los Alamos National Laboratory. This system combines traditional x86-64 host processors with IBM PowerXCell™ 8i accelerator processors. The implementati…

Cited by 18 publications (20 citation statements)
References 13 publications
“…Since the same matrix between tasks can be reused, the order of the four tasks is like $T_0$, $T_1$, $T_3$, $T_2$ by using the "bounce corner turn" [18] method. When $T_1$ is executed, matrix $A_1$ does not need to be transferred, neither do $B_2$ for $T_3$ and $A_2$ for $T_2$.…”
Section: Software Pipelining (mentioning, confidence: 99%)
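The excerpt above is terse, so the following Python sketch models our reading of it; it is not code from either paper. We assume that each task multiplies one block of A by one block of B (in the 2×2 case, $T_{2i+j}$ uses $A_{i+1}$ and $B_{j+1}$), and that the accelerator keeps only the most recently used A block and B block resident. Under those assumptions, a serpentine ("bounce") traversal that reverses direction on every other row makes each consecutive pair of tasks share an operand. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a task (i, j) multiplies row block i of A by column block j of B.
# In the 2x2 case, task T_k corresponds to k = 2*i + j (our assumption).
Task = Tuple[int, int]


def row_major_order(m: int, n: int) -> List[Task]:
    """Visit tasks row by row, always left to right."""
    return [(i, j) for i in range(m) for j in range(n)]


def bounce_order(m: int, n: int) -> List[Task]:
    """Serpentine order: reverse the column direction on every other row, so the
    last task of one row and the first task of the next share a B block."""
    order: List[Task] = []
    for i in range(m):
        cols = range(n) if i % 2 == 0 else range(n - 1, -1, -1)
        order.extend((i, j) for j in cols)
    return order


def count_transfers(order: List[Task]) -> int:
    """Count block transfers, assuming only the most recent A and B blocks stay resident."""
    resident_a: Optional[int] = None
    resident_b: Optional[int] = None
    transfers = 0
    for i, j in order:
        if i != resident_a:  # A block not on the accelerator: fetch it
            transfers += 1
            resident_a = i
        if j != resident_b:  # B block not on the accelerator: fetch it
            transfers += 1
            resident_b = j
    return transfers


if __name__ == "__main__":
    for m, n in [(2, 2), (4, 4)]:
        print(m, n, count_transfers(row_major_order(m, n)), count_transfers(bounce_order(m, n)))
```

For the 2×2 case, `bounce_order(2, 2)` visits $(0,0), (0,1), (1,1), (1,0)$, i.e. $T_0, T_1, T_3, T_2$ as in the excerpt, and needs 5 block transfers instead of 6 for row-major order. Under this simple residency model an $m \times n$ grid with $n > 1$ needs $mn + 1$ transfers in bounce order versus $m(n + 1)$ in row-major order.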
“…Using Cell accelerators [26], in 2008 IBM built the first heterogenous petascale supercomputer called Roadrunner [18]. This system was very different than a GPU-accelerated system.…”
Section: Related Work (mentioning, confidence: 99%)
“…A node in such heterogeneous clusters was typically built with multicore CPUs and a single accelerator (e.g., a GPU). Recently, more and more cluster systems have started to have multiple accelerators per node to deal with large size problems [1], [2], [3]. Multiple accelerators per node may enlarge the benefits of a heterogeneous system, especially for massively data-parallel applications.…”
Section: Introduction (mentioning, confidence: 99%)
“…Moreover, it is usually used as a yardstick of the performance of supercomputers because the TOP500 supercomputer list [10] ranks supercomputers by their performance on the LINPACK benchmark. The LINPACK benchmark requires $\frac{2}{3}n^3 + O(n^2)$ double-precision floating-point operations to solve a system of linear equations of order $n$. Reducing the operation count (e.g., using the Strassen algorithm for matrix multiplication) is not allowed. Under this constraint, any optimizations can be applied to the algorithm in order to achieve the best performance for the target system.…”
Section: Introduction (mentioning, confidence: 99%)
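As a worked illustration of the operation count quoted above, here is a minimal Python sketch. It keeps only the leading $\frac{2}{3}n^3$ term (the excerpt does not give the exact $O(n^2)$ term), and it assumes the reported rate is this nominal count divided by wall-clock time. The function names and the example problem size and run time are hypothetical.

```python
def linpack_flops(n: int) -> float:
    """Leading term of the LINPACK operation count, 2/3 * n^3.
    The O(n^2) term is omitted in this sketch (negligible for large n)."""
    return 2.0 * n**3 / 3.0


def linpack_gflops(n: int, seconds: float) -> float:
    """Rate in Gflop/s, assuming rate = nominal operation count / wall-clock time."""
    return linpack_flops(n) / seconds / 1e9


def matrix_bytes(n: int) -> int:
    """Storage for a dense n x n matrix of 8-byte doubles."""
    return 8 * n * n


if __name__ == "__main__":
    n, seconds = 3000, 1.0  # hypothetical problem size and run time
    print(linpack_flops(n))            # 18000000000.0
    print(linpack_gflops(n, seconds))  # 18.0
    print(matrix_bytes(n))             # 72000000
```

Because the count is fixed by $n$ rather than by the algorithm actually run, a faster run can only come from executing the same nominal work in less time. Doubling $n$ multiplies the work by 8 but the matrix storage only by 4, which is one reason large problem sizes favor high-throughput accelerators.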
“…processing elements (SPEs). The CBE is currently used in scientific computing on both large [2][3][4] and small scales [5,6] due to its high floating-point throughput. The CBE allows SIMD instructions to be used without resorting to assembly language and provides a great deal of programmer control over memory management.…”
Section: Introduction (mentioning, confidence: 99%)