2008
DOI: 10.1007/978-3-540-89894-8_6

Optimization of BLAS on the Cell Processor

Abstract: The unique architecture of the heterogeneous multi-core Cell processor offers great potential for high performance computing. It offers features such as high memory bandwidth using DMA, user managed local stores and SIMD architecture. In this paper, we present strategies for leveraging these features to develop a high performance BLAS library. We propose techniques to partition and distribute data across SPEs for handling DMA efficiently. We show that suitable pre-processing of data leads to significant perfor…

Cited by 7 publications (2 citation statements)
References 11 publications
“…Notice that, in the reality, many other architectural details affect the optimal block size, but the size of 64 × 64 is usually selected as the optimal block size for porting the dense linear algebra problems to Cell/B.E. processor [9], [22].…”
Section: Optimal Block Size (mentioning)
confidence: 99%
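The block size mentioned in the statement above can be illustrated with a small sketch. It is not taken from the cited papers: it assumes single-precision elements (4 bytes), the 256 KB SPE local store of the Cell/B.E., and a hypothetical double-buffered scheme holding tiles of A, B and C. The helper names `tile_bytes` and `blocked_matmul` are invented for this illustration, and the loop nest is plain Python rather than SPE code.

```python
# Assumption: each SPE has a 256 KB local store shared by code and data.
LOCAL_STORE_BYTES = 256 * 1024


def tile_bytes(block, elem_bytes=4):
    """Bytes needed for one square block x block tile (4 bytes = float32)."""
    return block * block * elem_bytes


def blocked_matmul(a, b, n, block=64):
    """Return C = A * B for n x n matrices given as lists of rows.

    The three outer loops walk over tiles; the three inner loops multiply
    one tile of A by one tile of B and accumulate into one tile of C.
    """
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


# Hypothetical footprint: tiles of A, B and C, each double-buffered.
footprint = 3 * 2 * tile_bytes(64)
print(footprint, "of", LOCAL_STORE_BYTES, "bytes")
```

Under these assumptions a 64 × 64 tile takes 16 KB, so six tiles use 96 KB and leave room in the local store for code and other buffers. As the quoted statement notes, other architectural details also influence the choice.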
“…It should be noted that the default Octave installation utilises hardware-specific BLAS libraries which are provided with the IBM Cell SDK. These libraries are highly optimised for the Cell architecture [46] and can utilise both Cell processors available in the QS22 (a total of 16 SPEs).…”
Section: Speedups and Scalability (mentioning)
confidence: 99%