Proceedings of the 23rd International Conference on Supercomputing 2009
DOI: 10.1145/1542275.1542285

Computer generation of fast Fourier transforms for the Cell Broadband Engine

Abstract: The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive floating point peak performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and vectorization. We address this problem for the discrete Fourier transform (DFT) by extending Spiral, a program generation system, to automatically generate highly opti…
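Spiral expresses DFT algorithms as breakdown rules written in its SPL (signal processing language) matrix notation. The classic example is the Cooley-Tukey rule, DFT_mn = (DFT_m ⊗ I_n) T^mn_n (I_m ⊗ DFT_n) L^mn_m, where L^mn_m is a stride permutation, T^mn_n is a diagonal matrix of twiddle factors, I_k is the identity, and ⊗ is the Kronecker product. Because the abstract is truncated, this minimal Python sketch does not attempt to reproduce the Cell-specific rules of the paper; it only evaluates the generic Cooley-Tukey factorization right to left, under the index conventions stated in the comments (our assumption), and checks it against a direct DFT. The helper names `omega`, `dft`, and `cooley_tukey` are hypothetical.

```python
import cmath


def omega(size, exponent):
    """Twiddle factor w_size^exponent = exp(-2*pi*i*exponent/size)."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def dft(x):
    """Direct O(N^2) DFT, used as the reference."""
    n = len(x)
    return [sum(x[t] * omega(n, k * t) for t in range(n)) for k in range(n)]


def cooley_tukey(x, m, n):
    """Evaluate (DFT_m ⊗ I_n) T^{mn}_n (I_m ⊗ DFT_n) L^{mn}_m x, right to left."""
    if len(x) != m * n:
        raise ValueError("len(x) must equal m * n")
    # L^{mn}_m: stride permutation, y[i*n + j] = x[j*m + i]
    y = [x[j * m + i] for i in range(m) for j in range(n)]
    # I_m ⊗ DFT_n: m independent DFT_n on contiguous length-n blocks
    y = [v for i in range(m) for v in dft(y[i * n:(i + 1) * n])]
    # T^{mn}_n: diagonal twiddles, entry i*n + j scaled by w_{mn}^{i*j}
    y = [y[i * n + j] * omega(m * n, i * j) for i in range(m) for j in range(n)]
    # DFT_m ⊗ I_n: n independent DFT_m, each reading at stride n
    out = [0j] * (m * n)
    for j in range(n):
        column = dft([y[i * n + j] for i in range(m)])
        for a in range(m):
            out[a * n + j] = column[a]
    return out


if __name__ == "__main__":
    x = [complex(t, (t * t) % 7) for t in range(12)]
    err = max(abs(p - q) for p, q in zip(cooley_tukey(x, 3, 4), dft(x)))
    print(f"max |error| = {err:.2e}")
```

Each factor corresponds to a simple loop nest (a permutation, a batch of small DFTs, a pointwise scaling, a strided batch of small DFTs), which is roughly why a generator working at the formula level can decide how to parallelize and vectorize before emitting code.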

Cited by 14 publications (11 citation statements); references 13 publications.
“…Besides, to explore the performance per Watt, we define a measure as data-rate-per-Watt ψ for an N-point FFT: ψ = N/t/p (samples/µs/W) = N/(t × p) (samples/µJ). (27) ψ represents the processing rate per Watt (samples/µs/W), as well as the samples processed per energy (samples/µJ). To achieve higher performance per Watt, the FFT implementation should have higher ψ.…”
Section: Evaluation Criterions (mentioning)
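The ψ metric quoted above is simple arithmetic, but its two unit readings are worth checking: with t in microseconds and p in watts, t × p is an energy in microjoules, so samples/µs/W and samples/µJ are the same quantity. This minimal Python sketch computes ψ; the function name `data_rate_per_watt` and all input numbers are hypothetical, not measurements from the citing paper.

```python
def data_rate_per_watt(n_points, time_us, power_w):
    """psi = N / (t * p) for an N-point FFT taking t microseconds at p watts.

    Units: samples/us/W, which equals samples/uJ because 1 W * 1 us = 1 uJ.
    """
    if time_us <= 0 or power_w <= 0:
        raise ValueError("time and power must be positive")
    return n_points / (time_us * power_w)


if __name__ == "__main__":
    # Hypothetical implementations, not data from any paper:
    # A runs a 1024-point FFT in 2.0 us at 50 W; B in 1.5 us at 80 W.
    psi_a = data_rate_per_watt(1024, 2.0, 50.0)  # 10.24
    psi_b = data_rate_per_watt(1024, 1.5, 80.0)  # about 8.53
    print(psi_a, psi_b)
```

In this made-up comparison implementation B is faster but draws more power, so it scores a lower ψ; a higher ψ means more samples processed per unit of energy, as the excerpt states.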
“…The 3.2 GHz Cell BE [27] contains one traditional PowerPC core and eight SIMD vector cores (SPEs) with 128-bit vector support. Each SPE has the fast on-chip 256 KB local memory, and data transfer between the intercore and memory-local core is performed via DMA.…”
Section: FFT Performance Analysis for 1024-Point FFTs (mentioning)
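The figures in the excerpt above allow a quick back-of-the-envelope check. Assuming each SPE issues one 4-wide single-precision fused multiply-add per cycle (2 flops per lane), peak throughput is 3.2 GHz × 8 flops = 25.6 Gflop/s per SPE and 204.8 Gflop/s across eight SPEs. The same sketch bounds how many single-precision complex samples (8 bytes each) fit in a 256 KB local store; this is an upper bound, since code, stack, and twiddle tables share that memory. Function and constant names are hypothetical.

```python
# Parameters taken from the excerpt: 3.2 GHz, 8 SPEs, 128-bit vectors, 256 KB local store.
CLOCK_HZ = 3.2e9
NUM_SPES = 8
SIMD_BITS = 128
LOCAL_STORE_BYTES = 256 * 1024
FLOAT_BYTES = 4  # single precision


def lanes():
    """Single-precision values per 128-bit register."""
    return SIMD_BITS // (8 * FLOAT_BYTES)


def peak_gflops(num_spes=NUM_SPES):
    """Peak single-precision Gflop/s, assuming one FMA (2 flops) per lane per cycle."""
    return CLOCK_HZ * 2 * lanes() * num_spes / 1e9


def complex_samples_in_local_store(buffers=1):
    """Upper bound on complex samples per buffer; ignores code, stack, twiddles."""
    return LOCAL_STORE_BYTES // (2 * FLOAT_BYTES * buffers)


if __name__ == "__main__":
    print(lanes(), peak_gflops(1), peak_gflops())
    print(complex_samples_in_local_store(), complex_samples_in_local_store(2))
```

A 1024-point single-precision complex FFT needs only 8 KB for its data, so it fits comfortably in one local store even with input and output double-buffered for DMA.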
“…But operations on words of lengths of up to 512 bits had also been proposed (e.g., on the CDC STAR-100). In recent years, with the emergence of SIMD-enhanced multicore devices with short vector instructions (up to 128 bits), there is a rapidly increasing interest in the topic [12][13][14][15]. Most of the latest reported innovations attempt to achieve optimal device-dependent performance by optimizing cache utilization or vectorizing operations carried out on a single data sequence.…”
Section: Vectorization of the FFT (mentioning)
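To make "vectorizing operations carried out on a single data sequence" concrete, this Python sketch is an ordinary iterative radix-2 decimation-in-time FFT in which the butterflies of each stage are grouped `LANES` at a time, the way a short-vector unit would process one register per step. It is plain scalar Python that only mimics SIMD grouping; it is not the method of the citing paper or of Spiral. `LANES = 4` assumes a split (planar) complex layout with real and imaginary parts in separate 128-bit registers of four floats each; with interleaved complex data only two values fit per register.

```python
import cmath

LANES = 4  # assumption: planar complex data, four 32-bit floats per 128-bit register


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT.

    Within each stage the butterflies are grouped LANES at a time, mimicking
    how a short-vector unit would process one register's worth per step.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        twiddles = [cmath.exp(-2j * cmath.pi * k / size) for k in range(half)]
        for start in range(0, n, size):
            for k0 in range(0, half, LANES):
                ks = range(k0, min(k0 + LANES, half))
                # one "vector load" of top inputs, and of bottom inputs times twiddles
                top = [a[start + k] for k in ks]
                bot = [a[start + half + k] * twiddles[k] for k in ks]
                # one "vector add" and one "vector subtract"
                for lane, k in enumerate(ks):
                    a[start + k] = top[lane] + bot[lane]
                    a[start + half + k] = top[lane] - bot[lane]
        size *= 2
    return a


if __name__ == "__main__":
    print(fft_radix2([1, 1, 1, 1]))
```

The first stages have fewer than `LANES` butterflies per block (one in the size-2 stage, two in the size-4 stage), so a real SIMD implementation has to vectorize across blocks or use shuffles there; handling those small stages efficiently is a large part of what makes short-vector FFTs hard.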
“…The Cell architecture is illustrated in Figure 1. Further details on the design parameters of the PXC 8i are well documented [14][15][16] and will not be repeated here. Note that both FORTRAN 95/2003 and C/C++ compilers for multi-core acceleration under Linux (i.e., XLF and XLC) are provided by IBM.…”
Section: Multicore Platforms (mentioning)
“…OpenCL [44] is an emerging open standard for parallel programming of heterogeneous systems; one of its targets is GPU computing and partitioning of computation across CPUs and GPUs.…”
Section: Mapping to Code (mentioning)