2009 IEEE International Conference on Acoustics, Speech and Signal Processing 2009
DOI: 10.1109/icassp.2009.4959642

Generating high performance pruned FFT implementations

Abstract: We derive a recursive general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with vectorization and parallelization required on state-of-the-art multicore CPUs. We include the pruned FFT algorithm into the program generation system Spiral, and automatically generate optimized implementations of the pruned FFT for the Intel Core2Duo multicore processor. Experimental results show that using the pruned FFT can indeed speed up the fastest…
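
As a rough illustration of what input pruning means, the sketch below computes a length-n DFT whose input is k nonzero samples followed by zeros, using n/k FFTs of size k instead of one FFT of size n. It is not the general-radix Kronecker-product algorithm from the paper, nor code generated by Spiral; the function names `fft` and `input_pruned_fft` are hypothetical, and both k and n are assumed to be powers of two.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out


def input_pruned_fft(x_nonzero, n):
    """DFT of length n for the input x_nonzero followed by zeros.

    Assumption: k = len(x_nonzero) and n are powers of two with k <= n.
    Writing the output index as j = m*a + b (m = n // k) splits the
    transform into m independent FFTs of size k, so the zero inputs
    are never touched.
    """
    k = len(x_nonzero)
    m = n // k
    out = [0j] * n
    for b in range(m):
        # Scale the k nonzero inputs by the twiddle factors w_n^(i*b).
        scaled = [
            x_nonzero[i] * cmath.exp(-2j * cmath.pi * i * b / n)
            for i in range(k)
        ]
        sub = fft(scaled)  # one size-k FFT per residue b
        for a in range(k):
            out[m * a + b] = sub[a]
    return out
```

Under these assumptions the operation count drops from roughly n log n to n log k, which is the kind of saving the abstract refers to when only part of the input is nonzero.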

Cited by 11 publications (12 citation statements)
References 13 publications (14 reference statements)

“…Some implementations even use pruned algorithms [14] to take advantage of the zero elements to save some operations. Batch execution is specially adequate for processing small signals on GPUs due to their massive parallel architecture, otherwise the GPU would not be able to properly exploit its computational resources and the cost of device memory transfers and kernel launch would outweigh any benefits of using the GPU.…”
Section: Results (mentioning)
confidence: 99%
“…This leads to FFT computations where some of the inputs are equal to zero and not all of the outputs are needed. The pruned or truncated DFT [4,18] were developed to take advantage of this situation. In [18], van der Hoeven introduced a radix-2 algorithm for the truncated Fourier transform (TFT), and showed how to invert the TFT (ITFT).…”
Section: Introduction (mentioning)
confidence: 99%
“…However, this approach requires that the location of the k nonzero outputs are known. Pruned FFTs can be optimized and also efficiently implemented using SIMD instructions [6]. A different approach is taken by the FADFT-2 [7], [8], which is a probabilistic algorithm that requires O(k polylog(n)) many operations and does not require the location of the nonzero outputs.…”
Section: Introduction (mentioning)
confidence: 99%