Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation 2014
DOI: 10.1145/2608628.2608661

High performance implementation of the TFT

Abstract: This paper reports on a high-performance implementation of the truncated Fourier transform (TFT). A general Cooley-Tukey-like algorithm for the TFT is developed that allows the implementation to adapt automatically to the memory hierarchy. For larger transform sizes, the algorithm introduces a small relaxation that trades slightly higher arithmetic cost for improved data flow, which allows full vectorization and parallelization. The implementation is automatically derived and tuned using the SPIRAL syst…
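As a point of reference for what the TFT computes, the following is a minimal sketch of the usual definition (following van der Hoeven's formulation): for an input of length l padded to a power-of-two size n, only the first l outputs of the length-n DFT, taken in bit-reversed order, are produced. This is a naive quadratic-time reference, not the algorithm of the paper; the function names `bit_reverse` and `naive_tft`, the complex floating-point arithmetic, and the sign convention of the root of unity are assumptions made for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def naive_tft(x, n):
    """Naive reference TFT (hypothetical helper, O(l^2) operations).

    x : list of l <= n coefficients (implicitly zero-padded to length n)
    n : power-of-two transform size
    Returns the l values X[rev(0)], ..., X[rev(l-1)], where X is the
    length-n DFT of the padded input and rev is the bit reversal on
    log2(n) bits.
    """
    l = len(x)
    if n < 1 or n & (n - 1) != 0 or l > n:
        raise ValueError("n must be a power of two with len(x) <= n")
    bits = n.bit_length() - 1
    # Assumed convention: omega = exp(-2*pi*i/n).
    omega = cmath.exp(-2j * cmath.pi / n)
    out = []
    for i in range(l):
        j = bit_reverse(i, bits)
        # Reduce the exponent mod n to limit rounding error.
        out.append(sum(x[k] * omega ** ((j * k) % n) for k in range(l)))
    return out
```

A fast TFT computes the same values in roughly (l/2) log2(n) butterfly operations by skipping the parts of the FFT data flow that do not contribute to the requested outputs.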

Cited by 6 publications (5 citation statements)
References: 22 publications
“…For obtaining high performance implementations, the complexity of the ITFT has limited implementations to fixed decomposition strategies and scalar codes. This paper extends the results from [18] to obtain general-radix and parallel algorithms for the ITFT, providing an automatically tuned high performance implementation. Combined with the TFT library in [18], a high performance implementation of fast integer and polynomial arithmetic can be obtained.…”
Section: Introduction
confidence: 81%
“…SPL expressions such as I ⊗ A and A ⊗ I can be automatically derived and optimized for vectorization and parallelization as illustrated in [18].…”
Section: A Relaxed Parallel ITFT Algorithm
confidence: 99%
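To illustrate why the two tensor-product forms in the statement above map naturally onto parallel and vector execution, here is a minimal sketch in plain Python. It is not SPIRAL output; the helper names (`kron`, `apply_i_kron_a`, `apply_a_kron_i`) and the list-of-lists matrix representation are assumptions for illustration. I_m ⊗ A applies A independently to m contiguous blocks, so the blocks can run in parallel; A ⊗ I_m applies A to m interleaved subvectors of stride m, so each scalar operation in A becomes an operation on m adjacent elements.

```python
def kron(a, b):
    """Dense Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def identity(m):
    """m x m identity matrix."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def apply_i_kron_a(a, x, m):
    """(I_m ⊗ A) x: apply A to each of m contiguous blocks of x.

    The m block products are independent (loop-level parallelism).
    """
    n = len(a)
    y = []
    for b in range(m):
        y.extend(matvec(a, x[b * n:(b + 1) * n]))
    return y


def apply_a_kron_i(a, x, m):
    """(A ⊗ I_m) x: apply A to the m subvectors of x taken at stride m.

    Equivalently, A acts on "vectors" of m adjacent elements.
    """
    y = [0] * len(x)
    for lane in range(m):
        y[lane::m] = matvec(a, x[lane::m])
    return y
```

For the 2-point butterfly A = [[1, 1], [1, -1]] and x = [1, 2, 3, 4], `apply_i_kron_a(A, x, 2)` gives the same result as `matvec(kron(identity(2), A), x)`, and `apply_a_kron_i(A, x, 2)` the same as `matvec(kron(A, identity(2)), x)`, without ever forming the larger matrix.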
“…In the next section we recall and slightly adapt classical results, and present our implementation framework. Our use of codelets for small and moderate sizes of DFT is customary in other high performance software, such as FFTW3 and SPIRAL [10,29].…”
Section: Related Work and Our Contributions
confidence: 99%
“…In [8] Harvey presents a new serial FFT algorithm which combines van der Hoeven's truncated FFT with David Bailey's cache-friendly FFT [1]. Meng and Johnson [13] improve on Harvey's work by incorporating vector instructions and multithreading into the truncated FFT. The implementation is derived and tuned using the Spiral system for code generation.…”
Section: Related Work
confidence: 99%