2016
DOI: 10.1186/s13634-016-0336-0

Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

Abstract: The fast Fourier transform (FFT) is perhaps today's most ubiquitous algorithm used with digital data; hence, it is still being studied extensively. Besides reducing the arithmetic count of the FFT algorithm, the handling of memory references and the scheme's projection onto the processor's architecture are critical for a fast and efficient implementation. One of the main bottlenecks lies in the long-latency memory accesses to the butterflies' legs and in the redundant references to twiddle factors. In this paper, we describe a…
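The abstract names redundant twiddle-factor references as one of the bottlenecks. A minimal illustration of the issue, using a generic textbook loop reordering rather than the paper's own scheduling heuristic (the abstract is truncated before it is described): in an iterative radix-2 decimation-in-time FFT, making the twiddle index the outer loop of each stage lets every distinct twiddle be computed (or loaded) once and reused by all butterflies that share it. The function name `fft_radix2` and its returned counter are illustrative, not from the paper.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT.

    Returns (spectrum, twiddle_evaluations). len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Bit-reversal permutation so every stage can work in place.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev |= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    twiddle_evals = 0
    size = 2
    while size <= n:
        half = size // 2
        # Twiddle-outer ordering: each distinct twiddle of this stage is
        # computed once and reused by all n // size butterflies that need it.
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / size)
            twiddle_evals += 1
            for top in range(k, n, size):
                bot = top + half
                t = w * a[bot]  # the one complex multiply of the butterfly
                a[top], a[bot] = a[top] + t, a[top] - t
        size *= 2
    return a, twiddle_evals
```

For N = 8 the transform performs (N/2)·log2 N = 12 butterflies but evaluates only 1 + 2 + 4 = 7 twiddles; a butterfly-outer ordering that fetches the twiddle inside every butterfly would touch it 12 times.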

Cited by 9 publications (5 citation statements).
References: 33 publications (36 reference statements).

“…However, thanks to pipelining, the CPPR2B of our design is very close to 1 when performing NTT incessantly. When four NTTs are performed successively, CPPR2B of our design is 1.27, which is better than the 1.4 of the VLIW processor [21]. Our design has an advantage in applications which need to perform NTT successively, such as CRYSTALS-Dilithium.…”
Section: Cppr2b = (mentioning; confidence: 91%)
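The statement above credits pipelining for a CPPR2B close to 1 when transforms run back to back. A minimal sketch of why, under a hypothetical cost model that is not taken from either paper: a pipelined unit pays a one-off fill latency and then retires butterflies at a steady rate, so the fill cost is spread over more butterflies as more transforms are streamed through it. CPPR2B is taken here as total cycles divided by the (N/2)·log2 N radix-2 butterflies per transform; `fill_cycles`, `cycles_per_butterfly`, and the numbers below are illustrative assumptions.

```python
def radix2_butterflies(n):
    """(N/2) * log2(N) radix-2 butterflies in one length-N transform."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return (n // 2) * (n.bit_length() - 1)


def cppr2b_back_to_back(n, k, fill_cycles, cycles_per_butterfly=1.0):
    """CPPR2B of k length-n transforms streamed through one pipelined unit.

    Hypothetical model: the pipeline fill is paid once, after which one
    butterfly retires every `cycles_per_butterfly` cycles.
    """
    butterflies = k * radix2_butterflies(n)
    total_cycles = fill_cycles + butterflies * cycles_per_butterfly
    return total_cycles / butterflies


for k in (1, 4, 16):
    print(k, cppr2b_back_to_back(256, k, fill_cycles=400))
# 1 1.390625
# 4 1.09765625
# 16 1.0244140625
```

As k grows the ratio tends to `cycles_per_butterfly`, which is the "very close to 1" behaviour the statement describes; the specific values here are illustrative only and do not reproduce the 1.27 or 1.4 figures quoted.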
“…Bahtat proposed the efficient scheduling way for FFT in VLIW processors [21], and measured his design by the number of cycles per pseudo radix-2 butterfly(CPPR2B). In our 2stream design, Thus, the proposed design focuses on throughput instead of latency as compared to the previous designs.…”
Section: Comparison With Other Designs (mentioning; confidence: 99%)
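The cycles-per-pseudo-radix-2-butterfly metric lets designs built on different butterfly radices share one scale. A minimal sketch, assuming the usual normalization in which the cycle count is divided by the (N/2)·log2 N butterflies a pure radix-2 transform would need, regardless of the radix actually used (the quoted statement does not spell the formula out). The helper names and the 240-cycle figure are hypothetical.

```python
def log_radix(n, r):
    """Return s with r**s == n, or raise if n is not a power of r."""
    s, m = 0, 1
    while m < n:
        m *= r
        s += 1
    if m != n:
        raise ValueError(f"{n} is not a power of {r}")
    return s


def native_butterflies(n, radix):
    """Butterflies actually executed by a pure radix-`radix` FFT of length n."""
    return (n // radix) * log_radix(n, radix)


def pseudo_radix2_butterflies(n):
    """Normalization assumed for CPPR2B: (N/2) * log2(N), whatever radix is used."""
    return (n // 2) * log_radix(n, 2)


def cppr2b(cycles, n):
    """Cycles per pseudo radix-2 butterfly for one length-n transform."""
    return cycles / pseudo_radix2_butterflies(n)


n = 64
print(native_butterflies(n, 4))      # 48 radix-4 butterflies actually executed
print(pseudo_radix2_butterflies(n))  # 192 pseudo radix-2 butterflies
print(cppr2b(240, n))                # 1.25 for a hypothetical 240-cycle run
```

Dividing by 192 rather than 48 is what allows a radix-4 implementation to be compared directly with a radix-2 one.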
“…The second line of research focused on task scheduling within the conventional FFT implementations to speed up computation over specialized hardware. For example, [14] improved the butterfly task scheduling in a very-long-instruction-word (VLIW) digital signal processors (DSP) chip using a software pipelining technique called modulo scheduling. This scheduling algorithm exploits the instruction-level parallelism (ILP) feature in the VLIW DSP platform to schedule multiple loop iterations in an overlapping manner [15].…”
Section: B. Related Work (mentioning; confidence: 99%)
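Modulo scheduling, as described above, picks an initiation interval (II) at which a new loop iteration starts, and places every operation so that no functional unit is oversubscribed in any cycle modulo II. A simplified greedy sketch, assuming a loop body with no loop-carried dependences (so only the resource bound ResMII matters), fully pipelined units, and ops listed in topological order. The butterfly body, resource classes, unit counts, and latencies are a hypothetical toy machine, not the C66x or the paper's model.

```python
import math
from collections import defaultdict


def res_mii(ops, units):
    """Resource-constrained minimum initiation interval (ResMII)."""
    uses = defaultdict(int)
    for res, _latency in ops.values():
        uses[res] += 1
    return max(math.ceil(count / units[res]) for res, count in uses.items())


def modulo_schedule(ops, deps, units, max_ii=64):
    """Greedy modulo scheduling of one loop body.

    ops:   dict op -> (resource class, latency), in topological order
    deps:  (producer, consumer) pairs inside one iteration
    units: dict resource class -> number of fully pipelined units
    Returns (II, start cycle of every op within one iteration).
    """
    preds = defaultdict(list)
    for producer, consumer in deps:
        preds[consumer].append(producer)
    ii = res_mii(ops, units)
    while ii <= max_ii:
        mrt = defaultdict(int)  # modulo reservation table: (res, cycle % II) -> busy units
        start = {}
        for op, (res, _latency) in ops.items():
            earliest = max((start[p] + ops[p][1] for p in preds[op]), default=0)
            # The II cycles from `earliest` on cover every MRT row exactly once.
            for t in range(earliest, earliest + ii):
                if mrt[(res, t % ii)] < units[res]:
                    mrt[(res, t % ii)] += 1
                    start[op] = t
                    break
            else:
                break  # resource saturated at this II: retry with II + 1
        else:
            return ii, start
        ii += 1
    raise ValueError("no schedule within max_ii")


# Hypothetical radix-2 butterfly body: two leg loads, one twiddle load,
# one complex multiply, add/sub, and two stores.
BUTTERFLY_OPS = {
    "ld_top": ("mem", 5), "ld_bot": ("mem", 5), "ld_tw": ("mem", 5),
    "cmul": ("mul", 4),
    "add": ("alu", 1), "sub": ("alu", 1),
    "st_top": ("mem", 1), "st_bot": ("mem", 1),
}
BUTTERFLY_DEPS = [("ld_bot", "cmul"), ("ld_tw", "cmul"),
                  ("ld_top", "add"), ("cmul", "add"),
                  ("ld_top", "sub"), ("cmul", "sub"),
                  ("add", "st_top"), ("sub", "st_bot")]
UNITS = {"mem": 2, "mul": 1, "alu": 2}

print(modulo_schedule(BUTTERFLY_OPS, BUTTERFLY_DEPS, UNITS))
```

Here the five memory operations on two memory units force II = 3: one iteration spans 12 cycles, yet a new butterfly starts every 3 cycles, so up to four iterations overlap. In this toy model, removing `ld_tw` (for instance, keeping the twiddle in a register across butterflies) lowers the II to 2, which echoes the twiddle-reference bottleneck named in the abstract.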
“…Moreover, it provides a maximum performance of 128 GFLOPS for a single precision floating point calculation [4]. In addition, several research communities have developed high-performance computing systems using the C6678 DSP [3,[5][6][7][8][9].…”
Section: Introduction (mentioning; confidence: 99%)