In this letter, a unified hardware architecture that can be reconfigured to calculate 2-, 3-, 4-, 5-, or 7-point DFTs is presented. The architecture is based on the Winograd Fourier transform algorithm (WFTA), and its complexity is equal to that of a 7-point DFT in terms of adders/subtracters and multipliers, plus only seven multiplexers introduced to enable reconfigurability. The processing element finds potential use in memory-based FFTs where non-power-of-two sizes are required, such as in DMB-T.
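To make the kind of small-point WFTA kernel referred to above concrete, the following is a minimal Python sketch of the textbook 3-point Winograd DFT, checked against a direct DFT. It is only an illustration under stated assumptions: the function names `dft_direct` and `dft3_winograd` are hypothetical, the forward transform is assumed to use the kernel exp(-j2πkn/N), and the code does not represent the reconfigurable architecture presented in this letter.

```python
import cmath
import math

def dft_direct(x):
    """Direct N-point DFT; the work grows proportionally to N**2."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]

# Constants of the textbook 3-point Winograd DFT (illustrative names).
C1 = math.cos(2 * math.pi / 3) - 1.0            # real constant, equals -1.5
C2 = complex(0.0, -math.sin(2 * math.pi / 3))   # purely imaginary constant

def dft3_winograd(x0, x1, x2):
    """3-point DFT with 2 non-trivial multiplications and 6 additions."""
    s1 = x1 + x2        # addition 1
    d1 = x1 - x2        # addition 2
    s0 = x0 + s1        # addition 3, this is output X[0]
    m1 = C1 * s1        # multiplication 1
    m2 = C2 * d1        # multiplication 2
    t = s0 + m1         # addition 4
    return s0, t + m2, t - m2   # additions 5 and 6

if __name__ == "__main__":
    x = [1 + 2j, -3 + 0.5j, 0.25 - 4j]
    print(dft3_winograd(*x))
    print(dft_direct(x))
```

Both functions should agree to within floating-point rounding. The Winograd form pre-adds the inputs so that only two constant multiplications remain, which is the property that makes such kernels attractive as hardware processing elements.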
Introduction: The discrete Fourier transform (DFT) is an important algorithm in the field of digital signal processing. It transforms a signal from the time domain into the frequency domain, providing information about the spectrum of the signal. The direct computation of an $N$-point DFT requires a number of operations proportional to $N^2$. In order to reduce the number of arithmetic operations, many fast algorithms have been proposed, such as the Cooley-Tukey [1], prime factor (PFA) [2] and Winograd Fourier transform (WFTA) [3] algorithms. Here, we refer to them collectively as fast Fourier transform (FFT) algorithms. These algorithms are based on decomposing an $N$-point DFT recursively into smaller DFTs, leading to a reduction of the computational complexity [4].

Most FFT algorithms and architectures have focused on power-of-two size DFTs. Recently, however, the interest in non-power-of-two size DFTs has increased, mainly motivated by the 3780-point DFT in Chinese digital TV (DMB-T) [5,6], which is based on orthogonal frequency-division multiplexing (OFDM). On the receiving side of OFDM systems, an inverse DFT (IDFT) is usually required, which is easily computed using a DFT processor.

Most FFT architectures are not well optimised for the computation of non-power-of-two-point FFTs, which make use of small-point DFTs of varying sizes and require more complex data management. Some pipelined architectures for the 3780-point DFT in DMB-T have been proposed [5,6]. However, the streaming nature of a pipelined architecture means that it can often process data at a much higher rate than the required 7.56 Mb/s. Hence, the amount of computational resources is often excessive. In [7], individual processing elements for 3- and 5-point DFTs were proposed and considered for a pipelined architecture. However, they were not based on the WFTA and have a slightly higher complexity.

Memory-based FFTs are often more suitable for low data rate applications (where the clock frequency offered by the implementation technology is higher than the data rate), as they allow the computational resources to be reused to a higher degree [8]. For a non-power-of-two memory-based FFT, a number of challenges remain. One is how to carry out the more complex data management needed to interconnect the small DFTs. Another is to develop a processing element that is suitable for computing small-point DFTs of different sizes. This letter presents a unified architecture to compute the 2-, 3-, 4-, 5-, and 7-point DFTs with a single processing element. This architecture can be us...