This paper presents an efficient 8-parallel 1024-point multi-path delay commutator (MDC) fast Fourier transform (FFT) implementation on a field-programmable gate array (FPGA). The choice of FFT algorithm and data orders yields an architecture with 23 non-trivial rotators, which is the lowest number achieved so far. Additionally, the non-general rotators in the architecture are trivial rotators, constant rotators, and 1-rots, all of which require very few resources to implement. Deep pipelining allows the architecture to reach a throughput of 5.2 GS/s.

Index Terms: Fast Fourier transform (FFT), multi-path delay commutator (MDC), pipelined architecture.
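
As an illustration of why trivial rotators cost so little, the following minimal Python sketch models a rotation by a multiple of 90 degrees using only swaps and sign changes of the real and imaginary parts, with no multiplier. The function names `trivial_rotate` and `implied_clock_hz` are hypothetical and are not taken from the paper. The clock estimate assumes that each of the 8 parallel paths accepts one sample per clock cycle, which the text above does not state explicitly.

```python
def trivial_rotate(re, im, k):
    """Rotate re + j*im by W_4^k = (-j)**k using only swaps and negations.

    Illustrative model of a trivial rotator (hypothetical helper, not the
    paper's hardware description).
    """
    k %= 4
    if k == 0:
        return re, im        # multiply by 1
    if k == 1:
        return im, -re       # multiply by -j
    if k == 2:
        return -re, -im      # multiply by -1
    return -im, re           # multiply by +j


def implied_clock_hz(throughput_sps, parallelism):
    """Clock rate implied by a throughput, assuming one sample per path per cycle."""
    return throughput_sps // parallelism


if __name__ == "__main__":
    print(trivial_rotate(3, 4, 1))
    print(implied_clock_hz(5_200_000_000, 8))
```

Under the stated assumption, a throughput of 5.2 GS/s over 8 parallel paths corresponds to a clock of 650 MHz.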