The emergence of streaming multicore processors with multi-SIMD architectures and ultra-low power operation combined with real-time compute and I/O reconfigurability opens unprecedented opportunities for executing sophisticated signal processing algorithms faster and within a much lower energy budget. Here, we present an unconventional FFT implementation scheme for the IBM Cell, named transverse vectorization. It is shown to outperform (both in terms of timing or GFLOP throughput) the fastest FFT results reported to date in the open literature.