In this paper, we propose a systemic approach for synthesizing field-programmable gate array (FPGA) implementations of fast Fourier transform (FFT) computations. Our approach considers both cost (in terms of FPGA resource requirements), and performance (in terms of throughput), and optimizes for both of these dimensions based on user-specified requirements. Our approach involves two orthogonal techniques -FFT inner loop unrolling and outer loop unrolling -to perform design space exploration in terms of cost and performance. By appropriately combining these two forms unrolling, we can achieve cost-optimized FFT implementations in terms of FPGA slices or block RAMs in FPGA, subject to the required throughput. We compared the results of our synthesis approach with a recently-introduced commercial FPGA intellectual property (IP) core -the FFT IP module in the Xilinx LogiCore Library, which provides different FFT implementations that are optimized for a limited set of performance levels. Our results demonstrate efficiency levels that are in some cases better than these commercial IP blocks. At the same time, our approach provides the advantages of being able to optimize implementations based on arbitrary, user-specified performance levels, and of being based on general formulations of FFT loop unrolling trade-offs, which can be retargeted to different kinds of FPGA devices.
Abstract. Single-assignment languages with copy semantics have a very simple and approachable programming model. A naïve implementation of the copy semantics that copies the result of every computation to a new location, can result in poor performance. Whereas, an implementation that keeps the results in the same location, when possible, can achieve much higher performance.In this paper, we present a greedy algorithm for in-place computation of aggregate (array and structure) variables. Our algorithm greedily picks the most profitable opportunities for in-place computation, then updates the scheduling and in-place constraints in the program graph. The algorithm runs in O(T logT + EW V + V 2 ) time, where T is the number of in-placeness opportunities, EW is the number of edges and V the number of computational nodes in a program graph.We evaluate the performance of the code generated by the LabVIEW TM compiler using our algorithm against the code that performs no in-place computation at all, resulting in significant application performance improvements. We also compare the performance of the code generated by our algorithm against the commercial LabVIEW compiler that uses an adhoc in-placeness strategy. The results show that our algorithm matches the performance of the current LabVIEW strategy in most cases, while in some cases outperforming it significantly.
Abstract-The 2-dimensional (2D) Fast Fourier Transform (FFT) is a fundamental, computationally intensive function that is of broad relevance to distributed smart camera systems. In this paper, we develop a systematic method for improving the throughput of 2D-FFT implementations on field-programmable gate arrays (FPGAs). Our method is based on a novel loop unrolling technique for FFT implementation, which is extended from our recent work on FPGA architectures for 1D-FFT implementation [1]. This unrolling technique deploys multiple processing units within a single 1D-FFT core to achieve efficient configurations of data parallelism while minimizing memory space requirements, and FPGA slice consumption. Furthermore, using our techniques for parallel processing within individual 1D-FFT cores, the number of input/output (I/O) ports within a given 1D-FFT core is limited to one input port and one output port. In contrast, previous 2D-FFT design approaches require multiple I/O pairs with multiple FFT cores. This streamlining of 1D-FFT interfaces makes it possible to avoid complex interconnection networks and associated scheduling logic for connecting multiple I/O ports from 1D-FFT cores to the I/O channel of external memory devices. Hence, our proposed unrolling technique maximizes the ratio of the achieved throughput to the consumed FPGA resources under pre-defined constraints on I/O channel bandwidth. To provide generality, our framework for 2D-FFT implementation can be efficiently parameterized in terms of key design parameters such as the transform size and I/O data word length.
The LabVIEWT M 1 system is based on a graphical dataflow language, and is widely used for data acquisition, instrument control and industrial automation. This paper presents a methodology for annotating LabVIEW programs with their specifications, translating those annotated programs into ACL2, and proving the translated specifications with ACL2. Our system supports verification of inductive invariants of bounded loops as well as assertions about straight-line code. Our verification methodology supports the user by generating a highly structured set of proof obligations, many or all of which are discharged automatically. This methodology makes extensive use of hints to support scalability, including careful theory control as well as functional instantiation that avoids explicit use of induction. We describe the design, applicability and limitations of the framework. We also present several examples demonstrating our approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.