High-level synthesis (HLS) promises to increase designer productivity in the face of steadily increasing FPGA sizes, and to broaden the market of use by allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of a debugging infrastructure. To debug, designers can run their source code on a processor; however, this does not capture interactions with other system components. The alternative is to debug using the RTL, which is beyond the expertise of software designers and impractical for hardware designers, as the RTL may not resemble the original source code. This paper presents a new approach to debugging HLS-produced circuits, which allows the user to debug in the context of the source code while running the circuit in situ. This is accomplished by automatically inserting debug instrumentation into the circuit, which allows a debugger application to start and stop the circuit, monitor variables, and set breakpoints. The instrumentation contains trace buffers that record the control and data flow in real time, allowing the debugger to retrieve this data and replay the execution. As a proof of concept, we integrated our approach into the LegUp HLS tool and have made it publicly available. We present methods of optimizing the trace buffer usage, and show that we can replay 1243 lines of source code per 100Kb of memory allocated to trace buffers. On average, the instrumentation circuitry requires an 11% logic area overhead. This work enables real-time debugging of HLS circuits using a software-like debug interface, removing a major roadblock to HLS adoption.
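The record-then-replay idea in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's instrumentation: the names `TraceEntry` and `TraceBuffer`, the use of a state id to stand in for control flow, and the (variable, value) pairs standing in for data flow are all hypothetical.

```python
from collections import deque
from typing import NamedTuple


class TraceEntry(NamedTuple):
    cycle: int
    state: int      # hypothetical: state id standing in for control flow
    writes: tuple   # hypothetical: (variable, value) pairs standing in for data flow


class TraceBuffer:
    """Fixed-depth circular buffer: once full, the oldest entry is overwritten."""

    def __init__(self, depth):
        self._entries = deque(maxlen=depth)

    def record(self, cycle, state, writes=()):
        # Called once per cycle while the (modeled) circuit runs.
        self._entries.append(TraceEntry(cycle, state, tuple(writes)))

    def replay(self):
        """Yield (cycle, state, variable snapshot) from oldest to newest entry."""
        variables = {}
        for entry in self._entries:
            variables.update(entry.writes)
            yield entry.cycle, entry.state, dict(variables)


buf = TraceBuffer(depth=2)
buf.record(0, 1, [("i", 0)])
buf.record(1, 2, [("i", 1)])
buf.record(2, 3, [("sum", 5)])   # depth is 2, so the cycle-0 entry is dropped
history = list(buf.replay())
```

Because the buffer has a fixed depth, only the most recent entries survive, which is why the amount of execution that can be replayed depends on the memory allocated to the trace buffers.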
As each generation of FPGAs grows in size, the run time of the associated CAD tools is rapidly increasing. Many past efforts have aimed at improving the CAD run time through parallelization of the placement algorithm. Wang and Lemieux presented an algorithm that is scalable, deterministic, and timing-driven, and that achieves speedup over VPR [Wang and Lemieux FPGA'11]. This paper provides two significant alterations to Wang and Lemieux's algorithm, resulting in additional speedup and quality improvement. The first contribution is a new data decomposition scheme, called the half-box window technique, which achieves speedup by reducing the frequency of thread synchronization. The second contribution is the development of an improved annealing schedule, which further improves run time and slightly improves the quality of results. Together, these modifications achieve run time speedups of up to 70%. To put this in perspective, Wang and Lemieux required 25 threads to achieve their best speedup, while this work requires only 16 threads. For a 10% degradation in quality, the new 16-thread algorithm achieves a 51x speedup over VPR, compared to a 35x speedup by the 25-thread original algorithm. Regarding quality, the best quality of results achieved by the new algorithm is a 5% degradation versus VPR, compared to an 8% degradation for the original Wang and Lemieux algorithm.
Exploring architectures for large, modern FPGAs requires sophisticated software that can model and target hypothetical devices. Furthermore, research into new CAD algorithms often requires a complete and open source baseline CAD flow. This article describes recent advances in the open source Verilog-to-Routing (VTR) CAD flow that enable further research in these areas. VTR now supports designs with multiple clocks in both timing analysis and optimization. Hard adder/carry logic can be included in an architecture in various ways and significantly improves the performance of arithmetic circuits. The flow now models energy consumption, an increasingly important concern. The speed and quality of the packing algorithms have been significantly improved. VTR can now generate a netlist of the final post-routed circuit which enables detailed simulation of a design for a variety of purposes. We also release new FPGA architecture files and models that are much closer to modern commercial architectures, enabling more realistic experiments. Finally, we show that while this version of VTR supports new and complex features, it has a 1.5× compile time speed-up for simple architectures and a 6× speed-up for complex architectures compared to the previous release, with no degradation to timing or wire-length quality.