Chirag Ravishankar scite author profile

Guarded evaluation is a power reduction technique that involves identifying sub-circuits (within a larger circuit) whose inputs can be held constant (guarded) at specific times during circuit operation, thereby reducing switching activity and lowering dynamic power. The concept is rooted in the property that under certain conditions, some signals within digital designs are not "observable" at design outputs, making the circuitry that generates such signals a candidate for guarding. Guarded evaluation has been demonstrated successfully for custom ASICs; in this paper, we apply the technique to FPGAs. In ASICs, guarded evaluation entails adding hardware to the design, increasing silicon area and cost. Here, we apply the technique in a way that imposes minimal area overhead by leveraging existing unused circuitry within the FPGA. The primary challenge in guarded evaluation is determining the specific conditions under which a sub-circuit's inputs can be held constant without impacting the larger circuit's functional correctness. We propose a simple solution to this problem based on discovering "non-inverting paths" in the circuit's AND-inverter graph representation. Experimental results show that guarded evaluation can reduce switching activity by 22%, on average, and can reduce power consumption in the FPGA interconnect by 14%.
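
The abstract does not define "non-inverting paths", so the following is only a minimal sketch under an assumed reading: a path through an AND-inverter graph whose edges are all uncomplemented. Under that assumption, when the signal at the start of the path is 0, every AND node along the path is also 0, so the other fanins of those nodes cannot affect them. The graph encoding, the node names, and the helper functions `forced_zero_nodes` and `evaluate` are hypothetical and are not taken from the paper.

```python
from collections import deque
from itertools import product

# Hypothetical AIG encoding (an assumption for illustration): each AND node
# maps to two fanins, each given as (signal_name, is_inverted).
# Primary inputs (g, a, b, c) have no entry.
AIG = {
    "n1": (("g", False), ("a", False)),   # n1 = g AND a
    "n2": (("n1", False), ("b", True)),   # n2 = n1 AND (NOT b)
    "n3": (("n2", True), ("c", False)),   # n3 = (NOT n2) AND c
}

def forced_zero_nodes(aig, guard):
    """Return the AND nodes reachable from `guard` via non-inverted edges only."""
    # Forward adjacency restricted to non-inverted edges.
    fanouts = {}
    for node, fanins in aig.items():
        for src, inverted in fanins:
            if not inverted:
                fanouts.setdefault(src, []).append(node)
    # Breadth-first search from the guard signal.
    seen = set()
    queue = deque([guard])
    while queue:
        sig = queue.popleft()
        for nxt in fanouts.get(sig, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def evaluate(aig, inputs):
    """Evaluate every AND node for one assignment of the primary inputs."""
    values = dict(inputs)
    def value(sig):
        if sig not in values:
            (s0, inv0), (s1, inv1) = aig[sig]
            values[sig] = (value(s0) ^ inv0) & (value(s1) ^ inv1)
        return values[sig]
    return {node: value(node) for node in aig}

forced = forced_zero_nodes(AIG, "g")

# Exhaustive check: with g = 0, every node in `forced` evaluates to 0,
# whatever the values of a, b, and c.
for a, b, c in product([False, True], repeat=3):
    result = evaluate(AIG, {"g": False, "a": a, "b": b, "c": c})
    assert all(result[node] is False for node in forced)
```

In this toy graph the search stops at `n3`, because the edge from `n2` into `n3` is inverted. Whether a sub-circuit feeding `a` or `b` could actually be guarded would also depend on its other fanouts, which this sketch does not model.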

Analysis and evaluation of greedy thread swapping based dynamic power management for MPSoC platforms

Ravishankar, Ananthanarayanan, Garg, et al. 2012

Thread migration (TM) is a recently proposed dynamic power management technique for heterogeneous multi-processor system-on-chip (MPSoC) platforms that eliminates the area and power overheads incurred by fine-grained dynamic voltage and frequency scaling (DVFS) based power management. In this paper, we take the first step towards formally analyzing and experimentally evaluating the use of power-aware TM for parallel data streaming applications on MPSoC platforms. From an analysis perspective, we characterize the optimal mapping of threads to cores and prove that a complexity-effective, greedy thread-swapping TM algorithm converges to the globally optimal solution. The proposed techniques are evaluated on a 9-core FPGA-based MPSoC prototype equipped with fully functional TM and DVFS support, running a parallelized video encoding benchmark based on the Moving Picture Experts Group (MPEG-2) standard. Our experimental results validate the proposed theoretical analysis and show that the proposed TM algorithm achieves performance within 8% of DVFS under the same power budget, assuming no overheads for DVFS. Assuming voltage regulator inefficiency of 80%, the proposed TM algorithm has 9% higher performance than DVFS, again under the same total power budget.

Raising FPGA Logic Density Through Synthesis-Inspired Architecture

Anderson, Wang, Ravishankar 2012, IEEE Trans. VLSI Syst.

We leverage properties of the logic synthesis netlist to define both a new FPGA logic element (function generator) architecture and an associated technology mapping algorithm that together provide improved logic density. We demonstrate that an "extended" logic element with slightly modified K-input LUTs achieves much of the benefit of an architecture with (K+1)-input LUTs, while consuming silicon area close to a K-LUT (a K-LUT requires half the area of a (K+1)-LUT). We introduce the notion of "non-inverting paths" in a circuit's AND-inverter graph (AIG) and show their utility in mapping into the proposed logic element architectures. We propose a general family of logic element architectures, and present results showing that they offer a variety of area/performance trade-offs. One of our key results demonstrates that while circuits mapped to a traditional 5-LUT architecture need 15% more LUTs and have 14% more depth than a 6-LUT architecture, our extended 5-LUT architecture requires only 7% more LUTs and 5% more depth than 6-LUTs, on average. Nearly all of the depth reduction associated with moving from K-input to (K+1)-input LUTs can be achieved with considerably less area using extended K-LUTs. We further show that 6-LUT optimal mapping depths can be achieved with a small fraction of the LUTs in hardware being 6-LUTs and the remainder being extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove to be advantageous.
