Recovery-based design (RBD) is a promising approach for the design of energy-efficient circuits under variations. RBD instruments circuits with mechanisms to identify and correct timing violations, thereby allowing reduced guard bands or design margins. In addition, RBD enables aggressive voltage overscaling to a point where timing errors occur even under nominal conditions. A major barrier to the widespread adoption of RBD is that traditional design practices and synthesis tools result in circuits with so-called "path walls", leading to an explosion in the number of timing errors beyond a certain critical operating voltage. To alleviate this effect, previous techniques focused on combinational circuit optimizations such as sizing, use of dual-Vth cells, re-structuring, etc. to favorably reshape the path delay distribution. However, these techniques are limited by the inherent sequential structure of the circuit, which defines the boundaries of the combinational logic. In this work, we explore a completely different approach to synthesize circuits for RBD. We propose the use of retiming, a well-known and powerful sequential optimization technique, to redefine the boundaries of combinational logic, thereby creating new opportunities for RBD that cannot be explored by previous techniques. We make the key observation that, in retiming circuits with RBD (unlike classical retiming), it is acceptable for a few paths in the circuit to exceed the clock period. Using this insight, we propose a synthesis methodology, Relax-and-Retime, wherein the original circuit is relaxed by ignoring timing constraints on selected paths that are bottlenecks to retiming. When classical minimum period retiming is employed on this relaxed circuit, the path wall is shifted to a lower delay, thus allowing additional voltage overscaling.
The Relax-and-Retime methodology judiciously selects bottleneck paths by trading off recovery overheads caused by timing errors due to these paths with the opportunities for retiming. We utilize the proposed methodology to synthesize a wide range of benchmarks including arithmetic circuits, ISCAS89 benchmarks and modules from the UltraSPARC T1 processor. Our results demonstrate 9-25% (average of 15.3%) improvement in overall energy compared to a well-optimized baseline with RBD.
Deep Learning Networks (DLNs) are bio-inspired large-scale neural networks that are widely used in emerging vision, analytics, and search applications. The high computation and storage requirements of DLNs have led to the exploration of various avenues for their efficient realization. Concurrently, the ability of emerging post-CMOS devices to efficiently mimic neurons and synapses has led to great interest in their use for neuromorphic computing. We describe SPINDLE, a programmable processor for deep learning based on spintronic devices. SPINDLE exploits the unique ability of spintronic devices to realize highly dense and energy-efficient neurons and memory, which form the fundamental building blocks of DLNs. SPINDLE consists of a three-tier hierarchy of processing elements to capture the nested parallelism present in DLNs, and a two-level memory hierarchy to facilitate data reuse. It can be programmed to execute DLNs with widely varying topologies for different applications. SPINDLE employs techniques to limit the overheads of spin-to-charge conversion, and utilizes output and weight quantization to enhance the efficiency of spin-neurons. We evaluate SPINDLE using a device-to-architecture modeling framework and a set of widely used DLN applications (handwriting recognition, face detection, and object recognition). Our results indicate that SPINDLE achieves 14.4X reduction in energy consumption and 20.4X reduction in EDP over the CMOS baseline under iso-area conditions.
General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high-density, energy-efficient GPGPU cache hierarchy design using a new spintronic memory technology called Domain Wall Memory (DWM). DWMs inherently offer unprecedented benefits in density by storing multiple bits in the domains of a ferromagnetic nanowire, which logically resembles a bit-serial tape. However, this structure also leads to a unique challenge: the bits must be sequentially accessed by performing "shift" operations, resulting in variable and potentially higher access latencies. To address this challenge, STAG utilizes a number of architectural techniques: (i) a hybrid cache organization that employs different DWM bit-cells to realize the different memory arrays within the GPGPU cache hierarchy, (ii) a clustered, bit-interleaved organization, in which the bits in a cache block are spread across a cluster of DWM tapes, allowing parallel access, (iii) tape head management policies that predictively configure DWM arrays to reduce the expected number of shift operations for subsequent accesses, and (iv) a shift-aware promotion buffer (SaPB), in which accesses to the DWM cache are predicted based on intra-warp locality, and locations that would incur a large shift penalty are promoted to a smaller buffer. Over a wide range of benchmarks from the Rodinia, ISPASS and Parboil suites, STAG achieves significant benefits in performance (12.1% over SRAM and 5.8% over STT-MRAM) and energy (3.3X over SRAM and 2.6X over STT-MRAM).
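The shift cost that motivates the bit-interleaved organization and the tape head management policies can be illustrated with a toy model. The sketch below is not STAG's implementation: it assumes a single access port per tape, one shift per domain moved, and two hypothetical head policies ("lazy" leaves the head where the last access put it, "eager" returns it to a fixed rest position off the critical path). All function and parameter names are invented for illustration.

```python
def visible_shifts(trace, policy="lazy", rest_pos=0, start=0):
    """Count latency-visible shifts for a sequence of domain indices on one tape.

    Assumption: with the hypothetical "eager" policy the head is moved back
    to rest_pos after each access, and that return trip is not counted
    because it is assumed to happen off the critical path.
    """
    head = start
    total = 0
    for target in trace:
        total += abs(target - head)  # shifts needed to align target with the port
        head = rest_pos if policy == "eager" else target
    return total


def block_shifts_serial(block_id, bits_per_block, head=0):
    """Block stored in consecutive domains of ONE tape (assumed layout).

    The head first moves to the block's first bit, then steps through the
    remaining bits one shift at a time.
    """
    first_bit = block_id * bits_per_block
    return abs(first_bit - head) + (bits_per_block - 1)


def block_shifts_interleaved(block_id, head=0):
    """Bit i of every block lives on tape i at domain index block_id (assumed).

    All tapes in the cluster shift in lockstep, so the whole block is
    aligned after a single run of shifts.
    """
    return abs(block_id - head)


if __name__ == "__main__":
    trace = [0, 7, 0, 7]  # alternating accesses to both ends of an 8-domain tape
    print("lazy :", visible_shifts(trace, policy="lazy", start=0))
    print("eager:", visible_shifts(trace, policy="eager", rest_pos=4, start=4))
    print("serial block read     :", block_shifts_serial(3, 8))
    print("interleaved block read:", block_shifts_interleaved(3))
```

In this toy setting, parking the head mid-tape helps a trace that alternates between the two ends, and interleaving removes the per-bit shifts within a block. Which policy wins on a real workload depends on the access pattern, which is why the abstract describes the head policies as predictive.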