Embedded memory remains a major bottleneck in current integrated circuit design in terms of silicon area, power dissipation, and performance; however, static random access memories (SRAMs) are almost exclusively supplied by a small number of vendors through memory generators, targeted at rather generic design specifications. As an alternative, standard cell memories (SCMs) can be defined, synthesized, and placed and routed as an integral part of a given digital system, providing complete design flexibility, good energy efficiency, low-voltage operation, and even area efficiency for small memory blocks. Yet implementing an SCM block with a standard digital flow often fails to exploit the distinct and regular structure of such an array, leaving room for optimization. In this article, we present a design methodology for optimizing the physical implementation of SCM macros as part of the standard design flow. This methodology introduces controlled placement, leading to a structured, noncongested layout with close to 100% placement utilization, resulting in a smaller silicon footprint, reduced wire length, and lower power consumption compared to SCMs without controlled placement. This methodology is demonstrated on SCM macros of various sizes and aspect ratios in a state-of-the-art 28nm fully depleted silicon-on-insulator technology, and compared with equivalent macros designed with the noncontrolled, standard flow, as well as with foundry-supplied SRAM macros. The controlled SCMs provide an average 25% reduction in area as compared to noncontrolled implementations while achieving a smaller size than SRAM macros of up to 1Kbyte. Power and performance comparisons of controlled SCM blocks of a commonly found 256 × 32 (1 Kbyte) memory with foundry-provided SRAMs show greater than 65% and 10% reduction in read and write power, respectively, while providing faster access than their SRAM counterparts, despite being of an aspect ratio that is typically unfavorable for SCMs. In addition, the SCM blocks function correctly with a supply voltage as low as 0.3V, well below the lower limit of even the SRAM macros optimized for low-voltage operation. The controlled placement methodology is applied within a full-chip physical implementation flow of an OpenRISC-based test chip, providing more than 50% power reduction compared to equivalently sized compiled SRAMs under a benchmark application.
An ultra-high throughput low-density parity check (LDPC) decoder with an unrolled full-parallel architecture is proposed, which achieves the highest decoding throughput compared to previously reported LDPC decoders in the literature. The decoder benefits from a serial message-transfer approach between the decoding stages to alleviate the well-known routing congestion problem in parallel LDPC decoders. Furthermore, a finite-alphabet message passing algorithm is employed to replace the variable node update rule of the standard min-sum decoder with look-up tables, which are designed in a way that maximizes the mutual information between decoding messages. The proposed algorithm results in an architecture with reduced bit-width messages, leading to a significantly higher decoding throughput and to a lower area as compared to a min-sum decoder when serial message-transfer is used. The architecture is placed and routed for the standard min-sum reference decoder and for the proposed finite-alphabet decoder using a custom pseudo-hierarchical backend design strategy to further alleviate routing congestions and to handle the large design. Post-layout results show that the finite-alphabet decoder with the serial message-transfer architecture achieves a throughput as large as 588 Gbps with an area of 16.2 mm 2 and dissipates an average power of 22.7 pJ per decoded bit in a 28 nm FD-SOI library. Compared to the reference min-sum decoder, this corresponds to 3.1 times smaller area and 2 times better energy efficiency.Index Terms-Low-density parity-check code, min-sum decoding, unrolled architecture, finite-alphabet decoder, 28 nm FD-SOI.
Ultra-low power applications often require several kb of embedded memory and are typically operated at the lowest possible operating voltage (VDD) to minimize both dynamic and static power consumption. Embedded memories can easily dominate the overall silicon area of these systems, and their leakage currents often dominate the total power consumption. Gain-cell based embedded DRAM arrays provide a high-density, low-leakage alternative to SRAM for such systems; however, they are typically designed for operation at nominal or only slightly scaled supply voltages. This paper presents a gain-cell array which, for the first time, targets aggressively scaled supply voltages, down into the subthreshold (sub-VT) domain. Minimum VDD design of gain-cell arrays is evaluated in light of technology scaling, considering both a mature 0.18 μm CMOS node, as well as a scaled 40 nm node. We first analyze the trade-offs that characterize the bitcell design in both nodes, arriving at a best-practice design methodology for both mature and scaled technologies. Following this analysis, we propose full gain-cell arrays for each of the nodes, operated at a minimum VDD. We find that an 0.18 μm gain-cell array can be robustly operated at a sub-VT supply voltage of 400mV, providing read/write availability over 99% of the time, despite refresh cycles. This is demonstrated on a 2 kb array, operated at 1 MHz, exhibiting full functionality under parametric variations. As opposed to sub-VT operation at the mature node, we find that the scaled 40 nm node requires a near-threshold 600mV supply to achieve at least 97% read/write availability due to higher leakage currents that limit the bitcell’s retention time. Monte Carlo simulations show that a 600mV 2 kb 40 nm gain-cell array is fully functional at frequencies higher than 50 MHz
Abstract-Dual-edge-triggered (DET) synchronous operation is a very attractive option for low-power, high-performance designs. Compared to conventional single-edge synchronous systems, DET operation is capable of providing the same throughput at half the clock frequency. This can lead to significant power savings on the clock network that is often one of the major contributors to total system power. However, in order to implement DET operation, special registers need to be introduced that sample data on both clock-edges. These registers are more complex than their single-edge counterparts, and often suffer from a certain amount of clock-overlap between the main clock and the internally generated inverted clock. This overlap can cause contention inside the cell and lead to logic failures, especially when operating at scaled power supplies and under process variations that characterize nanometer technologies. This paper presents a novel, static DET flip-flop (DET-FF) with a true-singlephase clock that completely avoids clock overlap hazards by eliminating the need for an inverted clock edge for functionality. The proposed DET FF was implemented in a standard 40 nm CMOS technology, showing full functionality at low-voltage operating points, where conventional DET-FFs fail. Under a nearthreshold, 500 mV supply voltage, the proposed cell also provides a 35% lower CK-to-Q delay and the lowest power-delay-product compared to all considered DET-FF implementations.
Abstract-Biomedical systems often require several kb of embedded memory and are typically operated in the subthreshold (sub-VT) domain for good energy-efficiency. Embedded memories and their leakage current can easily dominate the overall silicon area and the total power consumption, respectively. Gain-cell based embedded DRAM arrays provide a high-density, lowleakage alternative to SRAM for such systems; however, they are typically designed for operation at nominal or only slightly scaled supply voltages. For the first time, this paper presents a gain-cell array which is fully functional in the sub-VT regime and achieves a data retention time that is more than 10 4 times higher than the access time. Monte Carlos simulations show that the 2 kb gaincell array, implemented in a mature 0.18 µm CMOS node and supplied with a sub-VT voltage of 400 mV, exhibits robust write and read operations at 500 kHz under parametric variations and has over 99% availibilty for read and write access.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.