A 128×32 bit ultra-low power (ULP) memory with one read and one write port is presented. A full-custom standard-cell compliant dual-bit latch with two integrated NAND-gates was designed. The NAND-gate realizes the first stage of a read multiplexer. A dense layout reduces the physical cell area by 56 % compared to a pure commercial standard-cell equivalent. Effectively, an overall memory area reduction of 32 % is achieved. The gates are integrated into a digital standard-cell based memory (SCM) flow. Silicon measurements show correct read and write operation deep in the subthreshold domain (sub-VT), down to 370 mV, and data is retained down to 320 mV. At the energy minimum voltage (450 mV) the memory dissipates 35 fJ/operation.
I. INTRODUCTION
In recent years, the interest in ultra-low power (ULP) systems has increased rapidly. ULP operation is often achieved by aggressive supply voltage (VDD) scaling deep into the subthreshold (sub-VT) region [1]. To provide ULP systems with memories, several approaches have emerged. A straightforward approach is to use commercial SRAMs with bitcells constructed from 6 transistors (6T), operated in a separate power domain with a higher VDD, and to use level shifters to communicate with the ULP domains. Another popular approach is to use full-custom SRAMs with larger bitcells, 8-14 transistors (8T-14T), together with read- and write-assist techniques to operate in the weak inversion region. Correct operation deep in the sub-VT region using this approach has been demonstrated by various authors [2]-[6], and even at smaller technology nodes [7], [8]. A third approach uses standard-cell based memories (SCMs), as previously demonstrated in [9], [10].
The selection of the most suitable technique involves tradeoffs between area, energy, and operating frequency. In a ULP system, energy is often the most important constraint. A 6T-bitcell SRAM solution is in many cases the most area efficient and has the highest memory bandwidth, but suffers from higher energy dissipation because it operates at a higher VDD. Furthermore, extra level shifters are required, which increase energy dissipation as well as system complexity. An 8-14T weak-inversion SRAM loses in area and bandwidth compared to a 6T SRAM, but is more energy efficient and thus a suitable candidate. In terms of engineering effort, a design with a complex floorplan that has many small and distributed memory blocks might benefit from an SCM approach. An SCM uses standard cells to construct a memory that is synthesized together with the RTL code of the whole system, and is placed as "glue logic" close to, or even with, the processing block. This gives flexibility to the ASIC designer,