2016
DOI: 10.1145/2890498

Power, Area, and Performance Optimization of Standard Cell Memory Arrays Through Controlled Placement

Abstract: Embedded memory remains a major bottleneck in current integrated circuit design in terms of silicon area, power dissipation, and performance; however, static random access memories (SRAMs) are almost exclusively supplied by a small number of vendors through memory generators, targeted at rather generic design specifications. As an alternative, standard cell memories (SCMs) can be defined, synthesized, and placed and routed as an integral part of a given digital system, providing complete design flexibility, go…

Cited by 50 publications (49 citation statements)
References 18 publications (46 reference statements)
“…GCC 4.9 and LLVM 3.7 toolchains are available for the cores, while OpenMP 3.0 is supported on top of the bare-metal parallel runtime. The cores share a single instruction cache of 4 kB of Standard Cell Memory (SCM) [55] that can increase energy efficiency by up to 30% compared to an SRAM-based private instruction cache on parallel workloads [56]. The ISA extensions of the core include general-purpose enhancements (automatically inferred by the compiler), such as zero-overhead hardware loops and load and store operations embedding pointer arithmetic, and other DSP extensions that can be explicitly included by means of intrinsic calls.…”
Section: SoC Architecture
confidence: 99%
“…Instruction caches can also be implemented with SCMs. The usage of SCMs for the implementation of frequently-accessed memory banks significantly improves energy efficiency, since energy/access of SCM is significantly lower than that of SRAMs for the relatively small cuts needed in L1 instruction and data memories [32]. Depending on the availability of low-voltage memories in the targeted implementation technology, different ratios of SCM and SRAM memory can be instantiated at design time.…”
Section: Cluster Architecture
confidence: 99%
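The excerpt above describes a design-time choice: banks that are small and frequently accessed are implemented as SCMs because their energy per access beats SRAM at small sizes, while larger cuts go to SRAM macros. The following is a minimal illustrative sketch of that selection rule; all energy coefficients and the resulting crossover size are hypothetical placeholders, not figures from the cited works.

```python
# Illustrative sketch: choosing SCM vs. SRAM per bank at design time,
# based on modeled energy per access. All numbers below are hypothetical
# assumptions for illustration only.

def scm_energy_pj(size_kb: float) -> float:
    # Latch-based SCM: low fixed cost, energy grows quickly with bank size.
    return 0.5 + 0.40 * size_kb

def sram_energy_pj(size_kb: float) -> float:
    # SRAM macro: higher fixed overhead, flatter growth for larger cuts.
    return 2.0 + 0.05 * size_kb

def pick_memory(size_kb: float) -> str:
    """Return the lower-energy implementation for a bank of this size."""
    return "SCM" if scm_energy_pj(size_kb) < sram_energy_pj(size_kb) else "SRAM"

if __name__ == "__main__":
    for size in (1, 2, 4, 8, 16):
        print(f"{size:>2} kB bank -> {pick_memory(size)}")
```

With these placeholder coefficients the small cuts land on SCM and the larger ones on SRAM, mirroring the qualitative tradeoff the excerpt describes; in a real flow the coefficients would come from characterized macros in the target technology.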
“…We demonstrate that this approach improves the energy efficiency of the digital core of the accelerator by 5.1×, and the throughput by 1.3×, with respect to a baseline architecture based on 12-bit MAC units operating at a nominal supply voltage of 1.2 V. To extend the performance scalability of the device, we implement a latch-based standard cell memory (SCM) architecture for on-chip data storage. Although SCMs are more expensive than SRAMs in terms of area, they provide better voltage scalability and energy efficiency [26], extending the operating range of the device in the low-voltage region. This further improves the energy efficiency of the engine by 6× at 0.6 V, with respect to the nominal operating voltage of 1.2 V, and leads to an improvement in energy efficiency by 11.6× with respect to a fixed-point implementation with SRAMs at its best energy point of 0.8 V. To improve the flexibility of the convolutional engine we implement support for several kernel sizes (1×1 -7×7), and support for per-channel scaling and biasing, making it suitable for implementing a large variety of CNNs.…”
Section: Introduction
confidence: 99%