25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications

Kwon, Young-Cheon; Lee, Suk Han; Lee, Jae‐Hoon; Kwon, Sanghyuk; Ryu, Je Min; Son, Jaeman; Seongil, O; Yu, Houqing; Lee, Haesuk; Kim, Soo Young; Cho, Youngmin; Kim, Jin Guk; Choi, Junho; Shin, Hyunsung; Kim, Jin; Phuah, Bengseng; Kim, HyoungMin; Song, Myeong Jun; Choi, Ahn; Kim, Daeho; Kim, Soo‐Young; Kim, Eun-Bong; Wang, David; Kang, Shin-haeng; Ro, Yuhwan; Seo, Seung-Woo; Song, Joonho; Youn, Jaeyoun; Sohn, Kiwon; Kim, Nam Sung

doi:10.1109/isscc42613.2021.9365862

Cited by 82 publications

(37 citation statements)

References 6 publications

“…In-Memory Accelerators (IMA): Accelerators are placed within memory devices on the same silicon piece, either by placing logic between memory layers [47], or by taking advantage of the 3D-stacked integration technologies to accommodate NDP capabilities on the logic layer. Considering Single Data Rate (SDR) and Double Data Rate (DDR) memories, several techniques were proposed to process data inside these memories by integrating the processing logic into the DRAM row-buffers.…”

Section: Near Data Taxonomy (mentioning)

confidence: 99%

“…Other proposals integrate fine and coarse grain reconfigurable logic inside a logic layer [78,88]. Finally, several proposals integrate custom Application-Specific Integrated Circuits (ASICs) able to accelerate only specific applications [25,26,27,30,31,47,64,65,71].…”

Section: Near Data Taxonomy (mentioning)

confidence: 99%

“…Imani et al [46] propose the mechanism called FloatPIM, implementing floating-point representation and operations through bitwise NOR memristor operations on digital store, eliminating signal conversions and providing better performance and inference accuracy over the PipeLayer and ISAAC. Kwon et al [47] present the mechanism called FIMDRAM, which proposes integrating an engine capable of large vector operations to the DRAM, thus exploiting bank-level parallelism and achieving performance up to 4× superior processing bandwidth in comparison to an off-chip device. Lee et al [48] present the architecture Similarity Search Associative Memory (SSAM), an accelerator for similarity search applied to K-Nearest Neighbors (KNN) search, which outperforms Graphics Processing Unit (GPU)- and Field-Programmable Gate Array (FPGA)-based alternatives in terms of throughput and energy efficiency.…”

Section: Introduction (mentioning)

confidence: 99%

Survey on Near-Data Processing: Applications and Architectures

Santos, Moreira, Cordeiro, et al. 2021. JICS

One of the main challenges for modern processors is the data transfer between processor and memory. Such data movement implies high latency and high energy consumption. In this context, Near-Data Processing (NDP) proposals have started to gain acceptance as an accelerator device. Such proposals alleviate the memory bottleneck by moving instructions to data whereabouts. The first proposals date back to the 1990s, but it was only in the 2010s that we could observe an increase in papers addressing NDP. It occurred together with the appearance of 3D-stacked chips with logic and memory stacked layers. This survey presents a brief history of these accelerators, focusing on the applications domains migrated to near-data and the proposed architectures. We also introduce a new taxonomy to classify such architectural proposals according to their data distance.

“…By separating logic die and memory dies, 3D-stacked memory PIM can provide higher computing resources than other PIM models. In particular, a function-in-memory (FIM) DRAM implementation has been recently proposed for running machine learning applications using HBM memory systems [12]. With the advent of FIM DRAM, the machine learning development environment for PIM systems has become more critical.…”

Section: A Processing-in-memory Technique (mentioning)

confidence: 99%

“…Recently, Kwon et al proposed the real product deployment of an HBM-based PIM device, named function-in-memory (FIM) DRAM [12]. Compared to PIMCaffe, the processing elements (programmable computing unit, PCU) of FIMDRAM are located inside the DRAM bank, using Samsung's HBM2 fabrication technology.…”

Section: Related Work (mentioning)

confidence: 99%

PIMCaffe: Functional Evaluation of a Machine Learning Framework for In-Memory Neural Processing Unit

et al. 2021

A large amount of memory usage in recent machine learning applications imposes a high degree of system burden in terms of power and processing speed. To cope with such a problem, Processing-In-Memory (PIM) techniques can be applied to and be an alternative solution. Especially, the recommendation system which is one of the major machine learning applications in data centers requires a huge memory capacity and can be a good candidate application helped by the PIM technique. In this paper, we introduce a machine learning framework designed for in-memory neural processing units and its evaluation environment, named PIMCaffe. PIMCaffe consists of two components; a Caffe2-based deep learning framework that supports PIM acceleration and a PIM-emulating hardware platform. We develop a suite of functions, libraries, application programming interfaces, and a device driver to support the framework. In addition, we implement a prototype Neural Processing Unit (NPU) in PIMCaffe to evaluate the performance of our platform with machine learning applications. Our prototype NPU design includes a vector processor for parallel vector processing and a systolic array unit for matrix multiplication. Using the proposed software framework, we perform a detailed analysis on the in-memory neural processing unit. PIMCaffe supports evaluations of recommendation systems and various convolutional neural network models on the in-memory neural processing unit. PIMCaffe with the NPU shows up to 2.26x, 5.99x, and 1.71x speedup for the recommendation system, AlexNet, and ResNet-50 respectively compared to the ARM Cortex-A53 CPU.

Energy and Space Efficient Parallel Adder Using Molecular Memristors

Rath et al. 2022. Advanced Materials

A breakthrough in in‐memory computing technologies hinges on the development of appropriate material platforms that can overcome their existing limitations, such as larger than optimal footprint and multiple serial computational steps, with potential accumulation of errors. Using a molecular switching element with multiple non‐monotonic and deterministic transitions, the device count and the number of computational steps can be substantially reduced. With molecular materials, however, the realization of a reliable and robust platform is an unattained goal for decades. Here, crossbar arrays with up to 64 molecular memristors are fabricated to experimentally demonstrate 8‐bit serial and 4‐bit parallel adders that operate for thousands of measurement cycles with an estimated error probability of 10⁻¹⁶. For performance benchmarking, a 32‐bit parallel adder is designed and simulated with 268 million inputs including contributions from the peripheral circuitry showing a 47× higher energy efficiency, 93× faster operation, and 9% of the footprint, leading to 4390 times improved energy–delay product compared to a special purpose complementary metal–oxide–semiconductor (CMOS)‐based multicore adder.
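As a rough consistency check on the figures in this abstract, assuming the usual definition of the energy-delay product as energy multiplied by delay (the abstract does not spell the definition out, and the subscript labels below are illustrative), the two reported ratios combine as follows:

```latex
\[
\frac{\mathrm{EDP}_{\text{CMOS}}}{\mathrm{EDP}_{\text{memristor}}}
= \frac{E_{\text{CMOS}}}{E_{\text{memristor}}}
  \times
  \frac{t_{\text{CMOS}}}{t_{\text{memristor}}}
\approx 47 \times 93 = 4371
\]
```

The product of the rounded factors is close to the reported 4390; the small gap is presumably due to rounding of the 47× and 93× figures.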
