2015
DOI: 10.1145/2775054.2694358
PuDianNao

Abstract: Machine Learning (ML) techniques are pervasive tools in various emerging commercial applications, but must be hosted by powerful computer systems to process very large data sets. Although general-purpose CPUs and GPUs provide straightforward solutions, their energy efficiency is limited by their excessive support for flexibility. Hardware accelerators can achieve better energy efficiency, but each accelerator typically supports only a single ML technique (family). According to the famous No…

Cited by 29 publications (3 citation statements)
References 31 publications
“…Many dense architectures have been proposed in the literature that optimize compute [14,25,30] and memory bandwidth [13,32] for CNN inferences. Quantization of weights and activations using log [35,58] and linear [17,30] techniques further reduces the memory footprint.…”
Section: Related Work
Confidence: 99%
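The log and linear quantization mentioned in this excerpt is easy to illustrate. Below is a minimal NumPy sketch, not code from any of the cited papers: it assumes symmetric per-tensor scaling for the linear case and signed power-of-two codes (rounded log2 magnitudes) for the log case; `bits`, the function names, and the clipping range are all illustrative choices.

```python
import numpy as np

def linear_quantize(x, bits=8):
    # Uniform ("linear") quantization: map values onto 2^bits evenly
    # spaced levels with a single symmetric per-tensor scale.
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    codes = np.round(x / scale)            # integer codes, e.g. [-127, 127] for 8 bits
    return codes * scale                   # dequantized approximation

def log_quantize(x, bits=4):
    # Logarithmic quantization: keep the sign and round log2 of the
    # magnitude, so each value becomes +/- a power of two and a multiply
    # can be implemented as a bit shift in hardware.
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), np.finfo(np.float32).tiny)  # avoid log(0)
    exp = np.clip(np.round(np.log2(mag)), -(2 ** bits - 1), 0)
    return sign * np.power(2.0, exp)

w = (np.random.randn(6) * 0.5).astype(np.float32)
print("original:", w)
print("linear  :", linear_quantize(w))
print("log     :", log_quantize(w))
```

Either scheme shrinks the memory footprint the same way: an 8-bit linear code or a 4-bit exponent replaces a 32-bit float, cutting weight traffic by 4x to 8x before any architectural optimization.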
“…ASIC Cloud-worthy accelerators with planet-scale applicability are numerous, including those targeting graph processing [28], database servers [29], Web Search RankBoost [30], Machine Learning [31,32], gzip/gunzip [33] and Big Data Analytics [34]. Tandon et al [35] designed accelerators for similarity measurement in natural language processing.…”
Section: Related Work
Confidence: 99%
“…To optimize memory access and data movement, DianNao [8] uses a customized on-chip buffer to minimize energy-hungry DRAM accesses. In contrast, the next generation of accelerators in the DianNao family (DaDianNao [8], ShiDianNao [9], and PuDianNao [49]) relies entirely on on-chip embedded DRAM and SRAM (Static Random Access Memory) to eliminate DRAM accesses. As deep learning performance has scaled with ever-larger data, prevalent hardware architectures [8,9,45] are limited by inefficient data transfer between processing elements and main memory.…”
Section: Existing DNN Accelerators
Confidence: 99%
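The excerpt's point about data movement can be made concrete with a toy model. The sketch below is not the DianNao dataflow; the tile size, read counters, and matrix shapes are assumptions purely for illustration. It stages one tile of the input vector in a hypothetical on-chip buffer, reuses it across every output row, and counts how many off-chip ("DRAM") reads that saves versus refetching the inputs for each row.

```python
import numpy as np

TILE = 64  # hypothetical on-chip buffer capacity, in elements

def tiled_matvec(W, x):
    """Matrix-vector product that reuses a buffered input tile across rows."""
    n_out, n_in = W.shape
    y = np.zeros(n_out, dtype=W.dtype)
    dram_reads = 0
    for j0 in range(0, n_in, TILE):
        j1 = min(j0 + TILE, n_in)
        x_tile = x[j0:j1]                  # fetched from "DRAM" once per tile...
        dram_reads += j1 - j0
        for i in range(n_out):             # ...then reused by every output row
            dram_reads += j1 - j0          # weights are streamed once each
            y[i] += W[i, j0:j1] @ x_tile
    return y, dram_reads

W = np.random.randn(8, 256).astype(np.float32)
x = np.random.randn(256).astype(np.float32)
y, tiled_reads = tiled_matvec(W, x)
naive_reads = W.size + W.shape[0] * x.size  # refetch all of x for every row
print(f"off-chip reads: tiled={tiled_reads}, naive={naive_reads}")
assert np.allclose(y, W @ x, atol=1e-4)
```

Here the buffered inputs are read off chip once instead of once per output row; real accelerators push the same idea further by also holding weight tiles and partial sums on chip.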